BCE

We start from the official PyTorch BCELoss example.

import torch
import torch.nn as nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, 2, requires_grad=True)
target = torch.rand(3, 2, requires_grad=False)
output = loss(m(input), target)
output.backward()
--------------------
input = tensor([[ 0.1031, -0.9440],
        [-0.5792, -0.3374],
        [ 1.4538,  0.1611]], requires_grad=True)
target = tensor([[0.6385, 0.2900],
        [0.0204, 0.2542],
        [0.9668, 0.8947]])
m(input) = tensor([[0.5258, 0.2801],
        [0.3591, 0.4164],
        [0.8106, 0.5402]], grad_fn=<SigmoidBackward0>)
output = tensor(0.5425, grad_fn=<BinaryCrossEntropyBackward0>)

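First, input is passed through sigmoid to turn the raw scores into probabilities.
target must have the same shape as input, with values between 0 and 1.
BCELoss is then computed from the probabilities and target.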
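
To make the last step concrete, the reported loss can be recomputed by hand from the definition -[t*log(p) + (1-t)*log(1-p)], averaged over all elements. The sketch below reuses the printed input and target values, which are rounded to four decimals, so the result only matches 0.5425 approximately. The names manual and builtin are illustrative and do not appear in the original example.

```python
import torch
import torch.nn as nn

# Values copied from the printed example above (rounded to 4 decimals).
input = torch.tensor([[0.1031, -0.9440],
                      [-0.5792, -0.3374],
                      [1.4538, 0.1611]])
target = torch.tensor([[0.6385, 0.2900],
                       [0.0204, 0.2542],
                       [0.9668, 0.8947]])

p = torch.sigmoid(input)  # probabilities, same as m(input)

# Element-wise binary cross-entropy, then the mean over all 6 elements.
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()
builtin = nn.BCELoss()(p, target)

print(manual.item(), builtin.item())  # both close to 0.5425
```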

Next, the BCEWithLogitsLoss example.

loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(input, target)
output.backward()

The same result can be obtained with BCELoss, except that the sigmoid has to be applied manually.

# Model output (raw logits)
input = torch.randn(3, requires_grad=True)

# Apply sigmoid to the model output to turn the logits into probabilities
sigmoid_input = torch.sigmoid(input)

# Targets (0 or 1)
target = torch.empty(3).random_(2)

# Compute the loss with nn.BCELoss
loss = nn.BCELoss()
output = loss(sigmoid_input, target)

Here target can simply be the labels themselves (0 or 1).
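
As a quick check that the two routes agree, the sketch below feeds the same logits and labels to both losses. The fixed seed and the variable names with_logits and manual_sigmoid are assumptions added for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

logits = torch.randn(3)
target = torch.empty(3).random_(2)  # hard labels: 0.0 or 1.0

# Route 1: sigmoid is applied inside the loss.
with_logits = nn.BCEWithLogitsLoss()(logits, target)

# Route 2: sigmoid is applied by hand, then BCELoss.
manual_sigmoid = nn.BCELoss()(torch.sigmoid(logits), target)

print(with_logits.item(), manual_sigmoid.item())
```

For moderately sized logits like these, the two values should agree up to floating-point rounding.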

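In summary, BCE can be computed in two ways. The first is BCELoss, which expects probabilities, so the logits must be passed through sigmoid first. In the first example each row holds two values, one per label, and target must have the same shape as input.

The second is BCEWithLogitsLoss, which applies the sigmoid internally, so input holds the raw logits and is compared against target directly, with no manual sigmoid step. Both losses accept targets anywhere in [0, 1]: soft probabilities, as in the first example, or hard 0/1 labels, as in the second. The distinction between "target with class probabilities" and "target with class indices" belongs to CrossEntropyLoss, covered next; for BCE the only difference is where the sigmoid is applied.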
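
One practical reason to prefer BCEWithLogitsLoss over a manual sigmoid plus BCELoss is numerical stability. The PyTorch documentation notes that BCELoss clamps its log outputs at -100, whereas BCEWithLogitsLoss works from the logits themselves. A minimal sketch, using an arbitrarily chosen extreme logit of 200 (an assumption for illustration, not a value from the examples above):

```python
import torch
import torch.nn as nn

logit = torch.tensor([200.0])   # extreme, confidently wrong prediction
target = torch.tensor([0.0])

# sigmoid(200) rounds to exactly 1.0 in float32, so log(1 - p) is log(0);
# BCELoss clamps that term at -100 instead of returning inf.
clamped = nn.BCELoss()(torch.sigmoid(logit), target)

# BCEWithLogitsLoss evaluates the loss from the logit itself.
stable = nn.BCEWithLogitsLoss()(logit, target)

print(clamped.item(), stable.item())
```

The manual route saturates at the clamp value, while the logits-based loss keeps growing with the size of the error.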

CE

nn.CrossEntropyLoss is equivalent to nn.LogSoftmax followed by nn.NLLLoss.

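input holds the logits.
softmax(input) turns them into probabilities.
log(softmax(input)) gives the log-probabilities; NLLLoss then takes the entry at each target index, negates it, and averages over the batch.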
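
The steps above can be sketched with the two modules chained explicitly and compared against nn.CrossEntropyLoss. The fixed seed and the names log_probs, two_step and one_step are assumptions added for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

input = torch.randn(3, 5)                              # logits: 3 samples, 5 classes
target = torch.empty(3, dtype=torch.long).random_(5)   # class indices in [0, 5)

log_probs = nn.LogSoftmax(dim=1)(input)      # log(softmax(input)) along the class axis
two_step = nn.NLLLoss()(log_probs, target)   # negate the target entries and average
one_step = nn.CrossEntropyLoss()(input, target)

print(two_step.item(), one_step.item())
```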

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
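
What the loss does with each kind of target can be written out by hand. The sketch below assumes the default mean reduction and a PyTorch version that accepts probability targets, as the example above does. The names manual_idx and manual_prob and the fixed index target are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the run is repeatable

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5)
log_probs = F.log_softmax(input, dim=1)

# Class indices: pick the log-probability of the true class in each row.
idx_target = torch.tensor([0, 3, 2])
manual_idx = -log_probs[torch.arange(3), idx_target].mean()

# Class probabilities: weight every class's log-probability by its target mass.
prob_target = torch.randn(3, 5).softmax(dim=1)
manual_prob = -(prob_target * log_probs).sum(dim=1).mean()

print(torch.allclose(manual_idx, loss(input, idx_target), atol=1e-6))
print(torch.allclose(manual_prob, loss(input, prob_target), atol=1e-6))
```

With a one-hot prob_target the second formula reduces to the first, which is why class indices can be seen as a special case of class probabilities.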

Cross-entropy loss can be reproduced with nn.LogSoftmax + nn.NLLLoss; as the run below shows, the final values match.

loss = nn.CrossEntropyLoss()
input = torch.randn(3,5)
target = torch.empty(3,dtype=torch.long).random_(5) #tensor([0, 3, 2])
loss(input,target) # tensor(1.8343)
---
softmax = nn.Softmax(dim=1)
torch.log(softmax(input)) 
---
tensor([[-2.1570, -0.7428, -3.1800, -1.9789, -1.4751],
        [-2.2395, -3.3203, -1.2979, -0.5637, -4.1892],
        [-2.5196, -1.1165, -2.7820, -1.1686, -1.5171]])
-(-2.1570 - 0.5637 - 2.7820) / 3 = 1.8342333333333334
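
The arithmetic can be checked by feeding the printed log-probabilities to nn.NLLLoss, which negates the selected entries and averages them, so the result is the positive value reported by CrossEntropyLoss. The tensor below is copied from the output above (rounded to four decimals), so the match is approximate; the name nll is illustrative.

```python
import torch
import torch.nn as nn

# log(softmax(input)) as printed above, rounded to 4 decimals.
log_probs = torch.tensor([[-2.1570, -0.7428, -3.1800, -1.9789, -1.4751],
                          [-2.2395, -3.3203, -1.2979, -0.5637, -4.1892],
                          [-2.5196, -1.1165, -2.7820, -1.1686, -1.5171]])
target = torch.tensor([0, 3, 2])

# Selects -2.1570, -0.5637 and -2.7820, negates them, and takes the mean.
nll = nn.NLLLoss()(log_probs, target)
print(nll.item())  # about 1.8342
```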