Why do we skip the cases where the student and teacher operate on the same view? And if they operate on different views, why should we expect their outputs to be similar enough for the cross-entropy loss to be meaningful?
#267
Open · jinghere11 opened this issue Dec 12, 2023 · 1 comment
```python
import torch
import torch.nn.functional as F

total_loss = 0
n_loss_terms = 0
for iq, q in enumerate(teacher_out):
    for v in range(len(student_out)):
        if v == iq:
            # we skip cases where student and teacher operate on the same view
            continue
        loss = torch.sum(-q * F.log_softmax(student_out[v], dim=-1), dim=-1)
        total_loss += loss.mean()
        n_loss_terms += 1
```
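For context, here is a minimal runnable sketch of that loop. The crop counts, batch size, and projection dimension below are illustrative assumptions, not values taken from the DINO repository; `teacher_out` holds (already softmaxed) distributions for the 2 global crops, while `student_out` holds raw logits for all 4 crops, the first 2 of which are the same global crops the teacher saw.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical setup: 2 teacher (global) views, 4 student views,
# batch size 8, projection dimension 16.
teacher_out = [F.softmax(torch.randn(8, 16) / 0.04, dim=-1) for _ in range(2)]
student_out = [torch.randn(8, 16) for _ in range(4)]

total_loss = 0
n_loss_terms = 0
for iq, q in enumerate(teacher_out):
    for v in range(len(student_out)):
        if v == iq:
            # teacher view iq and student view iq are the same crop: skip
            continue
        loss = torch.sum(-q * F.log_softmax(student_out[v], dim=-1), dim=-1)
        total_loss += loss.mean()
        n_loss_terms += 1
total_loss /= n_loss_terms

# 2 teacher views x 4 student views, minus the 2 identical pairs = 6 terms.
```

Note that only the identical-crop pairs are dropped; every cross-view pair, including global-vs-global, still contributes a loss term.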
I am not an expert, but my intuition is that feeding the same view to both networks would yield a very small loss and hence an insignificant gradient, so it would waste compute.
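That intuition can be sanity-checked with the cross-entropy itself: H(q, p) is minimized exactly when p = q, where it equals the entropy H(q), so a student that sees the teacher's view (and produces near-identical logits) is already near the loss minimum and supplies almost no learning signal. A small sketch with a single hypothetical sample (the temperature and logits are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(16)
q = F.softmax(logits / 0.04, dim=-1)  # sharp (low-temperature) teacher target

def cross_entropy(q, student_logits):
    """H(q, softmax(student_logits)) for one sample."""
    return torch.sum(-q * F.log_softmax(student_logits, dim=-1)).item()

same_view = cross_entropy(q, logits / 0.04)   # student matches the teacher
diff_view = cross_entropy(q, torch.randn(16)) # student sees another view

# same_view equals H(q), the minimum possible value; diff_view adds KL(q || p).
```

Since H(q, p) = H(q) + KL(q || p), the same-view term is always the smaller of the two, which is why those pairs are skipped rather than averaged in.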