I have some questions about adaptive layers in KD training.
When you combined your KD method with other intermediate feature map KD methods, you had to use adaptive layers to upscale the student feature maps. Were these adaptive layers trained together with the student, or did you freeze them? I've read a lot of papers and found nothing written about this.
These adaptive layers may sometimes distort the student's output feature map, and they don't take part in the student's inference. So why do adaptive layers make KD training work effectively? I would expect them to decrease the mAP.
Could you explain this, please? Thank you very much.
An adaptive layer is used when the student feature map and the teacher feature map don't match.
Many KD papers use the FPN as the learning target, and FPN layers mostly have the same feature map dimensions, so no adaptive layer is needed (including in ours). That's why we don't mention it.
Oh, I see. In my work, I have to use adaptive layers because the student and teacher have different numbers of channels, and I think that makes the student's mAP drop slightly.
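
For the channel-mismatch case described above, here is a minimal PyTorch sketch of one common setup. The thread does not say whether the authors train or freeze such layers, so joint training is an assumption here, as are the names `ChannelAdapter` and `feature_kd_loss`, the 1x1 convolution, the MSE loss, and the toy tensor shapes. The adapter's parameters go into the same optimizer as the student's, and the adapter is discarded at inference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAdapter(nn.Module):
    """Hypothetical adapter: 1x1 conv mapping student channels to teacher channels."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, target_size):
        x = self.proj(student_feat)
        # Resize only when the spatial sizes differ as well.
        if x.shape[-2:] != target_size:
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
        return x


def feature_kd_loss(adapter, student_feat, teacher_feat):
    # The teacher is detached so gradients reach only the student and the adapter.
    adapted = adapter(student_feat, teacher_feat.shape[-2:])
    return F.mse_loss(adapted, teacher_feat.detach())


torch.manual_seed(0)
student = nn.Conv2d(3, 128, kernel_size=3, padding=1)  # stand-in for a student backbone
adapter = ChannelAdapter(student_channels=128, teacher_channels=256)

images = torch.randn(2, 3, 16, 16)
teacher_feat = torch.randn(2, 256, 32, 32)  # stand-in for a frozen teacher's feature map

# Assumption: the adapter is optimized jointly with the student, not frozen.
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(adapter.parameters()), lr=0.01
)

student_feat = student(images)
loss = feature_kd_loss(adapter, student_feat, teacher_feat)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At inference only `student` is used; `adapter` is dropped.
```

In this setup the distillation loss is computed on the adapted features, so the student's own feature map is never forced into the teacher's shape directly. Whether this helps or hurts mAP in a given detector is an empirical question that the sketch does not answer.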