ResNet-50: Clarification on constraints on LR schedules #414

mrinaliyer · 2021-01-12T15:52:21Z

The RN50 rules are not clear about the following:

Previous rules allowed stepped LR schedules. Are they still permitted?
Is cosine LR permitted?
There is an end_learning_rate constraint of 1e-4. Shouldn't this depend on batch size? Is there a reason the end LR is fixed for SGD-M? Shouldn't it be a fraction of the starting LR?
Is there a document anywhere with a comprehensive list of changes in the RN50 rules from earlier submissions?
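
To make the end-LR question concrete, here is a rough sketch of the three schedule shapes mentioned above, plus a comparison of a fixed end LR of 1e-4 against starting LRs at different batch sizes. This is not taken from the rules or the reference implementation. The function names, the decay power of 2, the step boundaries, and the linear scaling rule (0.1 per 256 samples) are all assumptions chosen only for illustration.

```python
import math

def polynomial_decay(step, total_steps, base_lr, end_lr=1e-4, power=2.0):
    """Polynomial decay from base_lr down to a fixed end_lr.
    power=2.0 is an illustrative assumption, not a value from the rules."""
    step = min(step, total_steps)
    frac = 1.0 - step / total_steps
    return (base_lr - end_lr) * frac ** power + end_lr

def cosine_decay(step, total_steps, base_lr, end_lr=0.0):
    """Half-cosine decay from base_lr down to end_lr."""
    step = min(step, total_steps)
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return end_lr + (base_lr - end_lr) * cos

def stepped_decay(step, boundaries, base_lr, factor=0.1):
    """Multiply base_lr by `factor` once for every boundary already passed."""
    passed = sum(1 for b in boundaries if step >= b)
    return base_lr * factor ** passed

if __name__ == "__main__":
    # Hypothetical linear scaling rule: base LR of 0.1 per 256 samples.
    for batch_size in (256, 1024, 4096, 16384):
        base_lr = 0.1 * batch_size / 256
        ratio = 1e-4 / base_lr
        print(f"batch={batch_size:6d} base_lr={base_lr:6.2f} end_lr/base_lr={ratio:.2e}")
```

Under that assumed scaling rule, the fixed end LR becomes a smaller and smaller fraction of the starting LR as batch size grows, which is the dependence being asked about.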
