There is no definitive answer to that question, as it depends on the model and dataset. A good starting point is the learning rate the authors use in the original paper for the model. They were probably using a much larger batch size than 4, so I would advise lowering the learning rate accordingly.
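For a rough illustration, the linear scaling rule is one common heuristic: scale the paper's learning rate by the ratio of your batch size to theirs. The numbers below are placeholders, not values from this repo:

```python
# Linear scaling rule (illustrative only; all numbers are placeholders).
reference_lr = 1e-3        # learning rate reported for the reference setup
reference_batch = 1536     # reference effective batch size (e.g. 8 GPUs x 192)
my_batch = 4               # your batch size
scaled_lr = reference_lr * my_batch / reference_batch   # ~2.6e-6
```

Treat the result as a starting point for a sweep rather than a final value; very small batches often also benefit from a longer warmup.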
```yaml
Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  epsilon: 1.e-8
  weight_decay: 0.05
  no_weight_decay_name: norm
  one_dim_param_no_weight_decay: True
  lr:
    name: Cosine
    learning_rate: 0.001  # 8gpus 192bs
    warmup_epoch: 5
```
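If it helps, here is a rough PyTorch sketch of what that config describes (this is not the repo's own training code; `model`, `total_epochs`, and `steps_per_epoch` are placeholders, and I am assuming `no_weight_decay_name: norm` / `one_dim_param_no_weight_decay: True` mean that parameters whose names contain "norm" and 1-D parameters are excluded from weight decay):

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer(model, lr=1e-3, weight_decay=0.05):
    # Split parameters: no weight decay for "norm" params and 1-D params
    # (biases, norm scales), mirroring the YAML options above.
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if "norm" in name or p.ndim == 1:
            no_decay.append(p)
        else:
            decay.append(p)
    return AdamW(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, betas=(0.9, 0.999), eps=1e-8)

def build_scheduler(optimizer, warmup_epochs, total_epochs, steps_per_epoch):
    # Linear warmup for `warmup_epochs`, then cosine decay to zero.
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)
```

Call `scheduler.step()` once per optimizer step so the warmup and cosine decay are applied per iteration rather than per epoch.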