When I run the following command:
CUDA_VISIBLE_DEVICES=0 python train.py --cfg configs/cuhk_sysu.yaml INPUT.BATCH_SIZE_TRAIN 2 SOLVER.BASE_LR 0.0012 SOLVER.MAX_EPOCHS 20 SOLVER.LR_DECAY_MILESTONES [11] MODEL.LOSS.USE_SOFTMAX True SOLVER.LW_RCNN_SOFTMAX_2ND 0.1 SOLVER.LW_RCNN_SOFTMAX_3RD 0.1 OUTPUT_DIR ./logs/cuhk-sysu
training stops almost immediately with the error "Loss is nan, stopping training". In the log below, loss_rcnn_reid_2nd, loss_rcnn_reid_3rd, loss_box_softmax_2nd and loss_box_softmax_3rd are NaN, while the other loss terms are finite. How can I solve this problem? Thanks.
Start training...
Epoch: [0] [ 0/5603] eta: 11:33:02 lr: 0.000001 loss: 15.3648 (15.3648) loss_rcnn_cls_1st: 0.7488 (0.7488) loss_rcnn_reg_1st: 1.0251 (1.0251) loss_rcnn_cls_2nd: 0.8071 (0.8071) loss_rcnn_reg_2nd: 0.1901 (0.1901) loss_rcnn_cls_3rd: 0.8165 (0.8165) loss_rcnn_reg_3rd: 0.0001 (0.0001) loss_rcnn_reid_2nd: 4.6311 (4.6311) loss_rcnn_reid_3rd: 4.6311 (4.6311) loss_rpn_reg: 0.1002 (0.1002) loss_rpn_cls: 0.6908 (0.6908) loss_box_softmax_2nd: 0.8609 (0.8609) loss_box_softmax_3rd: 0.8630 (0.8630) time: 7.4214 data: 6.1422 max mem: 12317
Loss is nan, stopping training
{'loss_rcnn_cls_1st': tensor(0.6996, device='cuda:0', grad_fn=), 'loss_rcnn_reg_1st': tensor(0.9864, device='cuda:0', grad_fn=), 'loss_rcnn_cls_2nd': tensor(0.8071, device='cuda:0', grad_fn=), 'loss_rcnn_reg_2nd': tensor(0.1589, device='cuda:0', grad_fn=), 'loss_rcnn_cls_3rd': tensor(0.8237, device='cuda:0', grad_fn=), 'loss_rcnn_reg_3rd': tensor(3.1079e-05, device='cuda:0', grad_fn=), 'loss_rcnn_reid_2nd': tensor(nan, device='cuda:0', grad_fn=), 'loss_rcnn_reid_3rd': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_reg': tensor(0.0262, device='cuda:0', grad_fn=), 'loss_rpn_cls': tensor(0.6909, device='cuda:0', grad_fn=), 'loss_box_softmax_2nd': tensor(nan, device='cuda:0', grad_fn=), 'loss_box_softmax_3rd': tensor(nan, device='cuda:0', grad_fn=)}
Process finished with exit code 1
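To narrow down where the NaN comes from, it may help to list which loss terms are non-finite at the failing step. The snippet below is only a minimal sketch and is not part of the repository: `non_finite_losses` is a hypothetical helper name, and the values are copied by hand from the loss dictionary printed above. It assumes each loss value can be converted with `float()`, which holds for plain numbers and for single-element PyTorch tensors.

```python
import math


def non_finite_losses(loss_dict):
    """Return the sorted names of loss terms that are NaN or infinite.

    Hypothetical helper for debugging; values must be convertible with float().
    """
    return sorted(
        name for name, value in loss_dict.items()
        if not math.isfinite(float(value))
    )


# Values copied from the loss dictionary in the log above.
losses = {
    "loss_rcnn_cls_1st": 0.6996,
    "loss_rcnn_reg_1st": 0.9864,
    "loss_rcnn_cls_2nd": 0.8071,
    "loss_rcnn_reg_2nd": 0.1589,
    "loss_rcnn_cls_3rd": 0.8237,
    "loss_rcnn_reg_3rd": 3.1079e-05,
    "loss_rcnn_reid_2nd": float("nan"),
    "loss_rcnn_reid_3rd": float("nan"),
    "loss_rpn_reg": 0.0262,
    "loss_rpn_cls": 0.6909,
    "loss_box_softmax_2nd": float("nan"),
    "loss_box_softmax_3rd": float("nan"),
}

for name in non_finite_losses(losses):
    print(name)
```

With the values from this run, only the two re-id losses and the two softmax losses are reported, so the problem seems to be confined to those branches rather than to the detection losses.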