Disappeared training epoch and loss output #2275
AnatoleWang asked this question in Q&A
Normally, when C3D is trained, the output looks like this:
2023-02-09 18:18:30,215 - mmaction - INFO - workflow: [('train', 1)], max: 45 epochs
2023-02-09 18:19:20,858 - mmaction - INFO - Epoch [1][20/20] lr: 1.000e-03, eta: 0:37:08, time: 2.532, data_time: 1.908, memory: 5750, top1_acc: 0.6038, top5_acc: 0.7679, loss_cls: 1.8912, loss: 1.8912, grad_norm: 32.0169
2023-02-09 18:20:05,393 - mmaction - INFO - Epoch [2][20/20] lr: 1.000e-03, eta: 0:33:53, time: 2.198, data_time: 1.860, memory: 5750, top1_acc: 0.8935, top5_acc: 1.0000, loss_cls: 0.2705, loss: 0.2705, grad_norm: 19.0957
2023-02-09 18:20:49,067 - mmaction - INFO - Epoch [3][20/20] lr: 1.000e-03, eta: 0:32:06, time: 2.153, data_time: 1.819, memory: 5750, top1_acc: 0.9773, top5_acc: 1.0000, loss_cls: 0.0732, loss: 0.0732, grad_norm: 8.1325
2023-02-09 18:21:31,662 - mmaction - INFO - Epoch [4][20/20] lr: 1.000e-03, eta: 0:30:41, time: 2.099, data_time: 1.766, memory: 5750, top1_acc: 0.9616, top5_acc: 1.0000, loss_cls: 0.1309, loss: 0.1309, grad_norm: 10.0417
2023-02-09 18:22:14,109 - mmaction - INFO - Epoch [5][20/20] lr: 1.000e-03, eta: 0:29:31, time: 2.093, data_time: 1.761, memory: 5750, top1_acc: 0.9913, top5_acc: 1.0000, loss_cls: 0.0296, loss: 0.0296, grad_norm: 3.8235
2023-02-09 18:22:14,723 - mmaction - INFO - Saving checkpoint at 5 epochs
2023-02-09 18:22:49,564 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-02-09 18:22:49,573 - mmaction - INFO -
top1_acc 0.9469
top5_acc 1.0000
2023-02-09 18:22:49,574 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-02-09 18:22:49,577 - mmaction - INFO -
mean_acc 0.9517
2023-02-09 18:22:52,115 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.
2023-02-09 18:22:52,116 - mmaction - INFO - Best top1_acc is 0.9469 at 5 epoch.
2023-02-09 18:22:52,117 - mmaction - INFO - Epoch(val) [5][8] top1_acc: 0.9469, top5_acc: 1.0000, mean_class_accuracy: 0.9517
However, after I changed some parameters of the model, the output no longer contains the Epoch(train) lines or the loss:
2023-03-07 20:52:57,611 - mmaction - INFO - workflow: [('train', 1)], max: 45 epochs
2023-03-07 20:55:50,501 - mmaction - INFO - Saving checkpoint at 5 epochs
2023-03-07 20:56:08,759 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-03-07 20:56:08,761 - mmaction - INFO -
top1_acc 0.7361
top5_acc 0.9583
2023-03-07 20:56:08,762 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-03-07 20:56:08,764 - mmaction - INFO -
mean_acc 0.7286
2023-03-07 20:56:11,720 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.
2023-03-07 20:56:11,722 - mmaction - INFO - Best top1_acc is 0.7361 at 5 epoch.
2023-03-07 20:56:11,722 - mmaction - INFO - Epoch(val) [5][3] top1_acc: 0.7361, top5_acc: 0.9583, mean_class_accuracy: 0.7286
When I set workflow = [('train', 1), ('val', 1)] in c3d_sports1m_16x1x1_45e_ucf101_rgb.py and run the command:
python tools/train.py configs/recognition/c3d/c3d_sports1m_16x1x1_45e_ucf101_rgb.py --validate --gpus 1 --seed 0 --deterministic --cfg-options load_from=checkpoints/c3d_sports1m_pretrain_20201016-dcc47ddc.pth
The loss shows up again, but only in the Epoch(val) lines; the Epoch(train) lines are still missing:
2023-03-08 16:06:12,981 - mmaction - INFO - workflow: [('train', 1), ('val', 1)], max: 45 epochs
2023-03-08 16:07:01,192 - mmaction - INFO - Epoch(val) [1][3] top1_acc: 0.1528, top5_acc: 0.4861, loss_cls: 4.1278, loss: 4.1278
2023-03-08 16:07:47,966 - mmaction - INFO - Epoch(val) [2][3] top1_acc: 0.4583, top5_acc: 0.8750, loss_cls: 3.0645, loss: 3.0645
2023-03-08 16:08:32,920 - mmaction - INFO - Epoch(val) [3][3] top1_acc: 0.6111, top5_acc: 0.9306, loss_cls: 1.7323, loss: 1.7323
2023-03-08 16:09:19,538 - mmaction - INFO - Epoch(val) [4][3] top1_acc: 0.7083, top5_acc: 0.9444, loss_cls: 1.0702, loss: 1.0702
2023-03-08 16:09:48,785 - mmaction - INFO - Saving checkpoint at 5 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 72/72, 5.8 task/s, elapsed: 12s, ETA: 0s
2023-03-08 16:10:04,643 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-03-08 16:10:04,646 - mmaction - INFO -
top1_acc 0.7639
top5_acc 0.9444
2023-03-08 16:10:04,648 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-03-08 16:10:04,650 - mmaction - INFO -
mean_acc 0.7571
2023-03-08 16:10:07,840 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.
2023-03-08 16:10:07,841 - mmaction - INFO - Best top1_acc is 0.7639 at 5 epoch.
2023-03-08 16:10:07,841 - mmaction - INFO - Epoch(val) [5][3] top1_acc: 0.7639, top5_acc: 0.9444, mean_class_accuracy: 0.7571
Does anyone know what the actual cause is? Could it be the --validate flag or something else? Thanks in advance for any reply!
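One possible cause, offered as an assumption and not a confirmed diagnosis: the text logger in mmcv 1.x appears to print a training line only when the iteration index within the epoch is a multiple of the logging interval, and by default it skips the leftover iterations at the end of an epoch. The first log shows `Epoch [1][20/20]`, which is consistent with an interval of 20 and exactly 20 iterations per epoch. If the changed parameters (for example a larger batch size) bring the number of training iterations per epoch below the interval, no training line would ever be printed. The sketch below only simulates that rule in plain Python; `logged_train_iters` is a hypothetical helper, not an mmcv function, and the iteration count of 8 is made up for illustration.

```python
def logged_train_iters(iters_per_epoch, interval, ignore_last=True):
    """Return the 1-based iterations within one epoch at which a training
    log line would be printed, under the assumed interval rule.

    Assumption: a line is printed when (iteration % interval == 0), and
    the final iteration of the epoch is logged as well only when
    ignore_last is False.
    """
    logged = []
    for inner_iter in range(iters_per_epoch):
        step = inner_iter + 1
        hits_interval = interval > 0 and step % interval == 0
        is_last = step == iters_per_epoch
        if hits_interval or (is_last and not ignore_last):
            logged.append(step)
    return logged


if __name__ == "__main__":
    # 20 iterations per epoch, as in the first log: one line at 20/20.
    print(logged_train_iters(20, 20))
    # Hypothetical run with only 8 iterations per epoch: nothing printed.
    print(logged_train_iters(8, 20))
    # Same run, but keeping the last iteration of each epoch.
    print(logged_train_iters(8, 20, ignore_last=False))
```

If this is indeed what is happening, lowering `interval` in the config's `log_config` to a value no larger than the number of iterations per epoch should bring the training lines back. The length of the training dataloader tells you how many iterations an epoch has.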