
The technique of reproducing the author's accuracy #40

Open
yeyuxmf opened this issue Jan 16, 2020 · 4 comments

@yeyuxmf

yeyuxmf commented Jan 16, 2020

Although this is not the first time I have interacted with the author, I would like to thank the author again for the code.
I have almost reproduced the accuracy of the open-source code here.
Single-scale test accuracy: 76.17%
Multi-scale test accuracy: 77.90%
Getting this accuracy is very easy: I downloaded the code directly from this repo, and my training environment is similar to the one described in the author's code, so I could run it directly without any modification.
Be sure to train directly, without modifying anything.
One caveat: my GPU memory is a bit small. Each GPU's batch_size is 6, for a total of 12 across the two cards, so it is normal that my accuracy is slightly lower.
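For readers who have not run the evaluation themselves: the single-scale number comes from one forward pass at full resolution, while the multi-scale number averages predictions over several resized copies of each image (usually with horizontal flips as well). The sketch below is not the repo's evaluation code, just a minimal illustration of the idea in PyTorch, assuming a segmentation model that returns a single per-pixel logits tensor in eval mode:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25, 1.5), flip=True):
    """Average class probabilities over several scales (and optional flips).

    image: normalized float tensor of shape (1, 3, H, W).
    Returns per-pixel class probabilities of shape (1, C, H, W).
    """
    _, _, H, W = image.shape
    prob_sum = None
    for s in scales:
        size = (int(H * s), int(W * s))
        x = F.interpolate(image, size=size, mode='bilinear', align_corners=False)
        logits = F.interpolate(model(x), size=(H, W),
                               mode='bilinear', align_corners=False)
        prob = torch.softmax(logits, dim=1)
        if flip:
            # Flip the input, run the model, then flip the prediction back.
            logits_f = model(torch.flip(x, dims=[3]))
            logits_f = F.interpolate(torch.flip(logits_f, dims=[3]), size=(H, W),
                                     mode='bilinear', align_corners=False)
            prob = (prob + torch.softmax(logits_f, dim=1)) / 2
        prob_sum = prob if prob_sum is None else prob_sum + prob
    return prob_sum / len(scales)
```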

@yeyuxmf
Author

yeyuxmf commented Jan 16, 2020

In addition, before getting the accuracy above, I used my own environment, with the code modified as follows:
pytorch 1.2.0
syncbatchnorm -----> nn.BatchNorm
batch_size = 8
gpu_nums = 1 (only one GPU)
train_data_size = 640*480
I worked on this for half a month, and no matter how I trained, the highest accuracy was only 70.0%.
So I would like to remind everyone to deploy an environment as similar to the author's as possible, so that the downloaded code can be trained directly without any modification.
Once you have reproduced the author's training accuracy, change each influencing factor step by step, verify that you still get the accuracy the author reports, and then gradually move toward your own environment. Only then can you tell which factors prevent the accuracy from being reproduced.
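As a side note, if you do want synchronized BatchNorm in a multi-GPU run without the repo's own syncbatchnorm module, PyTorch >= 1.1 ships torch.nn.SyncBatchNorm.convert_sync_batchnorm, which recursively replaces every nn.BatchNorm layer in a model. The following is only a sketch of that stock-PyTorch setup, not the author's code; wrap_model and local_rank are illustrative names (local_rank is what torch.distributed.launch passes to each process):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model, local_rank):
    """Use synchronized BatchNorm when running under torch.distributed.launch."""
    if dist.is_available() and dist.is_initialized():
        # Replace every nn.BatchNorm* layer so running statistics are computed
        # over the whole effective batch, not just the per-GPU slice.
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
        model = model.cuda(local_rank)
        model = DDP(model, device_ids=[local_rank], output_device=local_rank)
    else:
        # Single process / single GPU: plain BatchNorm is all there is, which is
        # effectively the syncbatchnorm -> nn.BatchNorm swap described above.
        model = model.cuda()
    return model
```

Because convert_sync_batchnorm works recursively, it also handles BatchNorm layers buried inside backbone submodules.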

@CoinCheung
Owner

Thanks for verifying!! I am happy that you can train your model well.

@yeyuxmf
Author

yeyuxmf commented Feb 11, 2020

pytorch 1.2.0
ubuntu 18.04
cuda 9.0
batch_size = 6
gpu_nums = 1 (only one GPU)
train_data_size = 1024*1024
max_iter = 160000
I feel the author's code is very portable. I have used both pytorch 1.0 and pytorch 1.2, and I think later versions of pytorch should work as well. I am currently testing the impact of different factors on the model, and I am posting the results here to share them.
it: 159500/160000, lr: 0.000056, loss: 2.5404, eta: 0:04:28, time: 26.4423
it: 159550/160000, lr: 0.000051, loss: 2.5879, eta: 0:04:01, time: 26.5868
it: 159600/160000, lr: 0.000046, loss: 2.5492, eta: 0:03:34, time: 26.6619
it: 159650/160000, lr: 0.000041, loss: 2.5945, eta: 0:03:07, time: 26.8894
it: 159700/160000, lr: 0.000035, loss: 2.5278, eta: 0:02:41, time: 26.4492
it: 159750/160000, lr: 0.000030, loss: 2.5143, eta: 0:02:14, time: 26.6291
it: 159800/160000, lr: 0.000025, loss: 2.5752, eta: 0:01:47, time: 26.5566
it: 159850/160000, lr: 0.000019, loss: 2.5299, eta: 0:01:20, time: 26.5901
it: 159900/160000, lr: 0.000013, loss: 2.5764, eta: 0:00:54, time: 28.0687
it: 159950/160000, lr: 0.000007, loss: 2.5445, eta: 0:00:27, time: 26.5950
it: 160000/160000, lr: 0.000000, loss: 2.5528, eta: 0:00:00, time: 26.7321
training done, model saved to: ./res/model_final.pth

evaluating the model ...
setup and restore model
compute the mIOU
100%|█████████████████████████████████████████| 250/250 [17:49<00:00, 4.28s/it]
mIOU is: 0.780227
mIOU = 0.7802269045024711
(pytorch)BiSeNet_syncbn$ CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 train.py
100%|█████████████████████████████████████████| 250/250 [03:28<00:00, 1.20it/s]
mIOU = 0.7599654703378823
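For reference, the total wall-clock time of this run can be estimated from the log itself, assuming the `time` field is seconds per 50-iteration logging interval (which matches the eta column dropping by roughly 27 seconds per line):

```python
# Back-of-the-envelope estimate under the assumption above.
iters_total = 160000
iters_per_log = 50
secs_per_log = 26.6              # typical value from the log lines above

total_hours = iters_total / iters_per_log * secs_per_log / 3600
print(total_hours)               # roughly 23.6 hours for this single-GPU run
```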

@CuttlefishXuan

Hi, how long did your training process take with batch_size=6? Mine appears to take more than 2 days with batch_size=16 and gpu_nums=4 (2080Ti). Is that normal?
