time_model.py gives different results to those in model_zoo #79
Comments
Hi Dan, thanks for raising the issue. As a first step, I think it would be good to ensure that we are using the same version of the code and settings for the precise timing. Could you please try running the following command on the latest master and let us know what you observe? Command:
We double-checked that with this command we get times very close to the model zoo (35ms). If this does not resolve the issue, we should probably check the software versions next. Re timing on dummy data: do you mean that in the latest master we also time the loader, which requires constructing a dataset?
Thanks for the update. From looking at the screenshots, it seems that you are still using an earlier commit? Could you please retry with latest master? Just to eliminate that as a potential cause of the issue. Here is a screenshot of what we observe: Re dependencies: The versions you are using differ from ours but let's maybe double-check the code version first and then get back to the dependencies. Re dummy data: I agree, constructing the loader can be annoying. We should probably add a flag for loader timing. Let's address this in a different issue.
Thanks for trying with master. I think that timing without the loading code should be fine. The code for computing training timings uses the train batch size by default. With the config and the command above, the train batch size ends up being 1024 on 1 GPU, which likely causes the OOM issue. I should have included As a sanity check, could you check that the batch size used for eval timing is as expected? (e.g. print the size of inputs here). Also, could you print the timing per iteration? (e.g. print
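The per-iteration check requested above can be sketched as follows. This is a minimal, framework-agnostic illustration and not the code in `tools/time_net.py`: the names `time_iterations`, `dummy_step`, and the `sync` parameter are hypothetical, and the workload is a CPU stand-in for a forward pass.

```python
import statistics
import time


def time_iterations(step, num_iters=30, warmup_iters=5, sync=None):
    """Return per-iteration wall-clock times (ms) for a zero-argument callable.

    `sync` is an optional callable invoked before starting and before stopping
    the clock, so that asynchronous work is included in the measurement.
    """

    def run_once():
        if sync is not None:
            sync()  # make sure earlier work is finished before starting the clock
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()  # wait for the step itself to finish before stopping the clock
        return (time.perf_counter() - start) * 1000.0

    # Warm-up iterations are run but not recorded.
    for _ in range(warmup_iters):
        run_once()
    return [run_once() for _ in range(num_iters)]


def dummy_step():
    # Hypothetical stand-in workload; replace with the model's forward pass.
    sum(i * i for i in range(10000))


times_ms = time_iterations(dummy_step, num_iters=10, warmup_iters=2)
for i, t in enumerate(times_ms):
    print(f"iter {i}: {t:.3f} ms")
print(f"mean {statistics.mean(times_ms):.3f} ms, "
      f"stdev {statistics.stdev(times_ms):.3f} ms")
```

When timing on a GPU with PyTorch, one would pass `sync=torch.cuda.synchronize`, because CUDA kernels are launched asynchronously and an unsynchronized timer mostly measures launch overhead. A large spread across iterations would point to interference or clock throttling rather than a code difference.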
Thanks for the information. The batch size is as expected. The timings also seem stable and consistent across iterations. I don't have any other ideas on what else to try on this front. Let's maybe check the dependencies next? The timings I posted above were computed using PyTorch 1.4.0, CUDA 10.1, and cuDNN 7.6.3. Would it be possible to compute the timings using the same dependency versions on your side? As an additional data point, we computed the timings on a P100 GPU and observed 55ms (the command and the environment are the same as above; the only difference is P100 vs V100): Also, do you have anything else running on that GPU or machine? It may be good to ensure that this is the only thing running on the system in case there is some overhead / interference.
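To compare dependency versions between two machines, a short report like the one below may help. It is a sketch, not part of the repository: the helper name `collect_versions` is hypothetical, and the PyTorch fields are filled in only if `torch` is installed.

```python
import importlib.util
import platform


def collect_versions():
    """Collect the versions relevant to timing comparisons into a dict."""
    report = {"python": platform.python_version()}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report

    import torch

    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    report["cudnn"] = torch.backends.cudnn.version()  # integer code, or None
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


versions = collect_versions()
for name, value in versions.items():
    print(f"{name}: {value}")
```

Posting this output from both sides makes it easy to spot a mismatch in PyTorch, CUDA, cuDNN, or GPU model before digging further.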
Thanks for the update. Glad to see that the timings match on a fresh server. Sounds good, let's revisit if the issue reappears in the future.
Hi - I appreciate there's already an open issue related to speed, but mine is slightly different.
When I run
python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml
having changed GPUS: from 8 to 1, I get the following dump. I am running this on a batch of size 64, with input resolution 224x224, on a V100, as stated in the paper.
This implies a forward pass of ~62ms, not the 33ms stated in MODEL_ZOO. Have I done something wrong? Not sure why the times are so different. The other numbers (acts, params, flops) all seem fine. The latency differences are seen for other models as well - here is 800MF (39ms vs model zoo's 21ms):
I am using commit a492b56, not the latest version of the repo, but MODEL_ZOO has not been changed since before this commit. I am staying on this older commit because it lets me time the models on dummy data rather than having to construct a dataset. Would it be possible to have an option to do this in the latest version? I can open a separate issue as a feature request for consideration if necessary.
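As a quick sanity check on the numbers reported above, the slowdown relative to MODEL_ZOO can be computed per model. The figures are the ones quoted in this post; the dictionary keys are shorthand labels chosen for illustration.

```python
# Forward-pass times in ms, as quoted above (measured vs. MODEL_ZOO).
measured_ms = {"1.6GF": 62.0, "800MF": 39.0}
model_zoo_ms = {"1.6GF": 33.0, "800MF": 21.0}

# Ratio of measured time to the published time for each model.
slowdown = {name: measured_ms[name] / model_zoo_ms[name] for name in measured_ms}

for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.2f}x slower than MODEL_ZOO")
```

Both ratios come out just under 1.9x. A similar factor for two different models would be consistent with a systematic cause such as the environment or hardware state, although two data points cannot confirm that.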