How long does the LfI training take? #1
Hi.
Thank you for your excellent work!
I ran the LfI training script on an A100 40GB GPU, and it takes 6 hours per epoch. Consequently, the full training may take about two and a half days. Is this the expected behavior?
Thanks.
Thank you for your attention.
We conducted our experiments on a V100 128G GPU, and training takes about 2 hours per epoch. The training is indeed somewhat time-consuming, so I am afraid there is something wrong with your configuration.
Best Regards
Thank you for getting back to me. You said "a V100 128GB GPU"; is that a typo for "a V100 32GB GPU"? In addition, could you run the LfI training script after cloning this repository and tell me the training time per epoch?
Sorry for the confusion with another project. We use an "A100 80G GPU" to train LfI. For ViT it takes 2 hours per epoch, but for Swin Transformer it takes about 5 hours. It is slow because the data is loaded online from the simulator.
I understand. I have another question. In https://github.com/lishiqianhugh/LfID/blob/main/LfI/LfI.md, you describe the Prepare dataset section. Should I run the training script after running python dataset/phyreo.py? I'm confused about how to make training faster.

After running phyreo.py, only a .pt dataloader is obtained. Running with this saved .pt dataloader only skips the time of generating it, which does not accelerate training much. We tried to collect all the data we need in a folder to support offline training, but it was too large.
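The caching behavior described above can be sketched as follows. This is a minimal illustration, not the repository's actual phyreo.py: the function name, cache path, and tensor shapes are hypothetical stand-ins for the simulator-generated data.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

def get_cached_dataset(cache_path="phyre_cache.pt"):
    """Return the training dataset, regenerating it only if no cache exists."""
    if os.path.exists(cache_path):
        blob = torch.load(cache_path)  # fast path: skip data generation
    else:
        # Stand-in for the expensive step (in LfID, rolling out the
        # simulator online to render observations).
        blob = {
            "images": torch.randn(64, 3, 8, 8),
            "labels": torch.randint(0, 2, (64,)),
        }
        torch.save(blob, cache_path)  # cache it so later runs load from disk
    return TensorDataset(blob["images"], blob["labels"])

dataset = get_cached_dataset()
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```

Note that a cache like this only removes the one-time generation cost, not the per-epoch training cost, and it must be rebuilt whenever the data-related configuration changes.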
Hmm... What mini-batch size did you use? Just now, the training script took 5.5 hours per epoch even though I re-ran it with the default config.
We use a batch size of 256.
I updated the mini-batch size from 128 to 256, but the training script still takes 5.5 hours per epoch. Anyway, I also ran the training script with Swin Transformer; it needs 6.5 hours, so there is only a slight difference between ViT and Swin. Can you think of a cause?
Do you use the saved .pt dataloader? Every time you change your configuration, you have to save a new dataloader. If it is not this problem, please consider using more CPU workers to accelerate data loading.

Oh! I understand the cause.
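The suggestion about CPU workers is a generic PyTorch pattern and can be sketched like this; the dummy dataset, batch size, and worker count here are illustrative, not LfID's actual configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real dataset.
dataset = TensorDataset(torch.randn(256, 3, 8, 8), torch.randint(0, 2, (256,)))

# num_workers > 0 prepares batches in parallel subprocesses, so data loading
# overlaps with GPU compute; pin_memory speeds host-to-GPU copies.
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=2,            # tune to the number of spare CPU cores
    pin_memory=True,
    persistent_workers=True,  # keep workers alive between epochs
)

num_batches = sum(1 for _ in loader)  # 256 / 64 = 4 batches per epoch
```

When data comes from an online simulator, worker processes mainly help if the per-sample generation releases the GIL or runs in separate processes anyway; profiling one epoch is the quickest way to confirm where the time goes.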