Question about training with reduced timesteps (50) instead of traditional 1000 steps #234

Closed
shin-wn opened this issue Dec 22, 2024 · 2 comments

Comments

@shin-wn

shin-wn commented Dec 22, 2024

I noticed that this project successfully implements diffusion-based motion generation using only 50 timesteps for both training and sampling, which is significantly fewer than the 1000 steps used in traditional DDPM/DDIM approaches.
I'm very interested in understanding the theoretical foundation and implementation details behind this choice, as I couldn't find any references to training with such reduced timesteps in the original DDPM/DDIM papers.

Could you please share:

  1. The research paper or theoretical work this implementation is based on?
  2. If this is a novel approach, what modifications were made to enable stable training with reduced timesteps?
  3. Any empirical observations or ablation studies that led to choosing 50 as the optimal number of timesteps?

I’d really appreciate your help!

@GuyTevet
Owner

This is an empirical result rather than a theoretical one. Sweeping over the number of diffusion steps, 50 is the minimum that does not degrade FID. I invite you to try it yourself.
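The selection rule behind such a sweep can be sketched as follows. This is only an illustration under stated assumptions, not the project's evaluation code: the function name `smallest_non_degrading_steps`, the `fid_by_steps` mapping, and the `tolerance` threshold are all hypothetical, and no measured values are included here.

```python
def smallest_non_degrading_steps(fid_by_steps, tolerance):
    """Pick the smallest step count whose FID stays close to the best one.

    fid_by_steps: dict mapping number of diffusion steps -> measured FID
                  (hypothetical input; fill it from your own sweep).
    tolerance:    largest FID increase over the best run that still
                  counts as "not degraded" (an assumed criterion).
    """
    if not fid_by_steps:
        raise ValueError("the sweep is empty")
    best_fid = min(fid_by_steps.values())  # lower FID is better
    acceptable = [
        steps
        for steps, fid in fid_by_steps.items()
        if fid - best_fid <= tolerance
    ]
    return min(acceptable)
```

To use it, train and evaluate one model per candidate step count, collect the FIDs into the dictionary, and call the function with a tolerance that reflects the run-to-run noise of your evaluation.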

If you are looking for more intuition: 1000 steps is the typical number that has proven effective for images, but it seems that for the motion domain a coarser quantization of the noise space is sufficient.
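One way to picture this quantization is the minimal sketch below. It assumes a cosine noise schedule purely for illustration (the thread does not say which schedule the project uses), and the helper names `cosine_signal_level` and `signal_levels` are hypothetical. Under that assumption, a 50-step process visits every 20th noise level of the 1000-step process along the same continuous curve.

```python
import math

def cosine_signal_level(u, s=0.008):
    """Continuous cosine curve at normalized time u in [0, 1].

    Assumed schedule shape; s is a small offset commonly used with it.
    """
    return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

def signal_levels(num_steps):
    """Cumulative signal level (alpha-bar) after each discrete step."""
    start = cosine_signal_level(0.0)
    return [
        cosine_signal_level(t / num_steps) / start
        for t in range(1, num_steps + 1)
    ]

coarse = signal_levels(50)    # 50 noise levels
fine = signal_levels(1000)    # 1000 noise levels on the same curve

# Step k of the 50-step grid lines up with step 20 * k of the 1000-step grid.
# The gap below is only floating-point rounding.
max_gap = max(abs(coarse[k] - fine[20 * (k + 1) - 1]) for k in range(50))
```

In this picture, reducing the step count does not change where the noise process starts or ends. It only changes how finely the path between the two is sampled.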

@shin-wn
Author

shin-wn commented Jan 30, 2025

After posting this question, I read the DDIM paper carefully.

Table 1 of the DDIM paper reports the following.
[Image: screenshot of Table 1 from the DDIM paper]

That is, the η=1.0 row shows the result of training with the number of diffusion steps given in each column, in the image domain. On CIFAR10, T=100 gives an FID of 5.78, which is only 1.05 points worse than T=1000.
When I looked at the number of diffusion steps used in other image generation work, I also found papers that set T=250 or T=100.
So it is already known empirically that setting T below 1000 is not a problem in image generation.

This also applies to the motion domain.
My understanding is that in the motion domain, which has far fewer features than images, such a reduction in T causes no problems, just as it causes none for images.

Thanks to your reply, I now understand that reducing the number of diffusion steps amounts to a coarser quantization of the noise space.

Thank you for replying!!!

shin-wn closed this as completed Jan 30, 2025