I noticed that this project successfully implements diffusion-based motion generation using only 50 timesteps for both training and sampling, which is significantly fewer than the 1000 steps used in traditional DDPM/DDIM approaches.
I'm very interested in understanding the theoretical foundation and implementation details behind this choice, as I couldn't find any references to training with such reduced timesteps in the original DDPM/DDIM papers.
Could you please share:
The research paper or theoretical work this implementation is based on?
If this is a novel approach, what modifications were made to enable stable training with reduced timesteps?
Any empirical observations or ablation studies that led to choosing 50 as the optimal number of timesteps?
I’d really appreciate your help!
This is rather an empirical result. Sweeping over the number of diffusion steps, 50 is the minimum that does not degrade FID. I invite you to try it yourself.
If you are looking for more intuition: 1000 steps is the typical number that has proven effective for images, but it seems that for the motion domain a coarser quantization of the noise space is sufficient.
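For anyone who wants to reproduce the sweep, below is a minimal sketch (not the repository's actual training code) of how the forward-noising schedule changes when T is reduced. It assumes a DDPM-style linear beta schedule with endpoints rescaled by 1000/T, a convention used in several public diffusion codebases; the function name and defaults are illustrative.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=2e-2, rescale=True):
    """DDPM-style linear beta schedule (Ho et al. use these endpoints for T = 1000).
    For smaller T the endpoints are rescaled by 1000/T so the chain still ends
    near pure noise; this rescaling is a common convention, not a claim about
    this repository's exact settings."""
    if rescale:
        scale = 1000.0 / T
        beta_start, beta_end = scale * beta_start, scale * beta_end
    return np.linspace(beta_start, beta_end, T, dtype=np.float64)

for T in (1000, 50):
    betas = linear_beta_schedule(T)
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative retained-signal fraction per step
    print(f"T={T:4d}  beta range [{betas[0]:.4f}, {betas[-1]:.4f}]  "
          f"final alpha_bar={alpha_bar[-1]:.2e}")
```

Both chains end with the final alpha_bar close to zero, i.e. nearly pure noise; the 50-step chain simply gets there in 20x fewer, correspondingly larger increments, which is the "coarser quantization" mentioned above.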
After posting this question, I read the DDIM paper carefully.
Its Table 1 reports FID for different numbers of diffusion steps.
That is, the row with η = 1.0 shows the result obtained with the diffusion-step setting of that column in the image domain. On the CIFAR10 dataset, the result for T = 100 is 5.78, an FID only 1.05 points worse than T = 1000.
Looking at the diffusion steps used in other image-generation work, I also found papers that set T = 250 or T = 100.
So it seems to be empirically established that setting T below 1000 is not a problem for image generation.
This carries over to the motion domain as well: since motion data has far fewer features than images, I understand that such a reduction in T should be even less of a problem there.
Thanks to your reply, I now understand that reducing the number of diffusion steps amounts to a coarser quantization of the noise space.
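To make the quantization picture concrete, here is a rough sketch of DDIM-style sub-sequence sampling, i.e. running the reverse process on only S of a trained chain's T timesteps (the kind of step reduction swept in DDIM's Table 1, where η = 1.0 gives the DDPM-like stochastic sampler). The `model` epsilon-predictor, the shapes, and the defaults are placeholders, not this repository's actual API.

```python
import numpy as np

def make_subsequence(T=1000, S=100):
    """Pick S of the original T timesteps, evenly spaced: a coarser quantization."""
    return np.linspace(0, T - 1, S).round().astype(int)

def ddim_sample(model, x, alpha_bar, tau, eta=1.0):
    """Generalized DDIM update along the sub-sequence tau.
    eta = 0.0 gives the deterministic DDIM sampler, eta = 1.0 the DDPM-like one."""
    for i in reversed(range(1, len(tau))):
        t, t_prev = tau[i], tau[i - 1]
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
        eps = model(x, t)                                     # predicted noise
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)    # predicted clean sample
        sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t) * (1.0 - a_t / a_prev))
        x = (np.sqrt(a_prev) * x0
             + np.sqrt(1.0 - a_prev - sigma ** 2) * eps
             + sigma * np.random.randn(*x.shape))
    return x

# Toy usage with a dummy epsilon-predictor standing in for the denoising network.
betas = np.linspace(1e-4, 2e-2, 1000)
alpha_bar = np.cumprod(1.0 - betas)
dummy_model = lambda x, t: np.zeros_like(x)
x_T = np.random.randn(4, 64)          # illustrative batch of feature vectors
x_0 = ddim_sample(dummy_model, x_T, alpha_bar, make_subsequence(1000, 100))
```

Either way, the sub-sequence length S controls how coarsely the noise levels are quantized at inference time, which is the same knob this project turns down to 50 for both training and sampling.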