Some questions about your work #11

Open
nihaomiao opened this issue Jul 31, 2023 · 1 comment

nihaomiao commented Jul 31, 2023

Hi, @Tobi-r9, thanks a lot for your interesting work! I have a few questions about it.

  1. For the pre-trained models you released, what is the default output image size? In your README, you set the image size to 64 for all models during training. I am wondering whether your pre-trained models can be used to generate videos of size 256x256.
  2. Does your model allow class-conditioned generation? Your code seems to accept extra class labels as input. I am wondering whether you have tried video generation conditioned on both given images and class labels.
  3. The training/testing split. Could you show the training/testing split for each dataset?
  4. The implementation of resampling may be incorrect. As described in your paper and in RePaint, a resampling step adds one step of noise and then denoises. Your function forward_diffusion is designed to add Gaussian noise of timestep i to x_start. In your resampling implementation, you use forward_function to add Gaussian noise of timestep i to img, i.e., $x_{t-1}$, which may produce a strange result. Could you double-check whether my understanding is correct?
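
To make point 4 concrete, here is a minimal sketch of the two operations being contrasted. It assumes a standard DDPM linear beta schedule; the names make_schedule, q_sample_from_x0, resample_one_step, and signal_coefficients are hypothetical and are not taken from the repository. The first function diffuses a clean $x_0$ directly to timestep $t$, while the second applies a single forward step from $x_{t-1}$ to $x_t$, which is what a resampling step is described as doing.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the repository may use another)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def q_sample_from_x0(x_start, t, alphas_cumprod, rng):
    """Diffuse a clean sample x_0 directly to timestep t: q(x_t | x_0)."""
    noise = rng.standard_normal(x_start.shape)
    return (np.sqrt(alphas_cumprod[t]) * x_start
            + np.sqrt(1.0 - alphas_cumprod[t]) * noise)


def resample_one_step(x_prev, t, betas, rng):
    """Diffuse x_{t-1} forward by a single step: q(x_t | x_{t-1})."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * noise


def signal_coefficients(t, betas, alphas_cumprod):
    """Factor multiplying x_0 after each way of moving x_{t-1} to step t.

    x_{t-1} already carries x_0 scaled by sqrt(alphas_cumprod[t - 1]).
    Returns (one_step, from_x0_misuse).
    """
    base = np.sqrt(alphas_cumprod[t - 1])
    one_step = np.sqrt(1.0 - betas[t]) * base    # equals sqrt(alphas_cumprod[t])
    misuse = np.sqrt(alphas_cumprod[t]) * base   # signal is attenuated twice
    return one_step, misuse
```

Under these assumptions, passing $x_{t-1}$ to q_sample_from_x0 scales the signal by the cumulative factor a second time, so the result no longer follows the marginal distribution expected at timestep $t$, whereas resample_one_step lands exactly on it.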

nihaomiao changed the title from "What is the default value of image size for pre-trained models?" to "Some questions about your work" on Jul 31, 2023

Tobi-r9 (Owner) commented Aug 3, 2023

Hi,
thanks for your interest in our work.

  1. The model itself can only generate 64x64 frames. However, you could use a super-resolution model to increase the resolution frame by frame.
  2. The original code from OpenAI does allow class-conditional generation, but we have not experimented with it. Some minor fixes might be necessary to make it work properly, but it should not be too much work.
  3. The train and test splits are predefined for each dataset. Check those for Kinetics, BAIR, and UCF-101.
  4. Thank you for letting me know. I will check and get back to you.
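
The frame-by-frame upscaling suggested in point 1 could be sketched roughly as follows. This is only an illustration: upscale_video and upscale_frame_nearest are hypothetical names, the video is assumed to be a NumPy array of shape (T, H, W, C), and the nearest-neighbour upscaler is a stand-in for whatever learned super-resolution model one would actually plug in.

```python
import numpy as np


def upscale_frame_nearest(frame, factor):
    """Placeholder upscaler: nearest-neighbour repeat of an (H, W, C) frame.

    A learned super-resolution model would replace this function.
    """
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)


def upscale_video(video, upscale_frame=upscale_frame_nearest, factor=4):
    """Apply upscale_frame to each frame of a (T, H, W, C) video independently."""
    return np.stack([upscale_frame(frame, factor) for frame in video], axis=0)


# Example: 16 generated frames at 64x64 become 16 frames at 256x256.
video_64 = np.zeros((16, 64, 64, 3), dtype=np.float32)
video_256 = upscale_video(video_64, factor=4)
```

Because each frame is processed on its own, this approach does not enforce temporal consistency, so the upscaled video may flicker depending on the super-resolution model used.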

I hope this helps :)
