
Questions about VideoChat2_HD #194

Open
LiJiaqi96 opened this issue Jun 12, 2024 · 35 comments

@LiJiaqi96

Hi, thanks for the VideoChat2_HD update! While trying the newly released code, I ran into some questions:

  • The MetaLoader_rs class in "train_it_ds.py" seems to be missing.
  • So I used "train_it.py" instead, but got the following error; I'm not sure whether MetaLoader_rs would solve it (see the padding-collate sketch after this list):
RuntimeError: stack expects each tensor to be equal size, but got [8, 3, 224, 448] at entry 0 and [8, 3, 448, 672] at entry 1
  • Changing the batch_size to 1 worked around that error, but the load_and_transform_media_data_image function does not accept the dynamic_config argument that "it_dataset_mistral.py" passes to it. I created a pull request to fix this.
  • Is there anywhere to find the newly added datasets for VideoChat2_HD? I suppose the datasets are important for improving model performance.
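
For context on the stack error above: the dataloader's default collate_fn cannot stack HD images of different resolutions. A rough sketch of a padding collate that would avoid it; the sample layout is assumed, not taken from the repo's code:

import torch
import torch.nn.functional as F

def pad_collate(batch):
    # batch: list of samples whose first element is an image tensor of
    # shape [T, 3, H, W]; H and W may differ across samples (assumed layout).
    images = [item[0] for item in batch]
    max_h = max(img.shape[-2] for img in images)
    max_w = max(img.shape[-1] for img in images)
    # Right/bottom-pad every sample up to the largest resolution in the batch.
    padded = [
        F.pad(img, (0, max_w - img.shape[-1], 0, max_h - img.shape[-2]))
        for img in images
    ]
    rest = [item[1:] for item in batch]
    return torch.stack(padded), rest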
@Andy1621
Collaborator

Thanks for trying it! I will fix it later~

@Andy1621
Collaborator

@LiJiaqi96 Please have a try; I have updated the code. train_it_ds.py adds deepspeed support and needs some changes to run.

@LiJiaqi96
Author

Thanks! I tried "train_it_ds.py" without using deepspeed, but it doesn't work. Is it possible to train without deepspeed? For now, I prefer not to use it.

@Andy1621
Collaborator

Yes! You can run it without deepspeed. BTW, show me your log so that I can fix the bug~

@LiJiaqi96
Author

Sorry for the late reply. The log is here:
train_log.txt
In "config_7b_hd_stage4.py", I set enable=False in the deepspeed settings, and ran the code with:

torchrun --nnodes=${NNODE} --nproc_per_node=${NUM_GPUS} \
    --rdzv_endpoint=${MASTER_NODE}:10068 \
    --rdzv_backend=c10d \
    tasks/train_it_ds.py \
    $(dirname $0)/config_7b_hd_stage4.py \
    output_dir ${OUTPUT_DIR}

@Andy1621
Collaborator

Andy1621 commented Jun 13, 2024

I'm not sure whether it is caused by the deepspeed or pytorch versions.
Here are my versions of different packages:

torch                     1.13.1+cu117
torchaudio                0.13.1+cu117
torchnet                  0.0.4
torchvision               0.14.1+cu117
deepspeed                 0.14.2
transformers              4.40.1

BTW, sometimes you can fix the bug by changing find_unused_parameters to True or False.
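
For reference, find_unused_parameters is the flag on PyTorch's DistributedDataParallel wrapper; a generic example of where it is set (not the repo's exact call site):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic pattern; run under torchrun so RANK/LOCAL_RANK/WORLD_SIZE are set.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 16).cuda()  # stand-in model for illustration
model = DDP(
    model,
    device_ids=[local_rank],
    # True lets DDP tolerate parameters that receive no gradient in a step
    # (e.g. frozen or conditionally used branches); False is faster when
    # every parameter is always used.
    find_unused_parameters=True,
)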

@LiJiaqi96
Author

Thanks, I will create an environment with exactly the same packages and have a try.

@yuanrr

yuanrr commented Jun 13, 2024

Hi, I found that shared_utils_ds.py has a bug at line 58:

optimizer_params = create_optimizer(config.optimizer, model, return_group=True)

optimizer.py may need to be updated accordingly (presumably create_optimizer needs to accept return_group).
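
A guess at what the updated interface might look like, based only on the call above (hypothetical, not the actual repo code):

import torch

def create_optimizer(args, model, return_group=False):
    # Typical grouping: no weight decay for biases and 1-D params (norms).
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim <= 1 or name.endswith(".bias") else decay).append(p)
    param_groups = [
        {"params": decay, "weight_decay": args.weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    if return_group:
        # deepspeed constructs the optimizer itself, so hand back the groups.
        return param_groups
    return torch.optim.AdamW(param_groups, lr=args.lr)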

@Andy1621
Collaborator

Thanks for your feedback. I have updated the code.

@LiJiaqi96
Author

I used the new environment except for flash-attn: I'm on CUDA 12.1 and can only use flash-attn==2.1.0. I ran "scripts/videochat_mistral/run_7b_stage4_hd.sh" with "tasks/train_it.py" and deepspeed enable=False, and got the error in train_log0618.txt. The error seems to be caused by flash-attn.
Is it possible to run videochat2_hd in the same environment as videochat2_mistral, without using deepspeed?

@LiJiaqi96
Author

BTW, I tested running the code on a single GPU (like python train_it.py) and it iterates normally.

@Andy1621
Collaborator

Yes, it's okay to use it without deepspeed. I use deepspeed ZeRO to decrease the GPU memory~
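
For anyone following along: ZeRO is switched on through the deepspeed config; a minimal illustration (values are generic, not this repo's actual settings):

import torch
import deepspeed

# Run under a launcher (e.g. `deepspeed train.py`) so distributed env vars are set.
model = torch.nn.Linear(16, 16)  # stand-in model for illustration
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # ZeRO-1 shards optimizer states across GPUs
    "bf16": {"enabled": True},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)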

@LiJiaqi96
Author

I see. Are you able to run on multiple GPUs without deepspeed, just as the model runs in videochat2_mistral?

@LiJiaqi96
Author

Update: I managed to solve the previous issue by upgrading flash-attn to 2.5.9. When I use "train_it_ds.py" with deepspeed enable=True, I hit a new issue with the deepspeed config:
trainlog_0621.txt
Could you please help me solve that?

@Andy1621
Collaborator

Hi! Please try again with the new commit.

@LiJiaqi96
Author

Thanks for your update! The code now runs with deepspeed enabled.
BTW, is there anywhere to find the newly added datasets for VideoChat2_HD? I suppose the datasets are important for improving model performance.

@Andy1621
Collaborator

Almost all the datasets can be directly downloaded from their repos or homepages~

Give me feedback if you don't find them.

@LiJiaqi96
Author

new_IT_videos
In "instruction_data.py", there are some newly added image datasets in M3IT, and some newly added videos datasets. Is there any place to find those video datasets? Thanks!

@Andy1621
Collaborator

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

@LiJiaqi96
Author

Thanks for sharing!

@LiJiaqi96
Author

Another question: how can I obtain the checkpoint after VideoChat2_HD training for use in "demo_mistral_hd.ipynb"?
state_dict = torch.load("your_model_path/videochat2/videochat2_hd_mistral_stage4.pth", "cpu")
I noticed that there are several files in the "ckpt_latest.pth" folder; should I choose one of them?
Thanks!

@LiJiaqi96
Author

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, could you please help me find the instruction json files such as f"{anno_root_it}/video/caption/sharegptvideo/train_300k.json"? I did not find the json files in the HF VideoChat2-IT repo.

@Andy1621
Collaborator

Sorry for the late reply. For the checkpoint, you need to use the file named mp_xxx, which saves the weights. For the instruction data, I will upload it today.
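
For later readers: deepspeed checkpoints usually store the per-rank model weights as mp_rank_XX_model_states.pt inside the checkpoint folder, with the state dict nested under a "module" key. Loading could look like this (paths illustrative):

import torch

ckpt = torch.load(
    "your_model_path/ckpt_latest.pth/mp_rank_00_model_states.pt",
    map_location="cpu",
)
state_dict = ckpt["module"]  # deepspeed nests the weights under "module"
# `model` is the constructed VideoChat2 model (assumed built elsewhere).
msg = model.load_state_dict(state_dict, strict=False)
print(msg)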

@Andy1621
Collaborator

@LiJiaqi96 Please check the data in HuggingFace~

@LiJiaqi96
Author

Thanks for your reply! I will try it~

@LiJiaqi96
Author

BTW, did you evaluate the effectiveness of VideoChat2_HD and the newly added datasets separately? I'm curious whether the training scheme or the data matters more for the improvement. Thanks!

@Andy1621
Collaborator

Andy1621 commented Jul 1, 2024

We did not conduct rigorous comparisons, since we wanted to make good use of the pretrained models.

And I think both are important based on some experiments:

  • Stage4: Directly fine-tuning VideoChat2-Stage3 with HD on the original Stage3 dataset improved results only marginally.
  • Stage3: Fine-tuning VideoChat2-Stage2 with the Stage4 dataset led to a performance drop of ~3%.

@LiJiaqi96
Author

My experiment is consistent with your findings. I directly fine-tuned VideoChat2-Stage3 (trained by myself from Stage2, 3 epochs) with HD on the original Stage3 dataset (1 epoch), and the score on MVBench dropped from 56 to 43 ...

@Andy1621
Collaborator

Andy1621 commented Jul 3, 2024

Interesting! I think HD needs more high-resolution and high-quality data.

@LiJiaqi96
Author

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, while downloading the datasets, I could not find the "infovqa". Could you please help me find the dataset?

@LiJiaqi96
Author

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, while downloading the datasets, I could not find the "infovqa". Could you please help me find the dataset?

Seems to be this dataset: https://www.docvqa.org/datasets/infographicvqa

@LiJiaqi96
Author

Hi, I noticed that the number of DiDeMo videos listed in the json file does not match the Google Drive version. Is there any way to download the full set of DiDeMo videos? Thanks!
https://drive.google.com/drive/u/0/folders/1huOL37wNOyMdCzbl8CIvJHDwCu5HLQ5o

@LiJiaqi96 LiJiaqi96 reopened this Aug 31, 2024
@Andy1621
Collaborator

Hi! I do not know how to download DiDeMo, since it was already downloaded on our cluster~

@LiJiaqi96
Author

Thanks for your reply

@LiJiaqi96
Author

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, I downloaded the videos from ShareGPTVideo via the link provided above. When I ran the code, there were errors that many files could not be found, such as: v_qx1FNJxiUuE-Scene-001, 1023599998, v_kuJO1VapxuQ-Scene-027. Did you use the "train_300k" subset of ShareGPTVideo? Thanks!

@LiJiaqi96 LiJiaqi96 reopened this Sep 3, 2024