Hello,
Thank you for the great work!
For stage 4 (instruction tuning with HD data), the current code seems to resize/crop images to 224x224: https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/scripts/videochat_mistral/config_7b_hd_stage4.py#L21 https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/dataset/__init__.py#L73
which means it's actually training on 224x224 frames. Is that true? If so, what does the "HD" refer to? Or did I miss something?
Thank you!
224x224 is the input resolution of our vision encoder. For HD, you can refer to the dynamic resolution setting:
Ask-Anything/video_chat2/scripts/videochat_mistral/config_7b_hd_stage4.py
Lines 85 to 90 in c3f0798
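To illustrate the idea behind the answer, here is a toy sketch of how a "dynamic resolution" HD scheme can keep a fixed 224x224 vision encoder while still covering larger frames: the frame is resized to a grid of 224x224 tiles whose aspect ratio best matches the original, and each tile is encoded separately. This is NOT the repository's actual implementation; the function name, the `max_tiles` budget, and the grid-selection heuristic are assumptions for illustration only.

```python
# Hypothetical sketch (not the repo's code): pick a grid of 224x224
# tiles whose aspect ratio best matches the input frame. The frame
# would then be resized to (cols*224, rows*224) and cut into cols*rows
# patches, each fed independently to the 224x224 encoder.

def best_tile_grid(width, height, base=224, max_tiles=6):
    """Return (cols, rows) of base-sized tiles best fitting the frame."""
    target_ratio = width / height
    best, best_err = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue  # respect the tile budget
            err = abs(cols / rows - target_ratio)
            if err < best_err:
                best, best_err = (cols, rows), err
    return best

# A 1280x720 (16:9) frame maps to a wide grid; a square frame to 1x1.
print(best_tile_grid(1280, 720))  # → (2, 1)
print(best_tile_grid(640, 640))   # → (1, 1)
```

So "HD" does not mean the encoder itself runs at high resolution; it means multiple 224x224 crops of a higher-resolution frame are processed, which is consistent with the config lines referenced above.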