[Project Page] [arXiv]
- Robustness Eval code and scripts
- Full pretrain and finetune scripts
- Code Release
To install all the requirements, simply run:
pip install -r requirements.txt
The provided torch version is the one we used for training; other versions of torch and torchvision are likely to work as well.
We provide code and scripts for generating offline data in the generate_code
folder. Please refer to Generation Scripts for detailed instructions.
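As a purely hypothetical sketch of the workflow (the script name below is a placeholder, not the repository's actual entry point; see the Generation Scripts instructions for the real commands):
cd generate_code
sh scripts/generate_dataset.sh   # placeholder name; set the output path and dataset parameters inside the script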
To use the offline generated dataset and other downloaded data for training, generate csv files and put them in the train_code/Annotations
folder. Examples for a generated dataset and a real dataset are synthetic.csv and ucf101_train.csv, respectively.
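As a rough illustration of the annotation layout (this is an assumption based on the VideoMAE-style space-separated path/label format; check synthetic.csv and ucf101_train.csv for the authoritative columns), each line names one clip and its label:
/path/to/generated/clip_00001.mp4 0
/path/to/ucf101/v_ApplyEyeMakeup_g01_c01.avi 0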
For simple progressions, we support training while generating data on the fly. For example, to train with moving circles, run
cd train_code
sh scripts/ucf101/moving_circle.sh
Note that you should first fill in the bash script with your log path and adjust the parameters to your preference.
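As a sketch of what "fill in" means (the variable names and values below are assumptions modeled on VideoMAE-style training scripts; the actual names in moving_circle.sh may differ):
# Hypothetical variables near the top of the script; adjust to your setup.
OUTPUT_DIR='/path/to/your/log_dir'   # where logs and checkpoints are written
BATCH_SIZE=32                        # per-GPU batch size, adjust to your memory
EPOCHS=800                           # pretraining length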
For more complex data (e.g. with affine transforms, moving textures, and image clips), we suggest generating the dataset offline. The training process strictly follows VideoMAE. Prepare the datasets, place the csv files, fill in the bash scripts, and then run
cd train_code
sh scripts/ucf101/pretrain/train_dataset.sh
For finetuning on downstream datasets, prepare the datasets following VideoMAE and place the csv files in the train_code/Annotations
folder. Fill in the corresponding bash scripts (see the sketch after the command below) and run
cd train_code
sh scripts/ucf101/finetune/ft_hmdb.sh
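When filling in the finetune script, you typically need to point it at the pretrained checkpoint and the downstream annotations. A minimal sketch, assuming VideoMAE-style variable names (check ft_hmdb.sh for the actual ones):
MODEL_PATH='/path/to/your/log_dir/checkpoint-latest.pth'  # pretrained checkpoint to finetune from
DATA_PATH='Annotations'                                   # folder with the downstream csv files
OUTPUT_DIR='/path/to/your/finetune_log_dir'               # finetuning logs and checkpoints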
Additionally, we use linear probing and corrupted perturbations to evaluate the quality and robustness of the learned representations.
The preparation is the same as for the finetuning process. Then run
cd train_code
sh scripts/ucf101/finetune/LP.sh
Scripts to be released soon!
We thank Amil Dravid and Ren Wang for their valuable comments and feedback on our paper; and thank UC Berkeley for the computational support to perform data processing and experiments. YG is supported by the Google Fellowship.
We thank the contributors to the following open-source projects. Our project would not have been possible without the inspiration from these excellent researchers.
If you think this project is helpful, please feel free to leave a star⭐️ and cite our paper:
@article{yu2024learning,
title={Learning Video Representations without Natural Videos},
author={Yu, Xueyang and Chen, Xinlei and Gandelsman, Yossi},
journal={arXiv e-prints},
pages={arXiv--2410},
year={2024}
}