[Project Page] [arXiv]
- Robustness Eval code and scripts
- Full pretrain and finetune scripts
- Code Release
To install all the requirements, simply run:
pip install -r requirements.txt
The provided torch version is the one we used for training; other versions of torch and torchvision are likely to work as well.
We provide code and scripts for generating offline data in the generate_code
folder. Please refer to Generation Scripts for detailed instructions.
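As a purely hypothetical sketch of the workflow (the script name below is a placeholder, not the repository's actual entry point; see the Generation Scripts instructions for the real commands):
cd generate_code
sh scripts/generate_dataset.sh   # placeholder name; set the output path and dataset parameters inside the script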
To use the offline generated dataset and other downloaded data for training, generate csv files and put them in the train_code/Annotations
folder. Examples for a generated dataset and a real dataset are synthetic.csv and ucf101_train.csv, respectively.
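As a rough illustration of the annotation layout (this is an assumption based on the VideoMAE-style space-separated path/label format; check synthetic.csv and ucf101_train.csv for the authoritative columns), each line names one clip and its label:
/path/to/generated/clip_00001.mp4 0
/path/to/ucf101/v_ApplyEyeMakeup_g01_c01.avi 0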
For simple progressions, we support training while generating data on the fly. For example, to train with moving circles, run
cd train_code
sh scripts/ucf101/moving_circle.sh
Note that you should first fill in the bash script with your log path and adjust the parameters to your preference.
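As a sketch of what "fill in" means (the variable names and values below are assumptions modeled on VideoMAE-style training scripts; the actual names in moving_circle.sh may differ):
# Hypothetical variables near the top of the script; adjust to your setup.
OUTPUT_DIR='/path/to/your/log_dir'   # where logs and checkpoints are written
BATCH_SIZE=32                        # per-GPU batch size, adjust to your memory
EPOCHS=800                           # pretraining length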
For more complex data (e.g. with affine transforms, moving textures, and image clips), we suggest generating the dataset offline. The training process strictly follows VideoMAE. Prepare the datasets, place the csv files, fill in the bash scripts, and then run
cd train_code
sh scripts/ucf101/pretrain/train_dataset.sh
For finetuning on downstream datasets, prepare the datasets following VideoMAE and place the csv files in the train_code/Annotations
folder. Fill in the corresponding bash scripts (see the sketch after the command below) and run
cd train_code
sh scripts/ucf101/finetune/ft_hmdb.sh
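When filling in the finetune script, you typically need to point it at the pretrained checkpoint and the downstream annotations. A minimal sketch, assuming VideoMAE-style variable names (check ft_hmdb.sh for the actual ones):
MODEL_PATH='/path/to/your/log_dir/checkpoint-latest.pth'  # pretrained checkpoint to finetune from
DATA_PATH='Annotations'                                   # folder with the downstream csv files
OUTPUT_DIR='/path/to/your/finetune_log_dir'               # finetuning logs and checkpoints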
Additionally, we use linear probing and corrupted perturbations to evaluate the quality and robustness of the learned representations.
The preparation is the same as for the finetuning process. Then run
cd train_code
sh scripts/ucf101/finetune/LP.sh
Scripts to be released soon!
We thank Amil Dravid and Ren Wang for their valuable comments and feedback on our paper; and thank UC Berkeley for the computational support to perform data processing and experiments. YG is supported by the Google Fellowship.
We thank the contributors to the following open-source projects. Our project would not have been possible without the inspiration from these excellent researchers.
If you think this project is helpful, please feel free to leave a star⭐️ and cite our paper:
@article{yu2024learning,
title={Learning Video Representations without Natural Videos},
author={Yu, Xueyang and Chen, Xinlei and Gandelsman, Yossi},
journal={arXiv e-prints},
pages={arXiv--2410},
year={2024}
}