diff --git a/CONTRIBUTE.md b/CONTRIBUTE.md
new file mode 100644
index 00000000..963e7155
--- /dev/null
+++ b/CONTRIBUTE.md
@@ -0,0 +1,26 @@
+# Contribution guide
+
+video2dataset is open to contributions that add new features, improve efficiency, or improve code health.
+
+## How to validate your changes?
+
+Before merging a change (especially a non-trivial one), we ask you to:
+
+* make sure linting passes: run `make black` and `make lint` locally, then check the status in the PR
+* make sure the existing tests pass: run `make test` locally, then check the status in the PR
+* add new tests for new features or bug fixes
+* manually run an efficiency test. video2dataset must remain fast, so this is important
+
+## Efficiency test
+
+To test the efficiency of video2dataset, you can follow [this example to download webvid](dataset_examples/WebVid.md)
+
+Using 16 processes with 16 threads each is particularly important for checking the speed. Enabling wandb is also important.
+
+You can run with only `results_2M_val` to reduce the run time of this test.
+
+You should observe 14.4 videos/s/core in wandb.
+
+Please post the wandb link in the PR to show this is working. It will make it faster for the reviewer to merge the PR.
+
+
diff --git a/README.md b/README.md
index 6d893dc5..2e55c9ff 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@ Easily create large video dataset from video urls. Can download and package 10M
 
 If you believe in making reusable tools to make data easy to use for ML and you would like to contribute, please join the [DataToML](https://discord.gg/ep8yUUtCnp) chat.
 
+If you would like to contribute to video2dataset, please read [CONTRIBUTE.md](CONTRIBUTE.md)
+
 ## Install
 
 ```bash
diff --git a/pull_request_template.md b/pull_request_template.md
new file mode 100644
index 00000000..dfdd61c8
--- /dev/null
+++ b/pull_request_template.md
@@ -0,0 +1 @@
+* [ ] I have read CONTRIBUTE.md
\ No newline at end of file