If YoutubeDL fails to fully download a video, it often leaves .part and .part-FragNN files behind in the tmp directory. For large datasets these can consume a significant amount of disk space unless something cleans them up.
The exception handler deletes the target video .mp4, but often the transfer never completed and the .mp4 file was never created:
video2dataset/video2dataset/data_reader.py, lines 230 to 232 (commit 83afef0)
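In that same exception path, the leftover fragments could be removed with a small helper. A minimal sketch, assuming the tmp directory and the UUID filename stem are in scope at that point (the names here are illustrative, not video2dataset's actual API):

```python
import glob
import os


def cleanup_partial_downloads(tmp_dir: str, video_id: str) -> None:
    """Remove yt-dlp leftovers (*.part, *.part-FragNN) for one download attempt.

    `video_id` is assumed to be the UUID used as the output filename stem.
    """
    for path in glob.glob(os.path.join(tmp_dir, f"{video_id}*.part*")):
        try:
            os.remove(path)
        except OSError:
            # The file may already be gone (e.g. removed by another worker);
            # cleanup is best-effort, so ignore and continue.
            pass
```

This could be called from the existing `except` block right next to the current .mp4 removal, so a failed attempt leaves nothing behind.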
Please consider cleaning up the .part(-Frag*) files when an exception occurs. Since a unique name (UUID) is used for each file, the leftovers cannot be reused by a later video2dataset run anyway. However, if retry logic were added to the download step (which would be nice in itself), yt-dlp might be able to reuse the partial fragments and resume the download; I believe that is why yt-dlp does not clean them up itself, though I am not entirely sure such a resume is actually possible.
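If retry logic were added, a thin wrapper could keep the .part files between attempts (so yt-dlp's resume support, if it applies, can pick them up) and leave cleanup for after the final failure. A sketch under those assumptions; `download_fn` is a hypothetical callable performing one download attempt, not part of video2dataset:

```python
import time


def download_with_retries(download_fn, max_attempts=3, backoff_s=1.0):
    """Retry a download, preserving .part files between attempts.

    Intermediate failures do NOT trigger cleanup, so a resumable downloader
    can continue from the partial fragments. Only after the last attempt
    fails should the caller remove the leftovers.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return download_fn()
        except Exception as err:  # broad on purpose: downloader errors vary
            last_err = err
            if attempt < max_attempts - 1:
                # Exponential backoff before the next attempt.
                time.sleep(backoff_s * (2 ** attempt))
    raise last_err
```

The caller would wrap its existing download call with this, and run the fragment cleanup only in the `except` branch around `download_with_retries` itself.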