
Clean up tmp part files in case of d/l failure #304

Open
pabl0 opened this issue Jan 31, 2024 · 0 comments


pabl0 commented Jan 31, 2024

If YoutubeDL fails to download a video completely, it often leaves .part and .part-FragNN files behind in the tmp directory. For large datasets, these can consume a significant amount of disk space unless something else cleans them up.
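To gauge how much space such leftovers take, a quick scan of the tmp directory can help. This is a minimal sketch, assuming leftovers either end in `.part` or contain `.part-Frag` in their names (the patterns described above); `leftover_part_files` is a hypothetical helper, not part of video2dataset.

```python
import os
from typing import Dict


def leftover_part_files(tmp_dir: str) -> Dict[str, int]:
    """Return {path: size_in_bytes} for yt-dlp partial files left in tmp_dir.

    Assumes leftovers either end in ".part" or contain ".part-Frag" in their
    names; other files (e.g. finished .mp4s) are ignored. Not recursive.
    """
    sizes: Dict[str, int] = {}
    with os.scandir(tmp_dir) as entries:
        for entry in entries:
            name = entry.name
            if entry.is_file() and (name.endswith(".part") or ".part-Frag" in name):
                sizes[entry.path] = entry.stat().st_size
    return sizes


if __name__ == "__main__":
    import sys

    found = leftover_part_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(found)} leftover files, {sum(found.values()) / 1e9:.2f} GB")
```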

The exception handler deletes the final .mp4 file, but often the transfer did not complete, so that .mp4 has not been created yet:

```python
except Exception as e:  # pylint: disable=(broad-except)
    err = str(e)
    os.remove(video_path)
```
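A minimal sketch of the requested cleanup, assuming yt-dlp's leftovers sit next to `video_path` and share its UUID-based stem followed by a dot (e.g. `<uuid>.mp4.part`, `<uuid>.mp4.part-Frag3`). `remove_partial_downloads` is a hypothetical helper, and the naming pattern is an assumption that should be checked against the files actually left in the tmp directory. It also tolerates the final .mp4 being absent, since `os.remove` raises `FileNotFoundError` for a missing file.

```python
import glob
import os
from typing import List


def remove_partial_downloads(video_path: str) -> List[str]:
    """Delete video_path plus any yt-dlp partial files sharing its stem.

    Hypothetical naming assumption: partial files are named
    "<stem>.<anything>.part" or "<stem>.<anything>.part-Frag<N>..."
    in the same directory, where <stem> is the UUID-based base name.
    Returns the paths that were actually removed.
    """
    directory = os.path.dirname(video_path) or "."
    stem = os.path.splitext(os.path.basename(video_path))[0]
    # Escape the literal parts so glob metacharacters in paths are not expanded.
    prefix = os.path.join(glob.escape(directory), glob.escape(stem))
    candidates = {video_path}
    candidates.update(glob.glob(prefix + ".*.part"))
    candidates.update(glob.glob(prefix + ".*.part-Frag*"))
    removed = []
    for path in sorted(candidates):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            # The final .mp4 (or a part file) may never have been created.
            pass
    return removed
```

In the handler above, `remove_partial_downloads(video_path)` could take the place of `os.remove(video_path)`. Requiring a dot right after the stem keeps the glob from touching files whose names merely start with the same characters.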

Please consider also cleaning up the .part(-Frag*) files when an exception occurs. Since each file gets a unique name (UUID), these leftovers cannot be reused by a later run of video2dataset. However, if retry logic were added to the download step (which would be a nice feature in itself), yt-dlp might be able to reuse the temporary parts and resume the download. I believe that is why yt-dlp itself does not clean them up, but I am not entirely sure such a resume is possible.
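If retries were added, cleanup would move to after the last attempt so that earlier retries can still find the partial files. The following is a sketch under that assumption: `download_with_retries`, `download`, and `cleanup` are hypothetical names, `download` stands in for whatever calls yt-dlp with a fixed output path, and whether yt-dlp actually resumes from the existing parts is not verified here.

```python
import time
from typing import Callable, Optional


def download_with_retries(
    download: Callable[[str], None],
    video_path: str,
    cleanup: Callable[[str], None],
    attempts: int = 3,
    backoff_s: float = 1.0,
) -> Optional[str]:
    """Call download(video_path) up to `attempts` times, reusing the same path.

    Returns None on success, or the last error message once every attempt
    has failed. cleanup(video_path) runs only after the final failure, so
    retries can still see any partial files (resuming from them is an
    unverified assumption about yt-dlp).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    err: Optional[str] = None
    for attempt in range(1, attempts + 1):
        try:
            download(video_path)
            return None
        except Exception as e:  # pylint: disable=broad-except
            err = str(e)
            if attempt < attempts:
                # Linear backoff before the next attempt.
                time.sleep(backoff_s * attempt)
    cleanup(video_path)
    return err
```

Keeping the same `video_path` across attempts is the key design choice: a fresh UUID per attempt would orphan the parts from the previous try, which is exactly the leak described in this issue.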
