YouTube metadata is not saved #319
Comments
Are you getting an empty yt_meta_dict for just some videos or for all of them? What I am seeing is that for every 300 videos, roughly 100 have yt_meta_dict populated and 200 have yt_meta_dict = {}, which is quite strange. What exactly does ignoring errors in yt_dlp mean? Does it give up on the first try, even if retries are configured?
Other yt_dlp codepaths don't seem to set this.
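To put numbers on the populated versus empty split described above, a small script along these lines could scan the output folder. This is only a sketch: it assumes the dataset was written as plain files with one .json sidecar per sample, and that files ending in _stats.json are per-shard statistics rather than sample metadata. The function name count_yt_meta is made up for this example and is not part of video2dataset.

```python
import json
from pathlib import Path


def count_yt_meta(output_dir):
    """Return (populated, empty) counts of yt_meta_dict across sample JSON files.

    Assumption: every sample has a .json sidecar somewhere under output_dir,
    and files ending in "_stats.json" are shard statistics to be skipped.
    """
    populated, empty = 0, 0
    for path in sorted(Path(output_dir).rglob("*.json")):
        if path.name.endswith("_stats.json"):
            continue  # assumed to be shard-level stats, not sample metadata
        with open(path, encoding="utf-8") as f:
            meta = json.load(f)
        if not isinstance(meta, dict):
            continue  # not a sample metadata object
        # A missing key and an empty dict are both counted as "empty".
        if meta.get("yt_meta_dict"):
            populated += 1
        else:
            empty += 1
    return populated, empty
```

Calling count_yt_meta("dataset") on the output directory returns the two counts as a tuple, which makes it easy to check whether the empty entries line up with the non-first clips of each video.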
Ahh! Now I understand what happens: with multiple clips, only the first one (_00000.json) will have yt_meta_dict populated, not the following clips. It seems this was a change introduced by the clipping subsampler refactoring (#275). Did it behave differently in v1.2.0?
video2dataset/video2dataset/subsamplers/clipping_subsampler.py, lines 181 to 183 in 28e7d1c
I am not sure if this is a good idea. Depending on your processing pipeline, you might want to have the same metadata available on all the clips.
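As a workaround sketch for pipelines that need the metadata on every clip, the populated yt_meta_dict could be copied onto the other clips of the same video after the fact. This is not the clipping subsampler's code; propagate_yt_meta is a hypothetical helper, and it assumes the per-clip metadata dicts of one video have already been loaded into a list.

```python
import copy


def propagate_yt_meta(clip_metas):
    """Copy the first non-empty yt_meta_dict onto clips that lack one.

    clip_metas: list of metadata dicts belonging to the clips of ONE video
    (assumed to be already grouped by the caller). Returns new dicts and
    leaves the input untouched.
    """
    source = next(
        (m["yt_meta_dict"] for m in clip_metas if m.get("yt_meta_dict")),
        None,
    )
    if source is None:
        # No clip carries metadata, so there is nothing to propagate.
        return [dict(m) for m in clip_metas]

    result = []
    for meta in clip_metas:
        updated = dict(meta)
        if not updated.get("yt_meta_dict"):
            # Deep copy so later edits to one clip do not leak into others.
            updated["yt_meta_dict"] = copy.deepcopy(source)
        result.append(updated)
    return result
```

The trade-off is duplicated data in every clip's JSON file, in exchange for each clip being self-describing.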
I agree, duplicating the metadata makes more sense, especially given the size of the data.
Issue
When using video2dataset (1.3.0) to download YouTube videos, I've set the following entry in the config to retrieve metadata:
But in the resulting JSON files the entry
"yt_meta_dict": {}
is empty even though get_info: True is set in the config.
How to reproduce
For example this link: https://www.youtube.com/embed/JFUsP1coIKM
When I download that with yt-dlp:
I get YouTube metadata like
"categories": ["Entertainment"], "tags": ["Deutsche", "Welle", "Made", "in", "Germany", "Bio", "Lettland", "Getreide"]
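For comparison, the same fields can be pulled through yt-dlp's Python API instead of the command line. The following is a minimal sketch, not the command used in this report: fetch_info and pick_fields are names invented for the example, the chosen fields are just the ones quoted above, and fetch_info needs the yt-dlp package and network access.

```python
def pick_fields(info, fields=("categories", "tags")):
    """Keep only the selected keys of a yt-dlp info dict (missing keys are skipped)."""
    return {key: info[key] for key in fields if key in info}


def fetch_info(url):
    """Fetch metadata for one video without downloading it.

    Requires the yt-dlp package and network access; returns {} when
    extraction yields nothing.
    """
    import yt_dlp  # imported lazily so pick_fields works without yt-dlp installed

    with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
        info = ydl.extract_info(url, download=False)
        # sanitize_info makes the result JSON-serializable.
        return ydl.sanitize_info(info) if info else {}
```

A call such as pick_fields(fetch_info("https://www.youtube.com/embed/JFUsP1coIKM")) should then give the categories and tags to compare against the contents of yt_meta_dict.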
But with video2dataset it looks like this: