Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set language info for dash audio streams and sort #5094

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GTechAlpha
Copy link

@GTechAlpha GTechAlpha commented Nov 25, 2024

Set audio track info, if present, in dash manifest

  • language id
  • language display name
  • main/default track

Sort audio formats so that main/default is first (for clients not using dash)

This resolves issue of incorrect/undesired audio when automatic translation is enabled by content creator (currently relatively rare, but likely to increase), and allows dash clients to correctly select default language and optionally offer multiple language streams.

closes #2007

Note: this should be a non-breaking change; if audio track info is not available, the behavior does not change from current

  - language id
  - language display name
  - main/default track
Sort audio formats so that main/default is first (for clients not using dash)

* Note: this should be a non-breaking change; if audio track info is not availablle, the behavior does not change from current
@GTechAlpha GTechAlpha requested a review from a team as a code owner November 25, 2024 19:38
@GTechAlpha GTechAlpha requested review from unixfox and removed request for a team November 25, 2024 19:38
@unixfox
Copy link
Member

unixfox commented Nov 25, 2024

Hello, thank you for the PR that's really interesting. If this solves #2007, could you add close #2007 in your PR description?

# Different representations of the same audio should be groupped into one AdaptationSet.
# However, most players don't support auto quality switching, so we have to trick them
# into providing a quality selector.
# See https://github.com/iv-org/invidious/issues/3074 for more details.
xml.element("AdaptationSet", id: i, mimeType: mime_type, startWithSAP: 1, subsegmentAlignment: true, label: fmt["bitrate"].to_s + "k") do
xml.element("AdaptationSet", id: i, mimeType: mime_type, startWithSAP: 1, subsegmentAlignment: true, label: displayname, lang: lang) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of different representations of the same language you'll end up with two AdaptationSet with the same label and lang. I think that might confuse some downstream clients (and users) if two audio tracks end up having the same label and language.

For example see the dash manifest for this video with this PR https://youtube.comwatch?v=urevinis_PU

Copy link
Author

@GTechAlpha GTechAlpha Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello.

If I am not misunderstanding your statement, yes that could happen if audio streams weren't being filtered for mime type "audio/mp4".

However, since they are being filtered this way, I have only seen one unique language representation per this mime type, from YT.

With this PR and the current state of Inv and YT, I have not experienced the issue you describe. And in the case of your example, no audio stream selection options are provided (by video.js and in my case) because there is no audio track information from YT, so the lang attributes are set to "und" and the client falls back to the default behavior of auto-selecting the first stream. When audio track info is available, video.js provides only one audio option per language in all of my testing.

Thank you for your time and attention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't just about video.js. There are downstream clients that use other media players which also rely on the dash manifests generated by Invidious.

Copy link
Member

@syeopite syeopite Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also concerned that this could reintroduce issue #3074.

Although the first AdaptationSet should always have the highest bitrate, YouTube's structure is very finicky and subject to change. Even if video.js will always select the first stream it doesn't mean that it'll always have the highest bitrate.

PR #3162 also seemed to introduce a quality selector for audio but it seems to have broke somewhere down the line. If it was ever fixed, this issue will present a UX problem even in Invidious and video.js with the labels being simply "Unknown"

This comment was marked as off-topic.

@unixfox
Copy link
Member

unixfox commented Dec 19, 2024

What's the way forward with this? What is the API that needs to be maintained for downstream clients? Does it just need some kind of unique ID?

There is also, of course, the possibility of doing a breaking change. We could argue that if clients can't support this any more then they need updating. Invidious is becoming more and more useless and more and more videos come with some AI-generated dub track.

@georgek

Well first, you are not commenting in the right section. This is a comment section about a code change in the PR.

Second, This PR is about the DASH manifest, not the API that is used by the 3rd party clients.

If you are requesting the API using the /api/v1/videos/, then it's up to your client to handle the fact that there are multiple audio files with different languages. It's possible that we do not expose enough info to differentiate between each one, but if it's that the case, then please open a new GitHub issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Add support for multiples audio tracks
4 participants