Trimming down the AudioConfiguration #160
samplerate and channels are definitely used. Samplerate in particular: on Windows the system AAC decoder only supports a narrow list of sample rates. 96 kHz in particular isn't supported and will cause a decoding error, and this is something seen in the wild. For channels, I've seen it used with the Opus 255 files, where sites would query a high number of channels, with one channel used for a particular audio object. This made it possible to differentiate user agents that support those files from those that don't. I agree that bitrate is likely unused. As such, removing support for those two fields (samplerate and channels) would have a negative impact. Jean-Yves |
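As a rough illustration of the kind of query described above, a page could probe several sample rates before picking a stream. This is a minimal sketch, assuming the Media Capabilities `decodingInfo()` method. The helper name `firstSupportedSampleRate`, the candidate list, and the codec string are hypothetical, and the `mediaCapabilities` object is passed in so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: returns the first sample rate (in Hz) that the user
// agent reports as decodable for the given audio content type, or null if
// none of the candidates is reported as supported.
// `mediaCapabilities` is expected to behave like navigator.mediaCapabilities.
async function firstSupportedSampleRate(mediaCapabilities, contentType, sampleRates) {
  for (const samplerate of sampleRates) {
    const info = await mediaCapabilities.decodingInfo({
      type: "file",
      audio: { contentType, samplerate },
    });
    if (info.supported) {
      return samplerate;
    }
  }
  return null;
}

// Example use in a browser (the codec string is only an illustration):
// firstSupportedSampleRate(navigator.mediaCapabilities,
//     'audio/mp4; codecs="mp4a.40.2"', [96000, 48000, 44100])
//   .then((rate) => console.log("chosen sample rate:", rate));
```

Candidates are tried in the order given, so listing the preferred rate first lets the page fall back to a lower rate only when the higher one is reported as unsupported.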
Apologies for something of a ramble ..... Something related to channels is certainly used but perhaps not exactly this .... HbbTV 2.0.3 provides a 3-state value for its sort-of-ish equivalent of this;
As well as this being a 3-state value, the other difference is that this answers a subtly different question - what can be output and not what can be decoded - because the answer to what can be decoded might well be anything or almost anything. What does this mean for MC?
|
Happily noted. Do you assume some default values when these are not provided? We could update the spec to make those defaults explicit.
Cool. If others agree I'll send a PR deprecating that field.
The WebAudio maxChannelCount should work to give the exact number of channels. It doesn't let you say "preferred". Is this for quasi-5.1 sound bars and the like?
I like to confine MC to answering questions about decoding support/perf, letting other APIs answer questions about your display and peripherals. Mostly because the "other APIs" tend to already be somewhat defined (e.g. CSSOM Screen). We let a little rendering sneak in with the spatialRendering attribute. Regrettably I don't think we considered whether that might be more at home in WebAudio, next to channels. (Aside: @jernoble @isuru-c-p - did either of you ship that yet?) |
Following from @jpiesing's feedback, how is spatial audio (e.g., 5.1) rendering handled in browsers today? Presumably checking the AudioContext destination channels will indicate whether downmixing will occur? Could the Audio Output Devices API be used to select between a device's stereo and surround outputs? The phrasing in the MC API spec for |
Thinking about it a little more, some implementations might have a maximum bitrate, or at least a maximum bitrate they've been tested at.
I believe there might be a progression of
I have no idea if all of these exist in the real world. I'm not an audio expert. Hopefully I've made a sufficiently serious mistake in the above analysis that an audio expert will step in :)
If it really is the case that (almost) any modern audio library can do some version of a downmix to stereo then the question that needs to be answered is more than just "can audio with a particular set of properties be decoded".
Is an update to WebAudio in any group's charter? If not then putting a "somebody else's problem" label on this issue won't help people who just want to know whether delivering 5.1 or stereo to a particular consumer will give the better user experience. |
@johnsim This is the issue we were discussing. |
Sorry for the delay
Chromium supports 5.1 and even higher. But note that this is not "spatial" as intended by the spec. Spatial refers to modern object-based surround tech like DTS:X or Dolby Atmos. For these, channel count was insufficient (they can run on top of a number of different channel counts). Chromium currently does not support the codecs used in spatial rendering (e.g. EAC3-JOC). The Chromecast build of Chromium does support passthrough of those codecs to audio sinks.
Yep, that should work. Note that Chrome makes its mixing decisions when the stream is first loaded. We try to avoid early downmixing, but there are edge cases where you can get stuck if you plug in your hardware after starting the stream.
I'm embarrassed to admit that I am only just now aware of that API. This breaks my model that the web assumes you have just one output device. If you can actually have N devices and switch between them on a per element basis, the MC API is a bit weird for ignoring that. In practice, you can implement the API to just return the max capability of all your devices. @jernoble is that what Safari did? In hindsight, putting spatial capabilities on the deviceinfos from that API seems cleaner. I also like that it lets you know when devices change.
I don't intend to dodge the bigger question. I'm suggesting it may already be answered by another means.
I don't think a recharter is needed to amend that spec. The editor's draft of WebAudio is still regularly updated. |
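The device-change point above can be sketched with the existing `enumerateDevices()` method and `devicechange` event on `navigator.mediaDevices`. This is only a sketch: the helper name `watchAudioOutputs` is hypothetical, and today's `MediaDeviceInfo` objects expose `deviceId`, `kind`, `label`, and `groupId` but no channel or spatial capability, so the callback can only learn which outputs exist.

```javascript
// Hypothetical helper: lists audio output devices and re-lists them whenever
// the set of devices changes. `mediaDevices` is expected to behave like
// navigator.mediaDevices; `onUpdate` receives an array of MediaDeviceInfo-like
// objects whose kind is "audiooutput".
async function watchAudioOutputs(mediaDevices, onUpdate) {
  const refresh = async () => {
    const devices = await mediaDevices.enumerateDevices();
    onUpdate(devices.filter((d) => d.kind === "audiooutput"));
  };
  mediaDevices.addEventListener("devicechange", refresh);
  await refresh();
  // Return a function that stops watching.
  return () => mediaDevices.removeEventListener("devicechange", refresh);
}

// Example use in a browser:
// watchAudioOutputs(navigator.mediaDevices, (outputs) => {
//   console.log(outputs.map((d) => d.label || d.deviceId));
// });
```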
re: rendering capabilities, @johnsim hosted a call with a few folks on this thread. I've made a doc to tee up discussion about possible api shapes. This starts by simply using channels from ISO 23091-3 and a spatialRendering attribute. On the call I heard a mix of opinions about the usefulness of channels, etc. I'm hoping the audio experts will weigh in with suggestions. If channels is insufficient, we should try to define some new primitives without standardizing the use of any particular proprietary audio tech. The doc has public comment/suggestion access. Send me a note if you'd like edit access. |
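For reference, a query combining channels with the spatialRendering attribute might look like the following. This is a minimal sketch, not one of the API shapes proposed in the doc: the helper names `buildAudioQuery` and `isSupported` are hypothetical, the content type is left to the caller, and the `mediaCapabilities` argument stands in for `navigator.mediaCapabilities`.

```javascript
// Hypothetical helper: builds an audio decoding query that carries a channel
// count and the spatialRendering flag. In the Media Capabilities spec,
// `channels` is a string and `spatialRendering` is a boolean.
function buildAudioQuery(contentType, channelCount, spatialRendering) {
  return {
    type: "media-source",
    audio: {
      contentType,
      channels: String(channelCount),
      spatialRendering: Boolean(spatialRendering),
    },
  };
}

// Hypothetical helper: resolves to true only when the user agent reports the
// configuration as supported.
async function isSupported(mediaCapabilities, query) {
  const info = await mediaCapabilities.decodingInfo(query);
  return info.supported === true;
}

// Example use in a browser (the codec string is only an illustration):
// isSupported(navigator.mediaCapabilities,
//     buildAudioQuery('audio/mp4; codecs="ec-3"', 6, true))
//   .then((ok) => console.log("spatial rendition usable:", ok));
```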
The AudioWG has rechartered, and we welcome new issues in this tracker: https://github.com/WebAudio/web-audio-api-v2/issues. Happy to discuss there. We've discussed the problem at length in the past, but a lot of other issues were of higher importance (in the sense that what was shipping in browsers was not even really specified, and that took precedence). WebAudio/web-audio-api#1089 has some context, and the rationale for the current setup. But in particular, note WebAudio/web-audio-api#1089 (comment), and suggestions to extend the current state, from domain experts here: WebAudio/web-audio-api#1089 (comment). Unfortunately, some links are now 404. It is somewhat possible to know what the default audio output device supports today, via code like this:
var ac = new AudioContext();
console.log(ac.destination.maxChannelCount); // returns 6 for a 5.1 setup
Normative references:
Script always sees the ordering specified in [0], regardless of the codec/container. It's remapped before being exposed, so that authors can process the audio without caring about the codec or the audio output device. This is a very cheap operation because the data is planar (essentially shuffling a few pointers). When the data reaches the
The proposals from @chcunningham are, to me, vastly superior in all aspects to the approach we have currently (which is just a number -> fixed layout mapping). However, referencing non-freely available documents from freely available W3C specifications is a bit annoying. This was also discussed in the links above. We can probably consult with @svgeesus, who happens to be the W3C contact for the Audio Working Group and knows the options we have here. |
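Building on the `maxChannelCount` check mentioned above, a page could choose between a stereo and a 5.1 rendition along these lines. This is a sketch under stated assumptions: the helper name `pickChannelCount` and the list of available renditions are hypothetical, and `maxChannelCount` only describes the default output device at the time it is read.

```javascript
// Hypothetical helper: picks the largest available channel count that does
// not exceed the destination's maxChannelCount. If no rendition fits, it
// falls back to the smallest one (the user agent would then have to upmix).
// Returns undefined when `availableCounts` is empty.
function pickChannelCount(maxChannelCount, availableCounts) {
  const sorted = [...availableCounts].sort((a, b) => a - b);
  const fitting = sorted.filter((n) => n <= maxChannelCount);
  return fitting.length > 0 ? fitting[fitting.length - 1] : sorted[0];
}

// Example use in a browser:
// const ac = new AudioContext();
// const channels = pickChannelCount(ac.destination.maxChannelCount, [2, 6]);
```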
Yes, the Audio WG (current charter), |
It is, and we prefer to avoid it where possible. But in some cases we do end up with a paywalled reference as the normative one. What we do in some cases is to add informative material such that developers without deep pockets are not at a disadvantage in terms of implementation. |
@jyavenard said:
@chcunningham said:
@jpiesing said:
So on that basis I suggest that we don't deprecate channels, bitrate, or samplerate. The main question remaining is rendering (as opposed to decoding) capabilities: https://docs.google.com/document/d/1to7llKOyNZxirnpCahsslKnUazQZnT2mEVfRaWbCss0/edit |
Related issue #206. |
Most of the fields in AudioConfiguration were added in the first draft of the spec. But, at least in Chromium, we don't have much use for some of the optional parameters including:
In Chromium, if we support the codec profile, we will support decoding for any number of channels and any bitrate and samplerate. The codec definition may itself impose some limits, but we don't need an API to surface those (encodings that exceed those limits, if they exist, would simply be "bad content").
VideoConfiguration has similar fields (bitrate, framerate, etc...) which generally don't make/break support. But these fields are useful when making predictions about playback smoothness. The same can't be said for audio, which is cheap to decode, and always marked smooth and powerEfficient (at least in Chromium).
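The behavior described above can be modeled roughly as follows. This is an illustrative sketch only, not Chromium code: the function name `describeAudioDecoding` and the `isCodecSupported` callback are hypothetical, and reporting `smooth` and `powerEfficient` as false for unsupported content is an assumption of this sketch.

```javascript
// Illustrative model only, not Chromium code. The result depends solely on
// whether the codec in `contentType` is supported; channels, bitrate, and
// samplerate are deliberately ignored.
// Assumption: unsupported content is reported as neither smooth nor
// power efficient.
function describeAudioDecoding(audioConfig, isCodecSupported) {
  const supported = Boolean(isCodecSupported(audioConfig.contentType));
  return { supported, smooth: supported, powerEfficient: supported };
}
```

Under this model, two configurations that differ only in bitrate, channels, or samplerate always produce the same answer, which is why those inputs add nothing for such an implementation.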
So, question to other implementers (@jernoble @eric-carlson @jyavenard @padenot @vi-dot-cpp @aboba @jpiesing): do you find the above AudioConfiguration inputs useful? Should we deprecate?