Evolved Video Encoding with WebCodecs #71

Open
sprangerik opened this issue Sep 11, 2024 · 1 comment
Labels
session Breakout session proposal track: Real-time Web

Comments

@sprangerik

sprangerik commented Sep 11, 2024

Session description

WebCodecs provides a low-level API for encoding and decoding video, with control over settings on a per-frame basis. As a relatively young API, it currently lacks some more advanced features, such as temporal/spatial scalability, that are important for real-time use cases like video conferencing.

This session is intended to discuss a number of potential next steps, to find which features are highest priority, and what benefits or problems we face with each of those.

Some of the topics for discussion:

Explicit reference frame control

By allowing the user to specify which reference buffers to reference and which to update on a per-frame basis, it is possible to implement a number of important reference structures and coding features including temporal/spatial/quality layers, long-term references, low-latency 2-pass rate control, etc.

In short, any of the scalability modes listed in Scalable Video Coding (SVC) Extension for WebRTC, and many more, can be implemented with a small set of tools. If done right, this could even be done in a manner that is codec and implementation agnostic.
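As an illustration of how little is needed, here is a minimal sketch of an L1T3 temporal structure expressed purely as per-frame reference decisions. The `FrameReferenceControl` shape and the `l1t3Control` helper are hypothetical names made up for this example; they are not part of WebCodecs or of any concrete proposal.

```typescript
// Hypothetical per-frame reference control; none of these names exist in WebCodecs today.
interface FrameReferenceControl {
  temporalLayer: number;       // temporal layer id of this frame
  referenceBuffers: number[];  // buffer indices this frame may predict from
  updateBuffer: number | null; // buffer index overwritten by this frame, if any
}

// L1T3 with a period of four frames: T0, T2, T1, T2.
// Buffer 0 holds the latest T0 frame, buffer 1 the latest T1 frame.
// T2 frames are never stored, so they can be dropped without breaking anything.
function l1t3Control(frameIndex: number): FrameReferenceControl {
  if (frameIndex === 0) {
    // Key frame: no references, stored in buffer 0.
    return { temporalLayer: 0, referenceBuffers: [], updateBuffer: 0 };
  }
  switch (frameIndex % 4) {
    case 0: // T0 predicts from the previous T0 and replaces it.
      return { temporalLayer: 0, referenceBuffers: [0], updateBuffer: 0 };
    case 2: // T1 predicts from the latest T0 and is stored in buffer 1.
      return { temporalLayer: 1, referenceBuffers: [0], updateBuffer: 1 };
    case 1: // First T2 of the period predicts from the latest T0.
      return { temporalLayer: 2, referenceBuffers: [0], updateBuffer: null };
    default: // Second T2 of the period predicts from the latest T1.
      return { temporalLayer: 2, referenceBuffers: [1], updateBuffer: null };
  }
}
```

Only two buffers are used here. Other modes would differ only in the table inside the function, which is the sense in which one small tool set could cover many structures.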

This way of modeling an encoder also presents some issues. The user needs to be able to determine how many reference buffers are available and how many can be referenced per frame, and to know which references are allowed or disallowed based on various circumstances. How do we expose such data in a way that is user friendly, compatible with the current API, and avoids unnecessary fingerprinting surfaces?
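To make the question concrete, here is a rough sketch of what an application might do with such data once it is exposed. Everything in it is an assumption for discussion: `EncoderReferenceCapabilities`, its two fields, `FrameRequest` and `validateFrameRequest` are invented names, and the real shape of the capability data is exactly what is open.

```typescript
// Hypothetical capability description; the actual shape is an open question.
interface EncoderReferenceCapabilities {
  maxReferenceBuffers: number;   // total number of buffers the encoder exposes
  maxReferencesPerFrame: number; // how many buffers a single frame may reference
}

// Hypothetical per-frame request made by the application.
interface FrameRequest {
  referenceBuffers: number[];
  updateBuffer: number | null;
}

// Returns a list of human-readable problems; an empty list means the request fits.
function validateFrameRequest(
  caps: EncoderReferenceCapabilities,
  req: FrameRequest
): string[] {
  const problems: string[] = [];
  const inRange = (i: number): boolean =>
    Math.floor(i) === i && i >= 0 && i < caps.maxReferenceBuffers;

  if (req.referenceBuffers.length > caps.maxReferencesPerFrame) {
    problems.push(
      `too many references: ${req.referenceBuffers.length} > ${caps.maxReferencesPerFrame}`
    );
  }
  req.referenceBuffers.forEach((i) => {
    if (!inRange(i)) problems.push(`reference buffer ${i} out of range`);
  });
  // A buffer listed twice is treated as an application error in this sketch.
  const hasDuplicate = req.referenceBuffers.some(
    (b, pos) => req.referenceBuffers.indexOf(b) !== pos
  );
  if (hasDuplicate) problems.push("duplicate reference buffer");
  if (req.updateBuffer !== null && !inRange(req.updateBuffer)) {
    problems.push(`update buffer ${req.updateBuffer} out of range`);
  }
  return problems;
}
```

Note that this only covers the two static limits. The "allowed or disallowed based on various circumstances" part (for example, a buffer that has not been written yet) would need state that this sketch does not model.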

There are also tradeoffs when it comes to integrating with existing encoder implementations, a small subset of which may not fit well into this model.

Spatial/Quality Scalability

Spatial scalability can be achieved by changing the encode call to take a sequence of encoding options, instead of a single option, per input frame. Each option would then represent a different layer and would include a desired encoded resolution. With reference frame scaling, a user may reference a buffer containing a different resolution.
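A minimal sketch of what such a sequence of options could look like for one temporal unit, assuming dyadic (2:1) scaling between layers. `LayerEncodeOptions` and `spatialLayerOptions` are hypothetical names for this example only; the current `encode()` call takes a single options object and has no notion of layers.

```typescript
// Hypothetical per-layer options for one input frame; not part of WebCodecs today.
interface LayerEncodeOptions {
  spatialLayer: number;
  width: number;               // desired encoded width for this layer
  height: number;              // desired encoded height for this layer
  referenceBuffers: number[];
  updateBuffer: number | null;
}

// Builds the options for one temporal unit with `layers` spatial layers, each half
// the resolution of the next one up (odd dimensions round down). Layer k is stored
// in buffer k. Each layer references its own previous frame and, for k > 0, the
// lower layer of the same temporal unit, which requires reference frame scaling.
function spatialLayerOptions(
  width: number,
  height: number,
  layers: number,
  keyFrame: boolean
): LayerEncodeOptions[] {
  const out: LayerEncodeOptions[] = [];
  for (let k = 0; k < layers; k++) {
    const shift = layers - 1 - k;
    const refs: number[] = [];
    if (!keyFrame) refs.push(k); // temporal prediction within the same layer
    if (k > 0) refs.push(k - 1); // inter-layer prediction from the layer below
    out.push({
      spatialLayer: k,
      width: width >> shift,
      height: height >> shift,
      referenceBuffers: refs,
      updateBuffer: k,
    });
  }
  return out;
}
```

For a 1280x720 input and three layers this yields 320x180, 640x360 and 1280x720. The order of the sequence matters: buffer k - 1 must already hold the current temporal unit's lower layer when layer k is encoded.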

Again, this comes with some challenges. Different codec types might have different bounds on the scaling factors, and even certain implementations have limitations in this regard, if scaling is supported at all. Some codecs allow reference frame scaling only within the same temporal unit, while others support any reference at any time. How do we handle encoders with special optimized modes such as "multi-res" or "S-mode aware" encoding?
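One way to picture the first two constraints is as a check the application (or the user agent) runs before allowing a scaled reference. The sketch below assumes the limits can be described by three values; `ScalingBounds`, `Picture` and `canReference` are hypothetical, and the numbers passed in would have to come from the codec or implementation rather than from this code.

```typescript
// Hypothetical description of an encoder's reference scaling limits.
interface ScalingBounds {
  maxUpscale: number;            // current frame may be at most this many times larger than the reference
  maxDownscale: number;          // current frame may be at most this many times smaller than the reference
  sameTemporalUnitOnly: boolean; // scaled references only allowed within one temporal unit
}

interface Picture {
  width: number;
  height: number;
  temporalUnit: number; // index of the input frame this picture was encoded from
}

// Returns true if `frame` may predict from `reference` under the given bounds.
function canReference(bounds: ScalingBounds, reference: Picture, frame: Picture): boolean {
  const scaled = reference.width !== frame.width || reference.height !== frame.height;
  if (!scaled) return true; // same resolution: no scaling constraint applies
  if (bounds.sameTemporalUnitOnly && reference.temporalUnit !== frame.temporalUnit) {
    return false;
  }
  // Compare with multiplication only, to avoid rounding issues from division.
  const fits = (ref: number, cur: number): boolean =>
    cur <= ref * bounds.maxUpscale && ref <= cur * bounds.maxDownscale;
  return fits(reference.width, frame.width) && fits(reference.height, frame.height);
}
```

Special modes such as "multi-res" do not fit into a check like this, which is part of why they are raised as a separate question.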

Rate Control

When dealing with layered encoding, rate control becomes much more involved. The easiest way is to support only CQP, leaving all of the rate control to the user. If CBR is desired, the encoder needs to understand the bitrate target and expected frame rate for each spatio-temporal layer. This means it suddenly needs to be SVC aware even if the user is doing all of the reference frame control.
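To show what "bitrate target and expected frame rate for each layer" amounts to, here is a small sketch for temporal layers only. It assumes a dyadic structure in which each added layer doubles the cumulative frame rate, and it assumes the application chooses the per-layer bitrate shares; `LayerRate`, `temporalLayerRates` and the share values in the usage note are made up for illustration.

```typescript
interface LayerRate {
  temporalLayer: number;
  framesPerSecond: number; // frames belonging to this layer alone
  bitsPerSecond: number;   // bitrate budget of this layer alone
  bitsPerFrame: number;    // average frame size budget in this layer
}

// Splits a total bitrate across the temporal layers of a dyadic structure.
// `shares[k]` is the fraction of the total bitrate given to layer k; shares must sum to 1.
function temporalLayerRates(totalBps: number, totalFps: number, shares: number[]): LayerRate[] {
  const n = shares.length;
  const sum = shares.reduce((a, b) => a + b, 0);
  if (n === 0 || Math.abs(sum - 1) > 1e-9) {
    throw new Error("shares must be non-empty and sum to 1");
  }
  return shares.map((share, k) => {
    // Layer 0 runs at totalFps / 2^(n-1); layer k > 0 adds totalFps / 2^(n-k) frames.
    const fps = totalFps / Math.pow(2, n - Math.max(k, 1));
    const bps = totalBps * share;
    return { temporalLayer: k, framesPerSecond: fps, bitsPerSecond: bps, bitsPerFrame: bps / fps };
  });
}
```

For example, `temporalLayerRates(1200000, 30, [0.5, 0.25, 0.25])` gives layer frame rates of 7.5, 7.5 and 15 fps with average frame budgets of 80000, 40000 and 20000 bits. An encoder doing CBR needs these per-layer numbers, which it cannot derive from a single bitrate and frame rate if it only sees the user's reference choices frame by frame.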

Auxiliary

There are many other knobs that could potentially be added: speed/quality control, segmentation/ROI-mapping, etc.
What's on the wish-list of the community?

Other Sessions of Interest

Note that there will also be a first-step proposal discussed at the joint Media/WebRTC WG Meeting on the 26th.

Further, there is a proposed breakout session on RtpTransport, an API that allows users to send custom-encoded frames over the RTP channel of a PeerConnection and is intended to go hand-in-hand with WebCodecs.

Session goal

Find the highest priority features in the community, and which aspects need more consideration.

Additional session chairs (Optional)

@Djuffin

Who can attend

Anyone may attend (Default)

IRC channel (Optional)

#evolved-webcodecs

Other sessions where we should avoid scheduling conflicts (Optional)

#13

Instructions for meeting planners (Optional)

No response

Agenda for the meeting.

The agenda is to discuss the proposal to add reference frame control to WebCodecs, and gather feedback and comments on the path forward. The session consists of a few parts:

  • General goals
  • Our initial proposal, a minimum viable useful implementation of reference control
  • How to query what the encoders are capable of
  • Spatial Scalability (SVC, Simulcast)
  • Rate Control
  • Miscellaneous

See also the WebCodecs spec and the GitHub issue for reference control.

Links to calendar

Meeting materials

@sprangerik sprangerik added the session Breakout session proposal label Sep 11, 2024
@tpac-breakout-bot
Collaborator

Thank you for proposing a session!

You may update the session description as needed and at any time before the meeting, but please keep in mind that tooling relies on issue formatting: follow the instructions and leave all headings and other formatting intact in particular. Bots and W3C meeting organizers may also update the description, to fix formatting issues or add links and other relevant information. Please do not revert these changes. Feel free to use comments to raise questions.

Do not expect formal approval; W3C meeting organizers endeavor to schedule all proposed sessions that are in scope for a breakout. Actual scheduling should take place shortly before the meeting.
