
Take into account RTP packet header sizes when calculating BWE and pacing #581

Open · wants to merge 1 commit into main
Conversation

alexlapa
Collaborator

While debugging str0m's bandwidth estimation I noticed that AckedBitrateEstimator::current_estimate() (right here) produces consistently lower values than the libwebrtc implementation (right here) for the same RTP stream. I've disabled BWE probing, so that's not the cause. The difference between the implementations is that libwebrtc includes RTP header sizes in its pacing and BWE calculations, while str0m does not: 1, 2.

This is how the size of packets is calculated when estimating bitrate based on acked packets:

// acknowledged_bitrate_estimator.cc
void AcknowledgedBitrateEstimator::IncomingPacketFeedbackVector(
    const std::vector<PacketResult>& packet_feedback_vector) {
  // ...
  DataSize acknowledged_estimate = packet.sent_packet.size;  // here
  acknowledged_estimate += packet.sent_packet.prior_unacked_data;
  bitrate_estimator_->Update(packet.receive_time, acknowledged_estimate,
                             in_alr_);  // this is AckedBitrateEstimator::update

// transport_feedback_adapter.cc
void TransportFeedbackAdapter::AddPacket(const RtpPacketSendInfo& packet_info,
                                         size_t overhead_bytes,
                                         Timestamp creation_time) {
  packet.sent.size = DataSize::Bytes(packet_info.length + overhead_bytes);  // here

// rtp_packet_send_info.cc
RtpPacketSendInfo RtpPacketSendInfo::From(const RtpPacketToSend& packet,
                                          const PacedPacketInfo& pacing_info) {
  packet_info.length = packet.size();

// rtp_packet.h
class RtpPacket {
  // ...
  size_t headers_size() const { return payload_offset_; }  // payload_offset_ == headers_size
  // ...
  size_t size() const {
    return payload_offset_ + payload_size_ + padding_size_;  // the whole packet: header, payload and padding
  }

And here is pacing:

// pacing_controller.cc
void PacingController::ProcessPackets() {
  // ...
  DataSize packet_size = DataSize::Bytes(rtp_packet->payload_size() +
                                         rtp_packet->padding_size());

  if (include_overhead_) {  // include_overhead_ is true in the default libwebrtc build
    packet_size += DataSize::Bytes(rtp_packet->headers_size()) +
                   transport_overhead_per_packet_;  // I don't know exactly what transport_overhead_per_packet_ is, but it was 40 or 50 in my tests
  }
  // ...
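
Putting the two excerpts together, here is a minimal sketch in Rust (illustrative types and numbers only, not str0m's actual internals) contrasting the size libwebrtc feeds into BWE and pacing with the payload-only size str0m uses today:

/// Illustrative description of a sent RTP packet (not a str0m type).
struct SentRtpPacket {
    header_size: usize, // fixed RTP header plus header extensions
    payload_size: usize,
    padding_size: usize,
}

impl SentRtpPacket {
    /// What libwebrtc counts: RtpPacket::size() plus per-packet transport overhead.
    fn size_for_bwe(&self, transport_overhead: usize) -> usize {
        self.header_size + self.payload_size + self.padding_size + transport_overhead
    }

    /// What str0m currently counts: the media payload only.
    fn payload_only(&self) -> usize {
        self.payload_size
    }
}

fn main() {
    let p = SentRtpPacket { header_size: 24, payload_size: 160, padding_size: 0 };
    // With an assumed 40 bytes of UDP/IP overhead, the two views differ by
    // 64 bytes for this packet, which compounds over thousands of packets.
    assert_eq!(p.size_for_bwe(40), 224);
    assert_eq!(p.payload_only(), 160);
}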

Another interesting thing is that str0m's AckedBitrateEstimator is based on BitrateEstimator (bitrate_estimator.cc) plus AcknowledgedBitrateEstimator (acknowledged_bitrate_estimator.cc), while the implementation that libwebrtc actually seems to use is RobustThroughputEstimator (robust_throughput_estimator.cc).

It is created here:

// acknowledged_bitrate_estimator_interface.cc
std::unique_ptr<AcknowledgedBitrateEstimatorInterface>
AcknowledgedBitrateEstimatorInterface::Create(
    const FieldTrialsView* key_value_config) {
  RobustThroughputEstimatorSettings simplified_estimator_settings(
      key_value_config);
  if (simplified_estimator_settings.enabled) {
    return std::make_unique<RobustThroughputEstimator>(
        simplified_estimator_settings);
  }
  return std::make_unique<AcknowledgedBitrateEstimator>(key_value_config);
}

And simplified_estimator_settings.enabled == true, at least in my libwebrtc build, which is pretty much the default. Maybe the Chromium build uses AcknowledgedBitrateEstimator; I haven't checked that.

@alexlapa alexlapa marked this pull request as ready for review October 28, 2024 22:43
@algesten
Owner

@k0nserv would love your input here

@k0nserv
Collaborator

k0nserv commented Oct 29, 2024

Hey @alexlapa, thanks for looking into this.

I know I've spotted code in libWebRTC that makes the inclusion of header overhead in these calculations configurable. The current BWE system is based on media payloads only.

The current code here is based on libWebRTC from about February–March 2024, which might explain some of the differences you are seeing. I would cross-check against that version (e.g. commit 14e2779a6ccdc67038ed2069a5732dd41617c6f0).

Overall I'm not against this idea; in fact I think it probably makes more sense, but it's a breaking change and should possibly be configurable.

For a user of str0m an estimate that includes the header overhead is more accurate, but less useful if they are trying to allocate media bitrates. For example, if str0m estimates 2.5 Mbit/s (with header overhead) and the user allocates 2.5 Mbit/s of media, they'll end up overusing. We need to figure out how this will work.

@xnorpx
Collaborator

xnorpx commented Oct 29, 2024

I wonder if the estimate should then indicate the approximate RTP header overhead per mid, so a user can simply subtract it depending on their allocation scheme. (For audio, RTP overhead is roughly 22 kbit/s at packet time 20.)

Example: I am running 5 outbound audio streams at ~30 kbit/s each.

The estimate comes back at 250 kbit/s.

Without RTP header size information I will increase each stream to 50 kbit/s and overshoot.
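
For intuition, the ~22 kbit/s figure is consistent with roughly 55 bytes of per-packet overhead (e.g. IPv4 20 + UDP 8 + RTP header 12 + extensions/SRTP tag for the remainder; that per-layer breakdown is my assumption, not from this thread). A quick check in Rust:

fn main() {
    let packets_per_sec = 1000 / 20; // 20 ms packet time -> 50 packets/s
    let overhead_bytes = 55;         // assumed IP + UDP + RTP header bytes per packet
    let overhead_bps = packets_per_sec * overhead_bytes * 8;
    println!("overhead = {} kbit/s", overhead_bps / 1000); // prints 22

    // The allocation trap above: a 250 kbit/s estimate naively split over
    // 5 streams gives 50 kbit/s each, ignoring 5 x 22 kbit/s of overhead.
}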

@k0nserv
Collaborator

k0nserv commented Oct 29, 2024

Yes, something like that would be required @xnorpx. @davibe also made a good point about NACK rates the other day.

To truly estimate how much media you can send, the math is: estimate - estimate * 2 * loss_rate or, if the estimate accounts for the packet overhead, (estimate - packet_overhead) - (estimate - packet_overhead) * 2 * loss_rate.
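
As a worked example of those two formulas (the helper function and the numbers are illustrative only, not str0m API):

/// Usable media bitrate from an overhead-inclusive estimate: subtract the
/// packet overhead, then budget for NACK-triggered resends at 2x the loss rate.
fn media_budget(estimate_bps: f64, packet_overhead_bps: f64, loss_rate: f64) -> f64 {
    let media = estimate_bps - packet_overhead_bps;
    media - media * 2.0 * loss_rate
}

fn main() {
    // A 2.5 Mbit/s estimate of which 200 kbit/s is packet overhead, at 2% loss:
    let budget = media_budget(2_500_000.0, 200_000.0, 0.02);
    println!("{budget}"); // 2208000, i.e. ~2.2 Mbit/s of allocatable media
}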

At the moment I don't quite see the benefit of estimating with the packet overhead; the media estimate and loss seem sufficient.

@xnorpx
Collaborator

xnorpx commented Oct 29, 2024

To avoid underutilization (worse quality than libwebrtc), maybe step one could just be to include the RTP header in the estimate and then remove the RTP overhead from the returned BWE value?

@k0nserv
Collaborator

k0nserv commented Oct 29, 2024

To avoid underutilization (worse quality than libwebrtc), maybe step one could just be to include the RTP header in the estimate and then remove the RTP overhead from the returned BWE value?

Why do you think we'd end up underutilizing? I definitely think that's possible (libWebRTC's BWE is much more sophisticated than ours), but I don't see how it would happen because we don't account for packet overhead.

For example, if the available bitrate is 3 Mbit/s, str0m will still arrive roughly at an estimate of 3 Mbit/s - average_packet_overhead. Remember that str0m keeps increasing utilisation until the desired bandwidth is reached or congestion is detected. The estimate emitted by str0m isn't a true estimate of the link bitrate but an estimate of the media bitrate; if it's used to allocate media, it should be equivalent, I think. Unless the packet overhead fluctuates a lot I don't think it should matter much; the overhead is essentially a constant.

Maybe I'm missing something in my thinking. @alexlapa can you talk a bit more about your use case and why you are interested in the estimate inclusive of the packet overhead?

@alexlapa
Collaborator Author

alexlapa commented Nov 5, 2024

Maybe I'm missing something in my thinking. @alexlapa can you talk a bit more about your use case and why you are interested in the estimate inclusive of the packet overhead?

Okay, so here is the full story :)

I've been noticing estimation drops for users that don't seem to have major bandwidth limitations. I also notice that such users seem to have higher than usual jitter fluctuations, which may cause the delay-based implementation to go crazy sometimes. Another thing is that the estimate does not go back up, so these users permanently stay at S0T0, which is pretty bad. Bitrate probing is enabled via set_desired_bitrate, so that part should be okay.

The next thing I tried was comparing the libwebrtc implementation and the str0m implementation like this:

|--------that's me---------|             |--problematic user--|
|            |-->tx_peer0-->-----P2P----->----> rx_peer0      |
|media src-->|             |             |                    |
|            |-->tx_peer1-->----str0m---->----> rx_peer1      |
|__________________________|             |____________________|

My libwebrtc implementation is modified to log BWE estimates at multiple points, and there is a considerable difference between libwebrtc's and str0m's estimates. This PR is supposed to make the acked bitrate the same, but I don't think it will fix the whole problem; I'm pretty sure there are a few other things that affect the end result.

My goal is to make str0m's BWE as close as possible to libwebrtc's, and the plan is to adjust str0m's implementation one step at a time, so this is supposed to be just the first PR in a series.

I'm okay with closing this PR since I can just exclude header overhead from my estimations, but that will make the whole thing somewhat more difficult.

There is also a possibility that #579 will fix my case, so I'll definitely check this after it is merged.

And another thing: what do you think about making it possible to provide a BWE implementation to str0m via config, with the current implementation being the default? TwccSendRecords would probably be the input, but I'll take a look at how that works in pion, which allows this.

@algesten
Owner

algesten commented Nov 5, 2024

My goal is to make str0m's BWE as close as possible to libwebrtc's, and the plan is to adjust str0m's implementation one step at a time, so this is supposed to be just the first PR in a series.

Very interesting! Thanks! And love the test setup!

I think you should coordinate a bit with @k0nserv, since we want to bring str0m up to date with the latest libWebRTC revisions. After Hugo has merged the loss controller, I think the next thing is to get up to date so we are comparing like-for-like in your test setup.

@k0nserv
Collaborator

k0nserv commented Nov 5, 2024

That's excellent extra context!

My goal is to make str0m's BWE as close as possible to libwebrtc's, and the plan is to adjust str0m's implementation one step at a time, so this is supposed to be just the first PR in a series.

This is a great goal, and we'd love more contributions in the direction of making the BWE system better. While it's based on libWebRTC, it's much more limited in scope at the moment.

I'm okay with closing this PR since I can just exclude header overhead from my estimations, but that will make the whole thing somewhat more difficult.

Isn't this backwards? Currently you don't have to exclude header overhead because the estimate is that of the media bitrate, i.e. it already excludes header overhead. If we were to land this PR, str0m users would need to exclude header overhead from the estimate when allocating media bitrates. A problem is that str0m doesn't currently surface enough information to perform this calculation, i.e. we don't have headerBytesSent.

A related question is whether including the header overhead should be configurable or not.

There is also a possibility that #579 will fix my case, so I'll definitely check this after it is merged.

For now the loss-based estimator will never estimate above the delay-based estimator. The benefit it will bring is that we'll react, by decreasing the estimates, when congestion manifests as loss.

And another thing: what do you think about making it possible to provide a BWE implementation to str0m via config, with the current implementation being the default? TwccSendRecords would probably be the input, but I'll take a look at how that works in pion, which allows this.

I'll leave @algesten to comment on whether this is something we'd do, but we'd much rather you and others contribute to improving the BWE system we have.

To diagnose problems with the estimates, the feature _internal_dont_use_log_stats is very useful. It prints lots of extra data to standard out, which can be plotted to understand what's going on. If you are interested I can share some Jupyter notebook code for this.

EDIT: About the actual problems you discussed

I've been noticing estimation drops for users that don't seem to have major bandwidth limitations. I also notice that such users seem to have higher than usual jitter fluctuations, which may cause the delay-based implementation to go crazy sometimes. Another thing is that the estimate does not go back up, so these users permanently stay at S0T0, which is pretty bad. Bitrate probing is enabled via set_desired_bitrate, so that part should be okay.

High jitter being a problem would imply the linear fit of the delay variation is mistaking the jitter for increasing delay caused by congestion. The trendline estimate (logged via log_trendline_estimate!) will be interesting to look at.

In terms of bitrate probing, did you verify str0m is correctly sending padding to prove out the estimate? Also, how big is the gap between S0T0 and S1T0? We've found that too-big gaps are problematic.

@algesten
Owner

I believe this relates to #608. If that PR irons out the problems @alexlapa sees, we might not need to change whether we count the RTP header or not.

@alexlapa
Collaborator Author

@algesten ,

we might not need to change whether we count the RTP header or not.

Yeah, I've already mentioned this:

I'm okay with closing this PR since I can just exclude header overhead from my estimations

@k0nserv ,

Isn't this backwards? Currently you don't have to exclude header overhead because the estimate is that of the media bitrate, i.e. it already excludes header overhead.

What I mean is that I exclude it from my calculation of the egress bitrate that I match against str0m's estimates. I included it before because I assumed that str0m's estimate also includes it, since libwebrtc's does.

A related question is whether including the header overhead should be configurable or not.

Eh, it's up to you really. I believe it should always be included, because that's just the correct thing to do. goog-cc works with transport bandwidth, not media bitrate. And the RTP and transport overhead will differ between cases. Not by a lot, but still.

In terms of bitrate probing, did you verify str0m is correctly sending padding to prove out the estimate? Also, how big is the gap between S0T0 and S1T0? We've found that too-big gaps are problematic.

Well, I'm also using S0T1 and S0T2, and the gap between S0T0 and S0T1 is quite small, so that's not the case.

Anyway, #601 and #608 seem to fix the original issue I described.

@algesten
Owner

If libWebRTC counts the header overhead, I don't see why we wouldn't. We don't need configuration for it.

@xnorpx
Collaborator

xnorpx commented Jan 14, 2025

Returning the full "estimated bitrate" and then a way to get a rough estimate of the average RTP overhead per stream is my vote.

One could always return both the full estimated bitrate and estimated bitrate - (total RTP overhead).

@k0nserv
Collaborator

k0nserv commented Jan 24, 2025

I agree with @xnorpx; we need a way to get the RTP overhead. Consider: I have an encoder with a target bitrate of 500 kbit/s and an estimate (including RTP overhead) of 512 kbit/s. Can I send this media stream without overusing? This question is impossible to answer without knowing what the RTP overhead is.
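
Plugging illustrative numbers into that example (a 50 packets/s audio stream with ~30 bytes of RTP overhead per packet; both figures are assumptions for the sake of the sketch):

fn main() {
    let headroom_bps = 512_000 - 500_000; // 12 kbit/s between estimate and target
    let overhead_bps = 50 * 30 * 8;       // 12 kbit/s of assumed RTP overhead
    // The overhead alone consumes the entire headroom, so whether the stream
    // fits depends on a number the public API doesn't currently expose.
    assert!(overhead_bps >= headroom_bps);
}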

The other thing is that this change is breaking, but very subtly so. If we do it, maybe we should change the types in the public API to force this consideration on consumers.

@algesten
Owner

I agree with @xnorpx; we need a way to get the RTP overhead. Consider: I have an encoder with a target bitrate of 500 kbit/s and an estimate (including RTP overhead) of 512 kbit/s. Can I send this media stream without overusing? This question is impossible to answer without knowing what the RTP overhead is.

The other thing is that this change is breaking, but very subtly so. If we do it, maybe we should change the types in the public API to force this consideration on consumers.

I don't think you can be that accurate when setting the media bitrate to the estimated bandwidth. For video codecs the bitrate you set is not a hard limit, just what you desire. Hence if the BWE gives you 512 kbit/s, I would be conservative and set some offset like 512 - x in the codec, where you motivate x by "overhead, not overshooting, etc.".

Audio is a bit different because for many codecs the rate is pretty static, but then so is the RTP header overhead, so again, you make up some x and set that.

I think str0m should communicate the link bandwidth and not worry about media. It's up to the user how to handle the link figure with respect to codecs. That might be a breaking change semantically, but I think it's the correct choice.

@xnorpx
Collaborator

xnorpx commented Jan 24, 2025

So if the agreement is that str0m should estimate the link bitrate and not the media bitrate, then it's more a question of how to report the RTP overhead.

The RTP header bitrate will differ between sessions depending on which RTP header extensions are added and whether the sending rate changes. This will mostly impact audio, since the RTP header overhead is proportionally bigger for audio.

For example, if you send 10 kbit/s Opus with a 20 ms packet time, the RTP header overhead is ~20 kbit/s, twice the size of the payload. During a session this bitrate can change up to 500 kbit/s, and the packet time can vary between 10 and 120 ms.

Regarding ways to get the RTP overhead for a stream/mid, here are some ideas:

- str0m keeps track of the RTP header rate internally and emits events when it changes.

- Add an API for the user to query metadata about the last packet sent out: last_transmit_info(), which could return rid/mid, ssrc, rtp_header_size, payload_size, and the packet type (SRTP, STUN, etc.).

This would allow advanced users to query transmit information and build stats and metrics, including calculating the RTP header overhead; see the sketch below.
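
A sketch of what that proposed query API might look like in Rust (everything here is hypothetical; nothing like this exists in str0m today):

/// Hypothetical metadata about the last packet sent out, per the proposal above.
struct TransmitInfo {
    mid: String,
    rid: Option<String>,
    ssrc: u32,
    rtp_header_size: usize,
    payload_size: usize,
    kind: PacketKind,
}

/// Hypothetical packet classification.
enum PacketKind {
    Srtp,
    Stun,
    Dtls,
}

/// One way a user could derive the header share of the bytes sent.
fn header_overhead_ratio(infos: &[TransmitInfo]) -> f64 {
    let header: usize = infos.iter().map(|i| i.rtp_header_size).sum();
    let total: usize = infos.iter().map(|i| i.rtp_header_size + i.payload_size).sum();
    header as f64 / total as f64
}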

@algesten
Owner

algesten commented Jan 24, 2025

Add an API for the user to query metadata about the last packet sent out

How about adding something to MediaIngressStats and MediaEgressStats? They are already reported per mid/rid and have a total byte value.

Seems like a logical place to have it since we want averages over time.

@k0nserv
Collaborator

k0nserv commented Jan 24, 2025

How about adding something to MediaIngressStats and MediaEgressStats? They are already reported per mid/rid and have a total byte value.

Seems like a logical place to have it since we want averages over time.

I was thinking the same. We want the equivalent of RTCOutboundRtpStreamStats.headerBytesSent.
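
A sketch of what that might look like (the header_bytes field and this trimmed-down struct are hypothetical; str0m's real MediaEgressStats has a different shape):

/// Trimmed-down, illustrative per-mid/rid egress stats.
struct MediaEgressStats {
    bytes: u64,        // cumulative payload bytes, as reported today
    header_bytes: u64, // hypothetical equivalent of headerBytesSent
}

/// Average header overhead bitrate between two stats snapshots.
fn header_overhead_bps(prev: &MediaEgressStats, cur: &MediaEgressStats, dt_secs: f64) -> f64 {
    (cur.header_bytes - prev.header_bytes) as f64 * 8.0 / dt_secs
}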
