
Configure RTX ratio cap #570

Merged: 5 commits, Oct 15, 2024

Conversation

@alexlapa (Contributor) commented Oct 7, 2024

#566 (comment)

Another thing is it would be great to be able to configure that 0.15 RTX cache drop ratio, mind if i do it in another PR? Probably as an additional argument to StreamTx::set_rtx_cache, so it will be fn set_rtx_cache(&mut self, max_packets: usize, max_age: Duration, rtx_cache_drop_ratio: Option<f32>) with None completely disabling this mechanic.

@algesten

When would we want this to be None?

@alexlapa (Contributor, Author) commented Oct 7, 2024

You mean why not just use a value bigger than 1.0? It just seems to be a more obvious off switch. I don't mind changing this if you want.

Or do you mean why anyone would want to disable this functionality? Well, I'm considering disabling it, since it's up to the receiver to decide whether it needs some specific packet or not; however, I'm still testing this.

It is not expected that a receiver will send a NACK for every lost RTP packet; rather, it needs to consider the cost of sending NACK feedback and the importance of the lost packet to make an informed decision on whether it is worth telling the sender about a packet-loss event.

@alexlapa alexlapa marked this pull request as ready for review October 7, 2024 12:49
@algesten (Owner) commented Oct 7, 2024

@davibe can provide more context, but I think this was put into place for really bad connections, where we risk crowding the network with resends rather than real packets. I think there's definitely a case where the client keeps nacking so much that it makes no sense to just try and fulfill that.

@alexlapa (Contributor, Author) commented Oct 8, 2024

I think there's definitely a case where the client keeps nacking so much that it makes no sense to just try and fulfill that.

Sure, I can imagine a situation where the receiver has a good uplink, so NACKs make it to the server, but a bad downlink, so both TXs and RTXs are getting lost. But there are also cases where the receiver can stretch its jitter buffer to a few seconds in non-delay-sensitive scenarios.

And I don't really understand why you would want to forbid this option. If this PR allows configuring the cap, then why not? Do you want to assert that it's in, let's say, the [0.15..0.5] range?
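If a range check were wanted, a tiny validation along the following lines would be enough. This is purely hypothetical and not part of str0m's API; it just illustrates the kind of assert being discussed.

```rust
/// Hypothetical validation for the drop ratio; not str0m's API.
fn validate_drop_ratio(ratio: f32) -> Result<f32, String> {
    // A stricter policy could require e.g. (0.15..=0.5).contains(&ratio);
    // here we only reject values that cannot make sense at all.
    if ratio.is_finite() && ratio > 0.0 {
        Ok(ratio)
    } else {
        Err(format!("invalid rtx_cache_drop_ratio: {ratio}"))
    }
}

fn main() {
    assert!(validate_drop_ratio(0.15).is_ok());
    assert!(validate_drop_ratio(f32::NAN).is_err());
}
```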

@davibe (Collaborator) commented Oct 8, 2024

Packets can be lost due to bandwidth constraints (or arrive too late, which amounts to the same thing). Retransmitting "too much" uses additional bandwidth, which further exceeds the connection's capabilities and causes even more loss. This can also happen in scenarios that are not delay-sensitive, because bandwidth may remain insufficient indefinitely.

In our internal products - without this arbitrary 0.15 limit - we found that varying network conditions would get the peer stuck in a loop where str0m would continuously flood them with retransmissions and big keyframes while they kept nacking and sending PLIs as they were unable to recover.

Maybe what we need is to evolve this 0.15 limit into something more elaborate that takes more things into account (what the peer's bandwidth is, how much we are overusing, whether we are about to send a keyframe, ...). In the absence of more elaborate logic, configurability may be useful to some.

I thought 15% was quite a high limit, but public wifis may surprise. If a network has more than 15% loss but can still sustain high speeds, then 15% is not the best option. This is a dynamic, per-peer consideration. This PR seems to allow changing the limit dynamically via the direct API, but it does not handle the change smoothly (the RTX cache is dropped on reconfiguration). Maybe it's enough for now.
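For context, here is a rough sketch of the kind of ratio cap being discussed, under the assumption that it compares retransmitted bytes against total sent bytes and stops servicing resends once the share exceeds the cap. The type and the accounting below are hypothetical, not str0m's implementation.

```rust
/// Hypothetical model of an RTX ratio cap: track how many bytes went out
/// as retransmissions vs. in total, and drop pending resends once the
/// retransmission share exceeds the configured cap.
struct RtxRatioCap {
    cap: Option<f32>, // `None` disables the check entirely
    rtx_bytes: u64,
    total_bytes: u64,
}

impl RtxRatioCap {
    fn record_send(&mut self, bytes: u64, is_rtx: bool) {
        self.total_bytes += bytes;
        if is_rtx {
            self.rtx_bytes += bytes;
        }
    }

    /// Returns true if a pending resend should be dropped instead of sent.
    fn should_drop_resend(&self) -> bool {
        let Some(cap) = self.cap else {
            return false; // mechanic disabled
        };
        if self.total_bytes == 0 {
            return false;
        }
        (self.rtx_bytes as f32 / self.total_bytes as f32) > cap
    }
}

fn main() {
    let mut cap = RtxRatioCap { cap: Some(0.15), rtx_bytes: 0, total_bytes: 0 };
    cap.record_send(1200, false);
    cap.record_send(1200, false);
    cap.record_send(1200, true); // one resend out of three packets: ratio = 1/3
    assert!(cap.should_drop_resend());
}
```

A more elaborate version, as suggested above, could also consult the peer's estimated bandwidth, current overuse, and whether a keyframe is about to be sent.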

Review thread on src/streams/send.rs (outdated, resolved)
@alexlapa (Contributor, Author) commented Oct 8, 2024

In our internal products - without this arbitrary 0.15 limit - we found that varying network conditions would get the peer stuck in a loop where str0m would continuously flood them with retransmissions and big keyframes while they kept nacking and sending PLIs as they were unable to recover.

Yeah, that looks like a bandwidth issue. Mobile networks maybe? My issue with the 0.15 cap is actually normal-bandwidth clients that still have sporadic bursts of ~80-180 packets lost for a single RTP stream. Increasing the cap means there is no PLI and just a minor ~100ms freeze. And for low bandwidth there is SVC, which does its job pretty well.

I thought 15% was quite a high limit, but public wifis may surprise. If a network has more than 15% loss but can still sustain high speeds, then 15% is not the best option.

Well, it depends on the time frame. From what I see, average packet loss might stay under 1%, but the maximum over 1 second might sometimes exceed 50%.

UPD:

I guess that in the specific case you have described it would make sense to clear not only the resend queue but also the RtxCache, if you really want to completely reset the receiver, leaving it with no option but to request a keyframe.

@alexlapa alexlapa requested a review from davibe October 8, 2024 12:37
@alexlapa (Contributor, Author)
@davibe,

Are there still any issues with this PR?

@algesten (Owner)

Thanks for the reminder. I'll do a final review later today.

@algesten algesten merged commit 7170c3a into algesten:main Oct 15, 2024
22 checks passed
@algesten (Owner)
Thanks!
