
Fix RTX stops working after packet loss spike #566

Merged
merged 3 commits into algesten:main from fix-rtx-history-stale-values on Sep 26, 2024

Conversation

alexlapa
Contributor

So I've noticed remote receivers sending PLIs too often, even after only minor packet loss. Debugging NACKs and RTXs showed that everything actually works fine for a while, but at some point receivers start flooding the SFU with NACKs for the same packets again and again, and no RTXs are sent at all, which results in the receiver ending up with a PLI request.

So what's going on is:

  1. At some point packet loss spikes and the RTX ratio exceeds 15%.
  2. This results in poll_packet_resend always returning None because of this check.
  3. Since no resends are made, execution never gets to this point, so the RTX ratio is never recalculated here.
  4. From then on it is basically stuck with the RTX ratio locked above 0.15; the only way it can recover is if the transmitted bytes sum becomes high enough, which does not normally happen.

So I believe that the ratio recalculation is supposed to happen during the ValueHistory::sum() call.
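For illustration, here is a minimal sketch of a time-windowed history where the pruning happens inside sum(), along the lines proposed above. This is not str0m's actual ValueHistory; the struct, its fields, and the windowing details are assumptions made for the example.

```rust
use std::time::{Duration, Instant};

// Minimal sketch only; str0m's real ValueHistory differs in its details.
struct ValueHistory {
    max_age: Duration,
    entries: Vec<(Instant, u64)>,
}

impl ValueHistory {
    fn new(max_age: Duration) -> Self {
        Self { max_age, entries: Vec::new() }
    }

    fn push(&mut self, now: Instant, value: u64) {
        self.entries.push((now, value));
    }

    /// Pruning inside sum() means the window keeps shrinking with time even
    /// when nothing new is pushed. That is exactly the stuck case above:
    /// with resends suppressed no RTX bytes are pushed, yet the RTX/TX
    /// ratio computed from these sums can still decay back below 0.15.
    fn sum(&mut self, now: Instant) -> u64 {
        self.entries
            .retain(|(t, _)| now.duration_since(*t) <= self.max_age);
        self.entries.iter().map(|(_, v)| *v).sum()
    }
}
```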

And here are before and after videos of how that works:

Before
before.webm
After
after.webm

In the "after" video, the drop in the TX sum is when SVC kicks in, so ignore that.

Another thing: it would be great to be able to configure that 0.15 RTX cache drop ratio, mind if I do it in another PR? Probably as an additional argument to StreamTx::set_rtx_cache, so it would be fn set_rtx_cache(&mut self, max_packets: usize, max_age: Duration, rtx_cache_drop_ratio: Option<f32>), with None completely disabling this mechanic.
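For illustration only, the None-disables-it semantics could look roughly like the hypothetical helper below; the function name and parameters are made up for the sketch and are not part of str0m's API.

```rust
// Hypothetical sketch of the proposed semantics, not str0m code:
// the drop check only applies when a ratio is configured, and None
// disables the mechanic entirely.
fn over_rtx_drop_ratio(rtx_bytes: u64, tx_bytes: u64, drop_ratio: Option<f32>) -> bool {
    match drop_ratio {
        None => false, // mechanic disabled
        Some(limit) => rtx_bytes as f32 / tx_bytes.max(1) as f32 > limit,
    }
}
```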

@alexlapa alexlapa marked this pull request as ready for review September 26, 2024 02:42
@algesten
Owner

@alexlapa great find! Super nice improvement!

For sanity, I think we should change the impl a bit differently. It seems intuitive that the ValueHistory would be pruned on push, which already takes &mut and is where I'd expect it to happen. However, as you've shown, that's not a good idea.

Having it prune on sum() seems a little less intuitive. You could maybe argue that you want "a sum up until this point", but to me "calculating a sum" does not sound like a mutation.

I would prefer you drop the pruning from both push and sum, and expose drain, to make it explicit at which point this happens.

I'd also rename and change the signature of drain (since that has a specific meaning in the Rust std API). Something like purge or purge_old. It should also not return Option<()>, since that value is of no consequence to the caller; it's simply there to allow one use of ?.
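Sketched against the same hypothetical ValueHistory as above, the suggested shape might look roughly like this (illustrative only; the actual merged code may differ):

```rust
use std::time::{Duration, Instant};

// Illustrative sketch of the suggested shape (restating the hypothetical
// struct from above so the snippet stands alone). Pruning is an explicit,
// separately named mutation rather than a side effect of push() or sum(),
// and it returns () instead of Option<()>.
struct ValueHistory {
    max_age: Duration,
    entries: Vec<(Instant, u64)>,
}

impl ValueHistory {
    /// Drop entries older than max_age; callers decide when this happens.
    fn purge_old(&mut self, now: Instant) {
        self.entries
            .retain(|(t, _)| now.duration_since(*t) <= self.max_age);
    }

    /// With purging made explicit, sum() can stay a plain read.
    fn sum(&self) -> u64 {
        self.entries.iter().map(|(_, v)| *v).sum()
    }
}
```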

Owner

@algesten algesten left a comment

See comment

src/util/value_history.rs (review thread resolved)
@alexlapa
Contributor Author

@algesten,

And what's your opinion on this?

Another thing: it would be great to be able to configure that 0.15 RTX cache drop ratio, mind if I do it in another PR? Probably as an additional argument to StreamTx::set_rtx_cache, so it would be fn set_rtx_cache(&mut self, max_packets: usize, max_age: Duration, rtx_cache_drop_ratio: Option<f32>), with None completely disabling this mechanic.

@algesten
Owner

@alexlapa I like it. There's some overlap with the functions that can disable RTX depending on the SDP negotiation, but I think it's worth a PR.

@algesten algesten merged commit ed9b9eb into algesten:main Sep 26, 2024
22 checks passed
@alexlapa alexlapa deleted the fix-rtx-history-stale-values branch September 26, 2024 14:43
@alexlapa alexlapa mentioned this pull request Oct 7, 2024