
Fix RTX stops working after packet loss spike #566

Merged
merged 3 commits into algesten:main from fix-rtx-history-stale-values on Sep 26, 2024

Conversation

alexlapa
Contributor

So I've noticed remote receivers sending PLIs too often, even after only minor packet loss. Debugging NACKs and RTXs showed that everything actually works fine for a while, but at some point receivers start flooding the SFU with NACKs for the same packets again and again, and no RTXs are sent at all, which results in the receiver ending up with a PLI request.

So what's going on is:

  1. At some point packet loss spikes and the RTX ratio exceeds 15%.
  2. This results in poll_packet_resend always returning None because of this check.
  3. Since no resends are made, execution never gets to this point, so the RTX ratio is never recalculated here.
  4. From then on it is basically stuck with the RTX ratio locked above 0.15; the only way it can recover is if the transmitted bytes sum becomes high enough, which does not normally happen.

So I believe that the ratio recalculation is supposed to happen during the ValueHistory::sum() call.
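For illustration, here is a minimal sketch of a time-windowed history where the pruning happens inside sum(), along the lines proposed above. This is not str0m's actual ValueHistory; the struct, its fields, and the windowing details are assumptions made for the example.

```rust
use std::time::{Duration, Instant};

// Minimal sketch only; str0m's real ValueHistory differs in its details.
struct ValueHistory {
    max_age: Duration,
    entries: Vec<(Instant, u64)>,
}

impl ValueHistory {
    fn new(max_age: Duration) -> Self {
        Self { max_age, entries: Vec::new() }
    }

    fn push(&mut self, now: Instant, value: u64) {
        self.entries.push((now, value));
    }

    /// Pruning inside sum() means the window keeps shrinking with time even
    /// when nothing new is pushed. That is exactly the stuck case above:
    /// with resends suppressed no RTX bytes are pushed, yet the RTX/TX
    /// ratio computed from these sums can still decay back below 0.15.
    fn sum(&mut self, now: Instant) -> u64 {
        self.entries
            .retain(|(t, _)| now.duration_since(*t) <= self.max_age);
        self.entries.iter().map(|(_, v)| *v).sum()
    }
}
```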

And here are before and after videos of how that works:

Before
before.webm
After
after.webm

In the "after" video, the drop in the TX sum is when SVC kicks in, so ignore that.

Another thing: it would be great to be able to configure that 0.15 RTX cache drop ratio, mind if I do it in another PR? Probably as an additional argument to StreamTx::set_rtx_cache, so it would be fn set_rtx_cache(&mut self, max_packets: usize, max_age: Duration, rtx_cache_drop_ratio: Option<f32>), with None completely disabling this mechanic.
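For illustration only, the None-disables-it semantics could look roughly like the hypothetical helper below; the function name and parameters are made up for the sketch and are not part of str0m's API.

```rust
// Hypothetical sketch of the proposed semantics, not str0m code:
// the drop check only applies when a ratio is configured, and None
// disables the mechanic entirely.
fn over_rtx_drop_ratio(rtx_bytes: u64, tx_bytes: u64, drop_ratio: Option<f32>) -> bool {
    match drop_ratio {
        None => false, // mechanic disabled
        Some(limit) => rtx_bytes as f32 / tx_bytes.max(1) as f32 > limit,
    }
}
```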

@alexlapa alexlapa marked this pull request as ready for review September 26, 2024 02:42
@algesten
Owner

@alexlapa great find! Super nice improvement!

For sanity, I think we should change the impl a bit differently. It seems intuitive that the ValueHistory would be pruned on push, which already takes &mut and is where I'd expect it to happen. However, as you've shown, that's not a good idea.

Having it prune on sum() seems a little less intuitive. You could maybe argue that you want "a sum up until this point", but to me "calculating a sum" does not sound like a mutation.

I would prefer you drop the pruning from both push and sum, and expose drain, to make it explicit at which point this happens.

I'd also rename and change the signature of drain (since that has a specific meaning in the Rust std API). Something like purge or purge_old. It should also not return Option<()>, since that value is of no consequence to the caller; it's simply there to allow one use of ?.
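Sketched against the same hypothetical ValueHistory as above, the suggested shape might look roughly like this (illustrative only; the actual merged code may differ):

```rust
use std::time::{Duration, Instant};

// Illustrative sketch of the suggested shape (restating the hypothetical
// struct from above so the snippet stands alone). Pruning is an explicit,
// separately named mutation rather than a side effect of push() or sum(),
// and it returns () instead of Option<()>.
struct ValueHistory {
    max_age: Duration,
    entries: Vec<(Instant, u64)>,
}

impl ValueHistory {
    /// Drop entries older than max_age; callers decide when this happens.
    fn purge_old(&mut self, now: Instant) {
        self.entries
            .retain(|(t, _)| now.duration_since(*t) <= self.max_age);
    }

    /// With purging made explicit, sum() can stay a plain read.
    fn sum(&self) -> u64 {
        self.entries.iter().map(|(_, v)| *v).sum()
    }
}
```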

Owner

@algesten algesten left a comment

See comment

src/util/value_history.rs (review thread resolved)
@alexlapa
Contributor Author

@algesten,

And what's your opinion on this?

Another thing: it would be great to be able to configure that 0.15 RTX cache drop ratio, mind if I do it in another PR? Probably as an additional argument to StreamTx::set_rtx_cache, so it would be fn set_rtx_cache(&mut self, max_packets: usize, max_age: Duration, rtx_cache_drop_ratio: Option<f32>), with None completely disabling this mechanic.

@algesten
Owner

@alexlapa I like it. There's some overlap with the functions that can disable RTX depending on the SDP negotiation, but I think it's worth a PR.

@algesten algesten merged commit ed9b9eb into algesten:main Sep 26, 2024
22 checks passed
@alexlapa alexlapa deleted the fix-rtx-history-stale-values branch September 26, 2024 14:43
@alexlapa alexlapa mentioned this pull request Oct 7, 2024