Slow ICE connection flow compared to libwebrtc #405
Thanks for raising this. I don't think there's anything deliberately slowing things down. I think all pairs are tested at the same time. To explain what's happening: as you know, you add ICE candidates, both local and remote. Local candidates are combined with remote candidates into pairs. Pairs are ranked by priority: a direct host-host connection is better than going through TURN servers. Once a pair is formed, we start making STUN requests with that pair as sender/receiver. If a STUN request goes through and we receive an answer, the pair is a success. We nominate the pair as the active one. The highest-priority successful pair "wins". The easiest way to understand why this takes time is to turn on TRACE logging or add println.
Link me any code that doesn't make sense and I'll explain what it does.
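To make the "pairs are ranked" step concrete, here is a minimal sketch of the priority formulas recommended by RFC 8445. It is not taken from str0m's source; the names `CandidateKind`, `candidate_priority` and `pair_priority` are made up for illustration, and the type preferences are the RFC's recommended values, which an implementation is free to change.

```rust
/// Candidate types. Names are illustrative, not str0m's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateKind {
    Host,
    PeerReflexive,
    ServerReflexive,
    Relayed,
}

impl CandidateKind {
    /// Type preferences recommended by RFC 8445 (higher is better).
    pub fn type_preference(self) -> u32 {
        match self {
            CandidateKind::Host => 126,
            CandidateKind::PeerReflexive => 110,
            CandidateKind::ServerReflexive => 100,
            CandidateKind::Relayed => 0,
        }
    }
}

/// priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
/// Assumes local_pref <= 65535 and 1 <= component_id <= 256.
pub fn candidate_priority(kind: CandidateKind, local_pref: u32, component_id: u32) -> u32 {
    (kind.type_preference() << 24) + (local_pref << 8) + (256 - component_id)
}

/// pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)
/// G is the controlling agent's candidate priority, D the controlled agent's.
/// Assumes both priorities are below 2^31, as the formula above guarantees.
pub fn pair_priority(controlling: u32, controlled: u32) -> u64 {
    let g = u64::from(controlling);
    let d = u64::from(controlled);
    (g.min(d) << 32) + 2 * g.max(d) + u64::from(g > d)
}
```

Because the lower of the two candidate priorities dominates the pair priority, a host-host pair always outranks a host-relay pair. That ordering decides which successful pair wins; it does not by itself make lower-priority pairs wait.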
@algesten excited to work on debugging this, thanks for the info!
I'll post updates as I test small changes and measure in the production app.
Nice finds!
Let's double check this against libWebRTC. I don't think there's a problem lowering it, but that also means more STUN packets being sent in a short time.
This could potentially be the certificate generation.
Yes, I can't say for sure this helps until I check every other variable. I'm going to start by generating the DTLS cert beforehand. Thanks! One question: is it safe to use one certificate for multiple connections, or should I make a pool?
They are strictly use-once, or you're opening up a security hole. Hm. I see it's
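One way to combine "generate beforehand" with "strictly use once" is a small pool that a background thread keeps topped up and that gives each item away by value. The sketch below assumes nothing about str0m's API: `OneShotPool` is a hypothetical helper, and the `generate` closure stands in for whatever call actually creates a DTLS certificate.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: hands out pre-generated values, each exactly once.
pub struct OneShotPool<T> {
    ready: Receiver<T>,
}

impl<T: Send + 'static> OneShotPool<T> {
    /// Spawn a background thread that keeps up to `capacity` items buffered.
    /// `generate` is a stand-in for the expensive step (e.g. creating a cert).
    pub fn new<F>(capacity: usize, mut generate: F) -> Self
    where
        F: FnMut() -> T + Send + 'static,
    {
        let (tx, rx) = sync_channel(capacity);
        thread::spawn(move || {
            // `send` blocks while the buffer is full and returns an error
            // once the pool (the receiver) has been dropped, ending the loop.
            while tx.send(generate()).is_ok() {}
        });
        OneShotPool { ready: rx }
    }

    /// Take one item, blocking if none is ready yet. The item is moved out
    /// of the pool, so the same one can never be handed out twice.
    pub fn take(&self) -> T {
        self.ready.recv().expect("generator thread exited")
    }
}
```

A caller would build the pool once at startup and call `take()` for every new connection, so the generation cost is paid off the connection path while each connection still gets its own certificate.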
Some things that libwebrtc does to connect fast that str0m should probably do:
A few more thoughts:
This would mean both sides effectively have the same IP address? Could that be generalised to "same IP" regardless of type of candidate?
I'm probably missing something, but… our standard use case for an SFU is a server with a public IP and clients behind NAT, firewalls, etc. Wouldn't host <> relay be the most likely then? It's quite different from peer-to-peer. Or taking a step back, why would removing any pairs be an advantage? Less noise?
Sure. Let's discuss possible strategies on Zulip.
Hmm, several questions:
Directly talking from a host candidate to a relay implies the relay and your node are in the same subnet. If the client can reach the relay, it should also be able to reach the node. I think what typically happens is that sending from a host candidate ends up being the same as sending from the server-reflexive candidate, because your routes are configured to forward to the next router, out of your current subnet. Perhaps one rule could be: if we discover a server-reflexive candidate that has another host candidate as its base, don't bother forming pairs for that host candidate?
Yeah, I think it is safe to assume that a relay doesn't share an IP with another service, so same IP should mean same relay. I am not sure generalising makes sense. Two nodes might have the same server-reflexive IP. That means they should be reachable via their host candidates.
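The rule floated above (skip a host candidate once a server-reflexive candidate based on it is known) can be sketched as a filter over the local candidates. This only illustrates the proposal under discussion; it is not what str0m or RFC 8445 does, and `Kind`, `Candidate` and `locals_worth_pairing` are hypothetical names.

```rust
use std::net::SocketAddr;

/// Illustrative candidate kinds, not str0m's own types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Host,
    ServerReflexive,
    Relayed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Candidate {
    pub kind: Kind,
    /// Address advertised to the remote side.
    pub addr: SocketAddr,
    /// Local address the candidate was obtained from.
    /// For a host candidate this equals `addr`.
    pub base: SocketAddr,
}

/// Proposed rule (an assumption, not existing behaviour): drop a host
/// candidate if some server-reflexive candidate uses it as its base.
pub fn locals_worth_pairing(locals: &[Candidate]) -> Vec<Candidate> {
    locals
        .iter()
        .copied()
        .filter(|c| {
            let shadowed = c.kind == Kind::Host
                && locals
                    .iter()
                    .any(|o| o.kind == Kind::ServerReflexive && o.base == c.addr);
            !shadowed
        })
        .collect()
}
```

As the reply points out, two nodes behind the same NAT may only be able to reach each other via their host candidates, so a filter like this would need an exception for that case before it could be used.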
Relevant: #476
For anybody following along, the issue turned out to be a combination of:
With both of these fixed, I am getting similar results as in #476: str0m needs about 350ms from changing the state to
Let's close this. I don't think we have anything more concrete to action.
It's not exactly an issue, but I want to start a discussion to find ways we can speed up the connection when there are multiple ICE candidates (i.e. `ice_lite` disabled). Although this primarily benefits the p2p use case, as previously mentioned I'd like to improve the ICE agent in str0m, and we can start here, because this is critical for our app.
Context: In a p2p application we'd loop over network interfaces, add each one as a `host` candidate, and then start adding `srflx` and `relay` candidates. str0m connects instantly if the `host` candidate works. But over a network, it seems like each added candidate that doesn't work adds delay to the connection. This delay is very noticeable when the ICE agent needs to go over 4-5 candidate pairs to connect.
In my unscientific tests, I manually started a call via 2 libwebrtc peers and 2 str0m peers with the same STUN and TURN servers configured. str0m took 5x the time libwebrtc took to connect.
What do you think can be the issue? Are we going over candidates sequentially?