Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing docs or examples for how to use ICE restart #416

Open
morajabi opened this issue Dec 12, 2023 · 5 comments
Open

Missing docs or examples for how to use ICE restart #416

morajabi opened this issue Dec 12, 2023 · 5 comments

Comments

@morajabi
Copy link
Contributor

morajabi commented Dec 12, 2023

Given that there's no failed/closed ICE state, how should we use ICE restart?

https://github.com/algesten/str0m/blob/main/src/ice/agent.rs#L109

    /// Connection failed. This is a less stringent test than `failed` and may trigger
    /// intermittently and resolve just as spontaneously on less reliable networks,
    /// or during temporary disconnections. When the problem resolves, the connection
    /// may return to the connected state.
    Disconnected,
    //
    // NB: The failed and closed state doesn't really have a mapping in this implementation.
    //     We never end trickle ice and it's always possible to "come back" if more remote
    //     candidates are added.
    //
    // The ICE candidate has checked all candidates pairs against one another and has
    // failed to find compatible matches.
    // Failed,
    // The ICE agent has shut down and is no longer handling requests.
    // Closed,
@k0nserv
Copy link
Collaborator

k0nserv commented Dec 12, 2023

Quoting from the specification:

Performing an ICE restart is recommended when iceConnectionState transitions to "failed". An application may additionally choose to listen for the iceConnectionState transition to "disconnected" and then use other sources of information (such as using getStats to measure if the number of bytes sent or received over the next couple of seconds increases) to determine whether an ICE restart is advisable.

For example, if a peer transitions from WiFi to cellular their selected candidate will no longer be usable and no media data will reach the other peer. This causes a disconnect event and the lack of media data can be used as a signal to trigger an ICE restart which will gather new candidates for the cellular connection.

However, a disconnect event can also occur due to loss, because the STUN binding requests or responses can be lost, in this case if media is still flowing the ICE connection state should recover.

@algesten algesten changed the title When should trigger ICE restart given that there's no failed/closed ICE state Missing docs or examples for how to use ICE restart Dec 12, 2023
@xnorpx
Copy link
Collaborator

xnorpx commented Dec 12, 2023

I do think we are missing a timeout, in the case when signaling is setup but for some reason the stun packets are not reaching the str0m instance. Str0m will sit there and wait forever and it should probably have a connection timeout.

@algesten
Copy link
Owner

algesten commented Dec 12, 2023

I do think we are missing a timeout, in the case when signaling is setup but for some reason the stun packets are not reaching the str0m instance. Str0m will sit there and wait forever and it should probably have a connection timeout.

I don't agree. Str0m has the Disconnected state, which is enough given that we don't have a "end of trickle ice candidates".

The ICE spec says that for "checklist" (we only have one such in str0m):

https://www.rfc-editor.org/rfc/rfc8838.html#name-performing-connectivity-che

regular ICE agents would set the state of a checklist to Failed if both of the following two conditions are satisfied:

  • all of the pairs in the checklist are in either the Failed state or the Succeeded state; and
  • there is not a pair in the valid list for each component of the data stream.

Trickle ICE also adds the conditions:

  • all candidate gathering has completed, and the agent is not expecting to discover any new local candidates;
  • and the remote agent has conveyed an end-of-candidates indication

To put that together:

  • Scenario 1: Without trickle ICE, fail when all pairs on checklist fails.
  • Scenario 2: With trickle ICE, fail when all pairs on checklist fails and we got a notification there are no more trickling ICE candidates.

Scenario 1 isn't relevant for str0m because we don't have an API for enumerating and providing all ICE candidates before we start the Rtc instance. Why would we have that API? Add ICE candidates whenever you want!

Ergo we only support Scenario 2: Only trickle ICE, even if we don't expect to do any trickling. The remaining question is "what about indicating end-of-candidates"?

The stance I have taken here is "why?"

  1. You can always come back from "Disconnected" state, by providing more candidates.
  2. Trickle ICE is providing more candidates.

What good does it do to notify str0m that there will never be more candidates? As far as I can see, the only reason is to encode some timeout for going into a "Failed" state. It appears to be a state/code complication only to fulfill the spec. A user of str0m can simply have a timer for the Disconnected state and decide themselves whether to expect more candidates or remove the connection – why encode that logic in str0m?

@pthatcher
Copy link
Collaborator

I agree with the latest comment. As someone who worked on defining WebRTC's "failed" state, I've always thought of it as mostly useless, especially when trickle ICE is used (as it should be). The rule for ICE restarts is pretty simple: if you have been disconnected more than X seconds, try an ICE restart. You pick the X.

@xnorpx
Copy link
Collaborator

xnorpx commented Jan 24, 2024

I do think we are missing a timeout, in the case when signaling is setup but for some reason the stun packets are not reaching the str0m instance. Str0m will sit there and wait forever and it should probably have a connection timeout.

I don't agree. Str0m has the Disconnected state, which is enough given that we don't have a "end of trickle ice candidates".

The ICE spec says that for "checklist" (we only have one such in str0m):

https://www.rfc-editor.org/rfc/rfc8838.html#name-performing-connectivity-che

regular ICE agents would set the state of a checklist to Failed if both of the following two conditions are satisfied:

  • all of the pairs in the checklist are in either the Failed state or the Succeeded state; and
  • there is not a pair in the valid list for each component of the data stream.

Trickle ICE also adds the conditions:

  • all candidate gathering has completed, and the agent is not expecting to discover any new local candidates;
  • and the remote agent has conveyed an end-of-candidates indication

To put that together:

  • Scenario 1: Without trickle ICE, fail when all pairs on checklist fails.
  • Scenario 2: With trickle ICE, fail when all pairs on checklist fails and we got a notification there are no more trickling ICE candidates.

Scenario 1 isn't relevant for str0m because we don't have an API for enumerating and providing all ICE candidates before we start the Rtc instance. Why would we have that API? Add ICE candidates whenever you want!

Ergo we only support Scenario 2: Only trickle ICE, even if we don't expect to do any trickling. The remaining question is "what about indicating end-of-candidates"?

The stance I have taken here is "why?"

  1. You can always come back from "Disconnected" state, by providing more candidates.
  2. Trickle ICE is providing more candidates.

What good does it do to notify str0m that there will never be more candidates? As far as I can see, the only reason is to encode some timeout for going into a "Failed" state. It appears to be a state/code complication only to fulfill the spec. A user of str0m can simply have a timer for the Disconnected state and decide themselves whether to expect more candidates or remove the connection – why encode that logic in str0m?

Ok I added the 5 lines of timeout handling code in our server and I agree :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants