HLD for changing teamd expiry timer (#1073)

This PR adds a HLD for changing the duration of teamd's expiry timer, by sending a message to the peer device with the number of retries it should do for this LAG.
sonic-net · Sep 7, 2023 · c875c38 · c875c38
1 parent 5b6f042
commit c875c38
Showing 1 changed file with 207 additions and 0 deletions.
diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md
@@ -0,0 +1,207 @@
+# Increasing LACP PDU timeout during warm-reboot #
+
+## Table of Contents
+
+### Revision
+
+### Scope
+
+This high-level design document is to add a feature to teamd and define a
+custom LACP PDU packet to allow changing the number of maximum retries done
+before the LAG session is torn down.
+
+### Definitions
+
+* LACP: Link Aggregation Control Protocol
+* PDU: Protocol Data Unit
+* LAG: Link Aggregation Group
+
+### Overview
+
+During warm-reboot, the control plane can be down for a maximum of 90 seconds.
+This is beacuse LACP PDUs are sent every 30 seconds, and the protocol allows for
+up to 3 LACP PDUs to be missed before the LAG is considered down and data
+traffic is disrupted.
+
+It would be beneficial if it's possible to temporarily increase the timeout for
+LACP PDUs on a LAG on both sides. Specifically, prior to starting warm-reboot,
+the timeout could be increased by some amount (beyond the limits of the
+protocol), and after warm-reboot, the timeout would be restored to the normal
+value.
+
+### Requirements
+
+- Switch running a supported SONiC with patches in libteam for this feature on
+  both sides of the LAG
+
+### Architecture Design
+
+There's no change to the overall SONiC architecture. There are no new processes
+or containers added or removed with this change.
+
+### High-Level Design
+
+#### Background
+
+LACP supports two rates for sending PDUs. There is a short rate, where a PDU is
+sent every 1 second, and a long rate, where a PDU is sent every 30 seconds. Both
+sides know what rate to expect from the other side. If 3 LACP PDUs are missed,
+then the LAG is considered to be down, and data traffic is stopped. This results
+in an effective timeout of 3 seconds when using the short rate and 90 seconds
+when using the long rate.
+
+#### Protocol
+
+To change the number of retries, a new LACP version 0xf1 will be defined. This
+version will indicate that there will be two new TLV types named Actor Retry
+Count (0x80) and Partner Retry Count (0x81) will be defined.
+
+The packet structure for LACP version 0xf1 will look as follows:
+
+| Starting byte | Length | Description                      | Value |
+|---------------|--------|----------------------------------|-------|
+|      0        |   1    | LACP Version                     | 0xf1  |
+|      1        |   1    | Actor Info TLV Type              | 0x01  |
+|      2        |   1    | Actor Info TLV Length            |  20   |
+|      3        |   18   | Actor Info TLV Data              |       |
+|      21       |   1    | Partner Info TLV Type            | 0x02  |
+|      22       |   1    | Partner Info TLV Length          |  20   |
+|      23       |   18   | Partner Info TLV Data            |       |
+|      41       |   1    | Collector Info TLV Type          | 0x03  |
+|      42       |   1    | Collector Info TLV Length        |  16   |
+|      43       |   14   | Collector Info TLV Data          |       |
+|      57       |   1    | Actor Retry Count TLV Type       | 0x80  |
+|      58       |   1    | Actor Retry Count TLV Length     |   4   |
+|      59       |   2    | Actor Retry Count TLV Data       |       |
+|      61       |   1    | Partner Retry Count TLV Type     | 0x81  |
+|      62       |   1    | Partner Retry Count TLV Length   |   4   |
+|      63       |   2    | Partner Retry Count TLV Data     |       |
+|      65       |   1    | Terminator TLV Type              | 0x00  |
+|      66       |   1    | Terminator TLV Length            |   0   |
+|      67       |   42   | Padding                          |       |
+
+Compared to the regular LACP PDU packet, the changes are as follows:
+* The LACP Version field has been changed from 0x01 to 0xf1.
+* Two TLVs (Actor Retry Count, and Partner Retry Count) have been added after
+  the Collector Info TLV.
+* The padding has been reduced from 50 bytes to 42 bytes.
+
+The Actor Retry Count and Partner Retry Count TLVs have the following content:
+
+| Starting byte | Length | Description     |
+|---------------|--------|-----------------|
+|      0        |   1    | Retry count     |
+|      1        |   1    | Padding         |
+
+If either side wants to use a non-standard retry count for a member port (i.e.
+retry count set to something besides 3), then they must send a LACP version
+0xf1 packet. This packet will include the retry count of both peers for that
+member port. The receiving device must validate the peer's information and then
+update the retry count that the peer wants to use. This retry count will apply
+only to that member port, and a separate packet will need to be sent for each
+member port.
+
+This retry count is valid until any of the following occurs:
+
+* A new retry count is sent
+* A duration of 3 minutes times the retry count passes
+* The LACP session goes down for whatever reason (because the new retry count
+  expires, because the link goes down, etc.)
+* The peer device sends a version 0x01 LACP PDU (only after 60 seconds)
+
+Except for the first event, after any of these happen, the standard retry count
+of 3 applies.
+
+In the case of the last event, where a 0x01 LACP PDU is received, the retry
+count will get reset to 3 only after 60 seconds after the last 0xf1 LACP PDU
+with non-standard retry count. In other words, when a 0xf1 LACP PDU is received
+with a non-standard retry count, if a 0x01 LACP PDU is received within 60
+seconds of that, then the retry count will not get reset to 3. This is meant to
+act as a transition mechanism during image upgrades.
+
+If both sides want to use the standard retry count of 3 instead, they are
+recommended (but not required) to send a regular LACP version 0x01 packet, so
+that the current standard is being followed. For SONiC's purposes, if a 0xf1
+LACP PDU is received by a device, then it will also respond with a 0xf1 LACP
+PDU. This will act as part of a feature presence test, to determine if the peer
+device supports this feature.
+
+#### Changing Max Retries for Warmboot
+
+As part of a SONiC device starting the warmboot process, currently, LACP PDUs
+are sent to all of the peers, to refresh the timers on the peers. This allows
+the warmboot process the full 90 seconds for control plane to come back up and
+for PDUs to be sent again after warmboot.
+
+Now, the retry count on the local device will be changed to 5 retries (instead
+of the standard 3 retries). This will cause teamd to send out LACP PDUs with
+the above-defined version 0xf1 of the protocol, including the new retry count.
+This should be done only after verifying through some method that the peer side
+understands this feature. Teamd will not wait for an acknowledgment packet.
+
+After warmboot is done, and teamd has started up after warmboot, teamd will now
+be using the default standard retry count of 3. Because of this, it will send a
+standard LACP PDU packet (with version 0x01). When the peer teamd client
+receives this packet, it will know that this side's retry count should be
+changed back to 3.
+
+### Feature Test
+
+To test if a neighbor device has this feature, the following checks will be
+done:
+
+* Based on the LLDP neighbor table, check to see if the remote device claims to
+  be a SONiC device. Specifically, check to see if the system description
+  contains SONiC. If desired, a version check could be made here as well. If
+  there is no LLDP data, or the remote device is not a SONiC device, then
+  assume that this feature is not support, and stop here.
+* From a Python script, send a version 0xf1 LACP PDU packet, with the retry
+  count for both sides set to 3. If the neighbor device responds with a valid
+  0xf1 LACP PDU packet, then this indicates that the feature is supported. If
+  not, then this feature is likely not supported.
+
+### SAI API
+
+There are no changes needed in the SAI API or in the implementation by vendors.
+
+### Configuration and management
+
+#### CLI
+
+There will be two CLIs added to get and set the retry count. These are:
+
+* `config portchannel retry-count get <portchannel_name>`
+* `config portchannel retry-count set <portchannel_name> <retry_count>`
+
+`<portchannel_name>` must refer to a valid, existing portchannel name.
+`<retry_count>` must refer to a retry count between 3 and 10.
+
+Changes done with this CLI is NOT preserved across reboots, and not saved in
+any DB.
+
+### Restrictions/Limitations
+
+Such a change as described in this HLD is going against the LACP protocol, and
+as such, can only be supported if both sides of the LAG are running SONiC, and
+they are running a version of SONiC that understands this. If the peer side is
+not running a supported version of SONiC, or it is not running SONiC, then
+setting a custom retry count may cause the LAG to go down.
+
+### Testing Requirements/Design
+
+To test this feature, a T0 topology with SONiC neighbors will be used.  Test
+cases will be added to get and set the retry count via CLI. In addition, a test
+case will be added to increase the retry count and do a warm-reboot, and verify
+that after warm-reboot, the SONiC neighbors did not bring down the LAG, and
+that after the T0 comes up, the retry count has been set to 3.
+
+# Pull requests
+
+* [sonic-net/sonic-utilities: Add CLI configuration options for teamd retry count feature](https://github.com/sonic-net/sonic-utilities/pull/2642)
+* [sonic-net/sonic-buildimage: teamd: Add support for custom retry counts for LACP sessions](https://github.com/sonic-net/sonic-buildimage/pull/13453)
+* [sonic-net/sonic-mgmt: Add test cases for teamd retry count feature](https://github.com/sonic-net/sonic-mgmt/pull/8152)
+
+# References
+
+- [libteam](https://github.com/jpirko/libteam)
+- [IEEE 802.3ad Standard for LACP](http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf)