HLD for changing teamd expiry timer (#1073)
This PR adds an HLD for changing the duration of teamd's expiry timer by sending a message to the peer device with the number of retries it should use for this LAG.
saiarcot895 authored Sep 7, 2023
1 parent 5b6f042 commit c875c38
1 changed file: doc/lag/Increasing LACP PDU timeout during warm-reboot.md (207 additions, 0 deletions)

# Increasing LACP PDU timeout during warm-reboot #

## Table of Contents

### Revision

### Scope

This high-level design document describes a teamd feature and a custom LACP PDU
packet that allow changing the maximum number of retries before the LAG session
is torn down.

### Definitions

* LACP: Link Aggregation Control Protocol
* PDU: Protocol Data Unit
* LAG: Link Aggregation Group

### Overview

During warm-reboot, the control plane can be down for a maximum of 90 seconds.
This is because LACP PDUs are sent every 30 seconds, and the protocol allows for
up to 3 LACP PDUs to be missed before the LAG is considered down and data
traffic is disrupted.

It would be beneficial to be able to temporarily increase the timeout for LACP
PDUs on a LAG on both sides. Specifically, prior to starting warm-reboot, the
timeout could be increased by some amount (beyond the limits of the protocol),
and after warm-reboot, the timeout would be restored to the normal value.

### Requirements

- A switch running a supported SONiC version, with the libteam patches for this
feature, on both sides of the LAG

### Architecture Design

There's no change to the overall SONiC architecture. There are no new processes
or containers added or removed with this change.

### High-Level Design

#### Background

LACP supports two rates for sending PDUs. There is a short rate, where a PDU is
sent every 1 second, and a long rate, where a PDU is sent every 30 seconds. Both
sides know what rate to expect from the other side. If 3 LACP PDUs are missed,
then the LAG is considered to be down, and data traffic is stopped. This results
in an effective timeout of 3 seconds when using the short rate and 90 seconds
when using the long rate.
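
The arithmetic above can be written out as a minimal sketch. The constant and function names here are illustrative only and do not come from teamd or libteam.

```python
# LACP PDU transmit intervals in seconds, as described in the text above.
SHORT_RATE_INTERVAL = 1
LONG_RATE_INTERVAL = 30
STANDARD_RETRY_COUNT = 3


def effective_timeout(interval_seconds: int,
                      retry_count: int = STANDARD_RETRY_COUNT) -> int:
    """Seconds without a received PDU before the LAG is considered down."""
    return interval_seconds * retry_count


print(effective_timeout(SHORT_RATE_INTERVAL))  # 3
print(effective_timeout(LONG_RATE_INTERVAL))   # 90
```

Under this model, the retry count is the only factor that can extend the timeout without changing the PDU rate.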

#### Protocol

To change the number of retries, a new LACP version 0xf1 will be defined. This
version indicates that two new TLV types, Actor Retry Count (0x80) and Partner
Retry Count (0x81), are present in the packet.

The packet structure for LACP version 0xf1 will look as follows:

| Starting byte | Length | Description | Value |
|---------------|--------|----------------------------------|-------|
| 0 | 1 | LACP Version | 0xf1 |
| 1 | 1 | Actor Info TLV Type | 0x01 |
| 2 | 1 | Actor Info TLV Length | 20 |
| 3 | 18 | Actor Info TLV Data | |
| 21 | 1 | Partner Info TLV Type | 0x02 |
| 22 | 1 | Partner Info TLV Length | 20 |
| 23 | 18 | Partner Info TLV Data | |
| 41 | 1 | Collector Info TLV Type | 0x03 |
| 42 | 1 | Collector Info TLV Length | 16 |
| 43 | 14 | Collector Info TLV Data | |
| 57 | 1 | Actor Retry Count TLV Type | 0x80 |
| 58 | 1 | Actor Retry Count TLV Length | 4 |
| 59 | 2 | Actor Retry Count TLV Data | |
| 61 | 1 | Partner Retry Count TLV Type | 0x81 |
| 62 | 1 | Partner Retry Count TLV Length | 4 |
| 63 | 2 | Partner Retry Count TLV Data | |
| 65 | 1 | Terminator TLV Type | 0x00 |
| 66 | 1 | Terminator TLV Length | 0 |
| 67 | 42 | Padding | |

Compared to the regular LACP PDU packet, the changes are as follows:
* The LACP Version field has been changed from 0x01 to 0xf1.
* Two TLVs (Actor Retry Count, and Partner Retry Count) have been added after
the Collector Info TLV.
* The padding has been reduced from 50 bytes to 42 bytes.

The Actor Retry Count and Partner Retry Count TLVs have the following content:

| Starting byte | Length | Description |
|---------------|--------|-----------------|
| 0 | 1 | Retry count |
| 1 | 1 | Padding |
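
As a rough illustration of the layout in the two tables, the following sketch assembles the bytes of a version 0xf1 PDU starting at the LACP Version field (byte 0 in the table). It omits the Ethernet header and anything else that precedes the version field. The function names are hypothetical and are not part of teamd or libteam; the info TLV contents are passed in as opaque byte strings, and the padding bytes are assumed to be zero.

```python
import struct

LACP_VERSION_RETRY = 0xF1

# TLV type codes from the packet structure table.
TLV_ACTOR_INFO = 0x01
TLV_PARTNER_INFO = 0x02
TLV_COLLECTOR_INFO = 0x03
TLV_ACTOR_RETRY = 0x80
TLV_PARTNER_RETRY = 0x81
TLV_TERMINATOR = 0x00


def tlv(tlv_type: int, data: bytes) -> bytes:
    """Encode one TLV. The length byte covers type + length + data."""
    return struct.pack("!BB", tlv_type, len(data) + 2) + data


def retry_count_tlv(tlv_type: int, retry_count: int) -> bytes:
    """Retry count TLV data: one count byte followed by one padding byte."""
    return tlv(tlv_type, struct.pack("!Bx", retry_count))


def build_lacpdu_f1(actor_info: bytes, partner_info: bytes,
                    collector_info: bytes,
                    actor_retry: int, partner_retry: int) -> bytes:
    """Build the PDU body from the version field through the padding."""
    if (len(actor_info), len(partner_info), len(collector_info)) != (18, 18, 14):
        raise ValueError("info TLV data must be 18, 18 and 14 bytes long")
    return b"".join([
        bytes([LACP_VERSION_RETRY]),
        tlv(TLV_ACTOR_INFO, actor_info),
        tlv(TLV_PARTNER_INFO, partner_info),
        tlv(TLV_COLLECTOR_INFO, collector_info),
        retry_count_tlv(TLV_ACTOR_RETRY, actor_retry),
        retry_count_tlv(TLV_PARTNER_RETRY, partner_retry),
        struct.pack("!BB", TLV_TERMINATOR, 0),  # terminator carries length 0
        bytes(42),                              # padding (assumed zero-filled)
    ])


pdu = build_lacpdu_f1(bytes(18), bytes(18), bytes(14),
                      actor_retry=5, partner_retry=3)
print(len(pdu), hex(pdu[0]), pdu[59], pdu[63])  # 109 0xf1 5 3
```

The printed offsets match the table: the actor retry count sits at byte 59 and the partner retry count at byte 63.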

If either side wants to use a non-standard retry count for a member port (i.e.
a retry count other than 3), it must send an LACP version 0xf1 packet. This
packet includes the retry count of both peers for that member port. The
receiving device must validate the peer's information and then update the retry
count that the peer wants to use. The retry count applies only to that member
port, so a separate packet must be sent for each member port.

This retry count is valid until any of the following occurs:

* A new retry count is sent
* A duration of 3 minutes times the retry count passes
* The LACP session goes down for whatever reason (because the new retry count
expires, because the link goes down, etc.)
* The peer device sends a version 0x01 LACP PDU (only after 60 seconds)

Except for the first event, after any of these happen, the standard retry count
of 3 applies.

In the case of the last event, where a 0x01 LACP PDU is received, the retry
count is reset to 3 only if at least 60 seconds have passed since the last 0xf1
LACP PDU with a non-standard retry count. In other words, if a 0x01 LACP PDU is
received within 60 seconds of a 0xf1 LACP PDU with a non-standard retry count,
the retry count is not reset to 3. This is meant to act as a transition
mechanism during image upgrades.
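
The validity rules above can be sketched as a small per-member-port state tracker. This is not teamd's implementation; the class and method names are hypothetical. It also makes two assumptions that the text does not settle: the "3 minutes times the retry count" expiry is measured from the most recent 0xf1 PDU that carried a non-standard count, and a 0x01 PDU arriving exactly 60 seconds after that PDU resets the count.

```python
STANDARD_RETRY_COUNT = 3
V1_GRACE_SECONDS = 60           # 0x01 PDUs are ignored inside this window
EXPIRY_SECONDS_PER_RETRY = 180  # 3 minutes times the retry count


class PeerRetryCount:
    """Tracks the retry count requested by the peer for one member port."""

    def __init__(self) -> None:
        self.retry_count = STANDARD_RETRY_COUNT
        self.set_at = None  # time of the last non-standard 0xf1 PDU

    def _reset(self) -> None:
        self.retry_count = STANDARD_RETRY_COUNT
        self.set_at = None

    def on_f1_pdu(self, retry_count: int, now: float) -> None:
        # A new retry count always replaces the previous one.
        self.retry_count = retry_count
        self.set_at = now if retry_count != STANDARD_RETRY_COUNT else None

    def on_v1_pdu(self, now: float) -> None:
        # Reset only once the 60-second transition window has passed.
        if self.set_at is not None and now - self.set_at >= V1_GRACE_SECONDS:
            self._reset()

    def on_session_down(self) -> None:
        self._reset()

    def current(self, now: float) -> int:
        # Apply the time-based expiry before reporting the value.
        if (self.set_at is not None
                and now - self.set_at >= EXPIRY_SECONDS_PER_RETRY * self.retry_count):
            self._reset()
        return self.retry_count


state = PeerRetryCount()
state.on_f1_pdu(5, now=0)
state.on_v1_pdu(now=30)
print(state.current(now=30))  # 5
state.on_v1_pdu(now=60)
print(state.current(now=60))  # 3
```

Timestamps are passed in explicitly so the rules can be exercised without waiting on a real clock.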

If both sides want to use the standard retry count of 3 instead, they are
recommended (but not required) to send a regular LACP version 0x01 packet, so
that the current standard is being followed. For SONiC's purposes, if a 0xf1
LACP PDU is received by a device, then it will also respond with a 0xf1 LACP
PDU. This will act as part of a feature presence test, to determine if the peer
device supports this feature.

#### Changing Max Retries for Warmboot

As part of a SONiC device starting the warmboot process, currently, LACP PDUs
are sent to all of the peers, to refresh the timers on the peers. This allows
the warmboot process the full 90 seconds for control plane to come back up and
for PDUs to be sent again after warmboot.

Now, the retry count on the local device will be changed to 5 retries (instead
of the standard 3 retries), which raises the effective timeout at the long rate
from 90 seconds to 150 seconds. This will cause teamd to send out LACP PDUs with
the above-defined version 0xf1 of the protocol, including the new retry count.
This should be done only after verifying through some method that the peer side
understands this feature. Teamd will not wait for an acknowledgment packet.

After warmboot is done, and teamd has started up after warmboot, teamd will now
be using the default standard retry count of 3. Because of this, it will send a
standard LACP PDU packet (with version 0x01). When the peer teamd client
receives this packet, it will know that this side's retry count should be
changed back to 3.

### Feature Test

To test if a neighbor device has this feature, the following checks will be
done:

* Based on the LLDP neighbor table, check to see if the remote device claims to
be a SONiC device. Specifically, check to see if the system description
contains SONiC. If desired, a version check could be made here as well. If
there is no LLDP data, or the remote device is not a SONiC device, then
assume that this feature is not supported, and stop here.
* From a Python script, send a version 0xf1 LACP PDU packet, with the retry
count for both sides set to 3. If the neighbor device responds with a valid
0xf1 LACP PDU packet, then this indicates that the feature is supported. If
not, then this feature is likely not supported.
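
The decision logic of these two checks can be sketched as follows. Sending and receiving the actual packet is left out; the sketch only covers how the results might be evaluated. All names are hypothetical. What counts as a "valid" 0xf1 response is also an assumption: here it means that the version byte and the TLV type and length bytes sit at the offsets given in the packet structure table. The match on "SONiC" is assumed to be case-sensitive.

```python
from typing import Optional

# Offset -> expected value, taken from the version 0xf1 packet table.
EXPECTED_FIELDS = {
    0: 0xF1,             # LACP version
    1: 0x01, 2: 20,      # Actor Info TLV type and length
    21: 0x02, 22: 20,    # Partner Info TLV type and length
    41: 0x03, 42: 16,    # Collector Info TLV type and length
    57: 0x80, 58: 4,     # Actor Retry Count TLV type and length
    61: 0x81, 62: 4,     # Partner Retry Count TLV type and length
    65: 0x00, 66: 0,     # Terminator TLV type and length
}


def neighbor_claims_sonic(system_description: Optional[str]) -> bool:
    """First check: the LLDP system description must mention SONiC."""
    return bool(system_description) and "SONiC" in system_description


def is_valid_f1_response(pdu: bytes) -> bool:
    """Second check: the reply must follow the 0xf1 layout."""
    if len(pdu) < 67:
        return False
    return all(pdu[offset] == value for offset, value in EXPECTED_FIELDS.items())


def peer_supports_retry_count(system_description: Optional[str],
                              response: Optional[bytes]) -> bool:
    """Combine both checks; a missing reply counts as unsupported."""
    if not neighbor_claims_sonic(system_description):
        return False
    return response is not None and is_valid_f1_response(response)


# Hypothetical inputs: a made-up description and a synthetic reply.
reply = bytearray(109)
for offset, value in EXPECTED_FIELDS.items():
    reply[offset] = value
print(peer_supports_retry_count("SONiC Software Version: example", bytes(reply)))
print(peer_supports_retry_count(None, bytes(reply)))
```

With these synthetic inputs the first call reports support and the second does not, because no LLDP data is available.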

### SAI API

There are no changes needed in the SAI API or in the implementation by vendors.

### Configuration and management

#### CLI

There will be two CLIs added to get and set the retry count. These are:

* `config portchannel retry-count get <portchannel_name>`
* `config portchannel retry-count set <portchannel_name> <retry_count>`

`<portchannel_name>` must refer to a valid, existing portchannel name.
`<retry_count>` must refer to a retry count between 3 and 10.
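
The range check on `<retry_count>` might look like the sketch below. It assumes the bounds 3 and 10 are inclusive, which the text does not state explicitly, and the function name is hypothetical rather than taken from sonic-utilities.

```python
MIN_RETRY_COUNT = 3
MAX_RETRY_COUNT = 10


def parse_retry_count(value: str) -> int:
    """Convert a CLI argument to a retry count, rejecting out-of-range values."""
    count = int(value)  # raises ValueError for non-numeric input
    if not MIN_RETRY_COUNT <= count <= MAX_RETRY_COUNT:
        raise ValueError(
            f"retry count must be between {MIN_RETRY_COUNT} and {MAX_RETRY_COUNT}"
        )
    return count


print(parse_retry_count("5"))  # 5
```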

Changes made with these CLI commands are NOT preserved across reboots and are
not saved in any DB.

### Restrictions/Limitations

The change described in this HLD departs from the LACP standard, and as such,
can only be supported if both sides of the LAG are running a version of SONiC
that understands this feature. If the peer side is not running SONiC, or is
running an unsupported version of SONiC, then setting a custom retry count may
cause the LAG to go down.

### Testing Requirements/Design

To test this feature, a T0 topology with SONiC neighbors will be used. Test
cases will be added to get and set the retry count via CLI. In addition, a test
case will be added to increase the retry count and do a warm-reboot, and verify
that after warm-reboot, the SONiC neighbors did not bring down the LAG, and
that after the T0 comes up, the retry count has been set to 3.

# Pull requests

* [sonic-net/sonic-utilities: Add CLI configuration options for teamd retry count feature](https://github.com/sonic-net/sonic-utilities/pull/2642)
* [sonic-net/sonic-buildimage: teamd: Add support for custom retry counts for LACP sessions](https://github.com/sonic-net/sonic-buildimage/pull/13453)
* [sonic-net/sonic-mgmt: Add test cases for teamd retry count feature](https://github.com/sonic-net/sonic-mgmt/pull/8152)

# References

- [libteam](https://github.com/jpirko/libteam)
- [IEEE 802.3ad Standard for LACP](http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf)
