- Downtime convergence for various reboot scenarios
- Overview
- Topology
- Setup configuration
- Test Methodology
- Test cases
- Test case #1 - Downtime convergence measurement for warm-reboot while sending traffic
- Test case #2 - Downtime convergence measurement for fast-reboot while sending traffic
- Test case #3 - Downtime convergence measurement for cold-reboot while sending traffic
- Test case #4 - Downtime convergence measurement for soft-reboot while sending traffic
- Call for action
The purpose of these tests is to measure the downtime convergence of the system, or of a particular service, by evaluating various reboot scenarios on a SONiC system that closely resembles a production environment.
These tests target a fully functioning SONiC system. We measure the downtime convergence of a particular service, or of the system as a whole, under reboot scenarios such as warm-reboot, fast-reboot, cold-reboot, and soft-reboot.
The tests will run on the following testbeds:
- t0
IPv4 and IPv6 EBGP neighborships will be established over a LAG between the SONiC DUT and directly connected test ports. The test ports in turn simulate the ToRs and Leafs by advertising dual-stack (IPv4/IPv6) routes.
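For illustration only, a minimal CONFIG_DB fragment for the DUT side of such a setup is sketched below; the table names follow the SONiC CONFIG_DB schema, while the interface names, addresses, and ASNs are made-up placeholders rather than values from this test plan.

```python
# Illustrative CONFIG_DB fragment for one LAG member and a dual-stack eBGP
# neighbor. Table names follow the SONiC CONFIG_DB schema; all interface
# names, addresses, and ASNs below are made-up placeholders.
import json

config_fragment = {
    "PORTCHANNEL": {
        "PortChannel0001": {"admin_status": "up", "mtu": "9100"}
    },
    "PORTCHANNEL_MEMBER": {
        "PortChannel0001|Ethernet0": {}
    },
    "PORTCHANNEL_INTERFACE": {
        "PortChannel0001|10.0.0.56/31": {},
        "PortChannel0001|fc00::71/126": {}
    },
    "BGP_NEIGHBOR": {
        "10.0.0.57": {"asn": "64600", "local_addr": "10.0.0.56",
                      "name": "T1-PEER", "holdtime": "10", "keepalive": "3"},
        "fc00::72": {"asn": "64600", "local_addr": "fc00::71",
                     "name": "T1-PEER", "holdtime": "10", "keepalive": "3"}
    },
}
print(json.dumps(config_fragment, indent=2))
```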
The following test methodology will be used to measure downtime convergence.
- A traffic generator will be used to configure EBGP peering between the chassis ports and the SONiC DUT on top of a LAG, advertising dual-stack (IPv4/IPv6) routes.
- Data traffic will be sent server to server, server to T1, and T1 to server.
- Depending on the test case, the corresponding reboot will be triggered and the downtime convergence will be measured.
- Data plane downtime convergence will be measured by recording the precise timestamps at which the data plane drops below the threshold and recovers above it. The traffic generator creates these timestamps and provides the data plane downtime convergence statistics.
- To measure control plane downtime convergence, we will ping the DUT loopback interface and measure how long the DUT takes to respond once it comes back online.
- Similarly, protocol downtime convergence can be measured for the protocol of interest, in this case BGP, by polling the state of the protocol. Both measurements are sketched in code below.
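A minimal Python sketch of the two measurements, assuming the below/above-threshold timestamps are exported by the traffic generator (the exact statistics API is vendor-specific, so the helper and parameter names here are illustrative):

```python
# Minimal sketch of the two measurements described above. The threshold
# timestamps are assumed to come from the traffic generator's statistics;
# the function and parameter names are illustrative, not a vendor API.
import subprocess
import time

def dataplane_downtime(below_threshold_ts: float, above_threshold_ts: float) -> float:
    """Data plane downtime is the gap between the instant the receive rate
    drops below the threshold and the instant it recovers above it."""
    return above_threshold_ts - below_threshold_ts

def loopback_ping_downtime(loopback_ip: str, timeout_s: int = 600) -> float:
    """Ping the DUT loopback once per second and report how long it stayed
    unreachable. Assumes the loopback answers pings before the reboot."""
    down_since = None
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        alive = subprocess.call(
            ["ping", "-c", "1", "-W", "1", loopback_ip],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
        if not alive and down_since is None:
            down_since = time.time()         # first lost ping: downtime starts
        elif alive and down_since is not None:
            return time.time() - down_since  # first reply after the outage
        time.sleep(1)
    raise TimeoutError("DUT loopback did not recover within the timeout")
```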
Measure the downtime convergence when a warm-reboot is issued while traffic is running.
- Configure IPv4 and IPv6 EBGP sessions between the test ports and the SONiC switch on top of a LAG.
- Advertise 4000 IPv4 and 3000 IPv6 routes through BGP.
- Configure 2000 VLAN hosts per server.
- Start all protocols and verify that the IPv4 and IPv6 BGP neighborships are established.
- Create server-server, server-T1 and T1-server data traffic flows and enable tracking by "Destination Endpoint" and by "Destination session description".
- Set the desired threshold value for receiving traffic. By default it is set to 90% of the expected receive rate.
- Apply and start the data traffic.
- Verify that traffic is flowing without any loss.
- Enable CSV logging or check the state of the BGP protocol through the API.
- Measure the control plane convergence time by pinging the loopback interface of the switch.
- Now perform a warm-reboot by issuing the command "sudo warm-reboot".
- Verify that there is no traffic loss after the switch is back up.
- Drill down by "Destination Endpoint" under traffic statistics to get the data plane convergence time.
- In general, the convergence value will fall within a certain range. To obtain representative results, run the test multiple times and average the results.
- Restore the default configuration.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Warm-reboot | Server-Server Traffic - IPv4 | 0 |
| | Server-Server Traffic - IPv6 | 0 |
| | Server-T1 Traffic - IPv4 | 0 |
| | Server-T1 Traffic - IPv6 | 0 |
| | T1-Server Traffic - IPv4 | 0 |
| | T1-Server Traffic - IPv6 | 0 |
| | BGP Control plane | 135 |
| | BGP+ Control plane | 135 |
| | Control plane (Loopback ping) | 28 |
For the above test case, below are the test results when the simulated T1 is replaced with an actual switch.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Warm-reboot | Server-Server Traffic - IPv4 | 0 |
| | Server-Server Traffic - IPv6 | 0 |
| | Server-T1 Traffic - IPv4 | 0 |
| | Server-T1 Traffic - IPv6 | 0 |
| | T1-Server Traffic - IPv4 | 0 |
| | T1-Server Traffic - IPv6 | 0 |
| | BGP Control plane | 137 |
| | BGP+ Control plane | 137 |
| | Control plane (Loopback ping) | 29 |
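As an illustration of the reboot and control plane steps in this test case, the sketch below assumes SSH access to the DUT (host name and credentials are placeholders) and reuses the hypothetical loopback_ping_downtime() helper from the methodology section:

```python
# Sketch of the warm-reboot measurement, assuming SSH access to the DUT and
# the loopback_ping_downtime() helper sketched in the methodology section.
# Host name and credentials are placeholders.
import paramiko

def warm_reboot_and_measure(dut_host: str, loopback_ip: str,
                            user: str = "admin", password: str = "admin") -> float:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(dut_host, username=user, password=password)
    # Fire the reboot; the SSH session drops as the DUT goes down.
    client.exec_command("sudo warm-reboot")
    client.close()
    # Control plane downtime: how long the loopback stays unreachable.
    return loopback_ping_downtime(loopback_ip)
```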
Measure the downtime convergence when a fast-reboot is issued while traffic is running.
- Configure IPv4 and IPv6 EBGP sessions between the test ports and the SONiC switch on top of a LAG.
- Advertise 4000 IPv4 and 3000 IPv6 routes through BGP.
- Configure 2000 VLAN hosts per server.
- Start all protocols and verify that the IPv4 and IPv6 BGP neighborships are established.
- Create server-server, server-T1 and T1-server data traffic flows and enable tracking by "Destination Endpoint" and by "Destination session description".
- Set the desired threshold value for receiving traffic. By default it is set to 90% of the expected receive rate.
- Apply and start the data traffic.
- Verify that traffic is flowing without any loss.
- Enable CSV logging or check the state of the BGP protocol through the API.
- Measure the control plane convergence time by pinging the loopback interface of the switch.
- Now perform a fast-reboot by issuing the command "sudo fast-reboot".
- Verify that there is no traffic loss after the switch is back up.
- Drill down by "Destination Endpoint" under traffic statistics to get the data plane convergence time.
- In general, the convergence value will fall within a certain range. To obtain representative results, run the test multiple times and average the results.
- Restore the default configuration.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Fast-reboot | Server-Server Traffic - IPv4 | 7 |
| | Server-Server Traffic - IPv6 | 7 |
| | Server-T1 Traffic - IPv4 | 16 |
| | Server-T1 Traffic - IPv6 | 17 |
| | T1-Server Traffic - IPv4 | 13 |
| | T1-Server Traffic - IPv6 | 13 |
| | BGP Control plane | 84 |
| | BGP+ Control plane | 84 |
| | Control plane (Loopback ping) | 31 |
For the above test case, below are the test results when the simulated T1 is replaced with an actual switch.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Fast-reboot | Server-Server Traffic - IPv4 | 8 |
| | Server-Server Traffic - IPv6 | 8 |
| | Server-T1 Traffic - IPv4 | 19 |
| | Server-T1 Traffic - IPv6 | 20 |
| | T1-Server Traffic - IPv4 | 15 |
| | T1-Server Traffic - IPv6 | 15 |
| | BGP Control plane | 84 |
| | BGP+ Control plane | 84 |
| | Control plane (Loopback ping) | 30 |
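The BGP protocol downtime for this case can be measured by polling session state on the DUT. The sketch below assumes FRR's `show bgp summary json` output format (field names may vary across FRR versions) and a hypothetical exec_on_dut() helper that runs a command on the DUT, for example over SSH, and returns its stdout:

```python
# Sketch of BGP protocol-downtime polling. Assumes FRR's
# `vtysh -c "show bgp summary json"` output format and a hypothetical
# exec_on_dut() helper that returns the command's stdout as a string.
import json
import time

def bgp_downtime(exec_on_dut, timeout_s: int = 600) -> float:
    """Return how long any IPv4 BGP peer stayed out of the Established state."""
    down_since = None
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            out = exec_on_dut('vtysh -c "show bgp summary json"')
            peers = json.loads(out).get("ipv4Unicast", {}).get("peers", {})
            all_up = bool(peers) and all(
                p.get("state") == "Established" for p in peers.values())
        except Exception:
            all_up = False   # an unreachable DUT mid-reboot counts as down
        if not all_up and down_since is None:
            down_since = time.time()
        elif all_up and down_since is not None:
            return time.time() - down_since
        time.sleep(1)
    raise TimeoutError("BGP sessions did not re-establish within the timeout")
```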
Measure the downtime convergence when a cold-reboot is issued while traffic is running.
- Configure IPv4 and IPv6 EBGP sessions between the test ports and the SONiC switch on top of a LAG.
- Advertise 4000 IPv4 and 3000 IPv6 routes through BGP.
- Configure 2000 VLAN hosts per server.
- Start all protocols and verify that the IPv4 and IPv6 BGP neighborships are established.
- Create server-server, server-T1 and T1-server data traffic flows and enable tracking by "Destination Endpoint" and by "Destination session description".
- Set the desired threshold value for receiving traffic. By default it is set to 90% of the expected receive rate.
- Apply and start the data traffic.
- Verify that traffic is flowing without any loss.
- Enable CSV logging or check the state of the BGP protocol through the API.
- Measure the control plane convergence time by pinging the loopback interface of the switch.
- Now perform a cold-reboot by issuing the command "sudo reboot".
- Verify that there is no traffic loss after the switch is back up.
- Drill down by "Destination Endpoint" under traffic statistics to get the data plane convergence time.
- In general, the convergence value will fall within a certain range. To obtain representative results, run the test multiple times and average the results.
- Restore the default configuration.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Cold-reboot | Server-Server Traffic - IPv4 | 73 |
| | Server-Server Traffic - IPv6 | 73 |
| | Server-T1 Traffic - IPv4 | 116 |
| | Server-T1 Traffic - IPv6 | 120 |
| | T1-Server Traffic - IPv4 | 102 |
| | T1-Server Traffic - IPv6 | 102 |
| | BGP Control plane | 102 |
| | BGP+ Control plane | 102 |
| | Control plane (Loopback ping) | 40 |
For the above test case, below are the test results when the simulated T1 is replaced with an actual switch.
| Reboot Type | Event | Convergence (s) |
|---|---|---|
| Cold-reboot | Server-Server Traffic - IPv4 | 69 |
| | Server-Server Traffic - IPv6 | 69 |
| | Server-T1 Traffic - IPv4 | 110 |
| | Server-T1 Traffic - IPv6 | 113 |
| | T1-Server Traffic - IPv4 | 96 |
| | T1-Server Traffic - IPv6 | 96 |
| | BGP Control plane | 96 |
| | BGP+ Control plane | 96 |
| | Control plane (Loopback ping) | 41 |
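Since the steps recommend repeating the measurement and averaging, a small driver such as the sketch below can collect the samples; run_once stands for any of the measurement helpers shown earlier and is an assumption, not part of this test plan:

```python
# Sketch of the repeat-and-average step; run_once is any zero-argument
# measurement callable, e.g. a wrapper around warm_reboot_and_measure().
from statistics import mean, stdev

def averaged_convergence(run_once, iterations: int = 5):
    """A single reboot yields one point in the expected convergence range;
    averaging several runs gives a more representative number."""
    samples = [run_once() for _ in range(iterations)]
    spread = stdev(samples) if iterations > 1 else 0.0
    return mean(samples), spread
```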
Measure the downtime convergence when a soft-reboot is issued while traffic is running.
- Configure IPv4 and IPv6 EBGP sessions between the test ports and the SONiC switch on top of a LAG.
- Advertise 4000 IPv4 and 3000 IPv6 routes through BGP.
- Configure 2000 VLAN hosts per server.
- Start all protocols and verify that the IPv4 and IPv6 BGP neighborships are established.
- Create server-server, server-T1 and T1-server data traffic flows and enable tracking by "Destination Endpoint" and by "Destination session description".
- Set the desired threshold value for receiving traffic. By default it is set to 90% of the expected receive rate.
- Apply and start the data traffic.
- Verify that traffic is flowing without any loss.
- Enable CSV logging or check the state of the BGP protocol through the API.
- Measure the control plane convergence time by pinging the loopback interface of the switch.
- Now perform a soft-reboot by issuing the command "sudo soft-reboot".
- Verify that there is no traffic loss after the switch is back up.
- Drill down by "Destination Endpoint" under traffic statistics to get the data plane convergence time.
- In general, the convergence value will fall within a certain range. To obtain representative results, run the test multiple times and average the results.
- Restore the default configuration.
- The soft-reboot command is not yet available in the SONiC CLI. Once it is available, the test script will be upstreamed.
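Until the command ships, a pytest-style guard in the sonic-mgmt idiom can skip this case gracefully. The sketch below assumes the standard duthost fixture and its shell() helper; it is illustrative, not the upstream script:

```python
# Hedged pytest-style guard, assuming the sonic-mgmt duthost fixture and its
# shell() helper; the test body is elided since the flow matches the other
# reboot cases.
import pytest

def test_soft_reboot_convergence(duthost):
    res = duthost.shell("command -v soft-reboot", module_ignore_errors=True)
    if res["rc"] != 0:
        pytest.skip("soft-reboot command is not available in this SONiC image")
    # ...same measurement flow as the other reboot test cases...
```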