- Overview
- Input Data for swss
- SWSS state restore
- SWSS state consistency validation
- SWSS state sync up
- SWSS docker warm restart
- Open issues
The goal of SONiC swss docker warm restart is to be able restart and upgrade swss docker software without impacting the data plane.
The swss docker warm restart should restore all necessary control plane data and get synchronized with current network and switch states. As the single point of interface with lower layer syncd, the state processing of orchagent is the key for swss docker warm restart.
The state of swss is driven by data from multiple sources.
configDB and together with fixed config files port_config.ini and pg_lookup.ini form the basic data source for port, VLAN, VLAN interface, buffer, QoS, CRM, PFC watchdog and ACL. Certain data will be consumed by orchagent directly like QoS, CRM, and ACL configurations.
The initial port config is parsed by portsyncd and passed down to orchagent, that again will trigger SAI api call to ask ASIC SDK to create host netdev interface on Linux. Portsyncd watches the kernel RTNLGRP_LINK netlink notification about host interface creation, then signals PortInitDone to Orchagent via APPDB so the other processing in Orchagent may proceed.
Intfsyncd listens on RTNLGRP_IPV4_IFADDR and RTNLGRP_IPV6_IFADDR netlink groups, it sets appDB INTF_TABLE with the data.
Neighsyncd gets ARP netlink notification from Linux kernel, the data will be used by Orchagent to populate neighbor and nexthop objects.
teamsyncd runs in teamd docker, it also listens on both RTNLGRP_LINK netlink group for LAG related notification and teamd lag member state messages. The data will be used to program APPDB lag and lagmember tables.
Teamd is the ultimate source of LAG related state including those kernel lag message. Teamsyncd takes instructions from both teamd and Linux kernel.
All the APPDB ROUTE_TABLE data is from BGP docker. fpmsyncd which programs APPDB ROUTE_TABLE directly runs in BGP docker.
For copp, tunnel and mirror related configurations, they are loaded from json files to corresponding APPDB tables.
FDB and Port state notifications come from ASIC, syncd relays the data to orchagent. Orchagent also gets info for the objects created by ASIC by default, ex. the port list, hw lanes and queues.
During swss warm restart, the state of swss should be restored. It is assumed that all data in APPDB has either been restored or been kept intact.
Portsyncd has some internal state processing as to PortConfigDone and PortInitDone. We don't restore port data from APPDB directly, but rely on port config and linux RTM_GETLINK dump to populate port data again.
VLAN also needs to set up Linux host environment for vlan aware bridge. But for swss docker restart, the previous bridge setup is kept by Linux, vlanmgrd should avoid deleting and creating the bridge again.
For APP INTF, netlink.dumpRequest(RTM_GETADDR) in intfsyncd will populate interface IP from Linux.
These data will be restored from APPDB directly by orchagent.
Orchagent fetch the existing data from configDB at startup.
These configuration will be loaded to APPDB from JSON files then received by orchagent at startup.
Both the FDB and port state data is restored from APPDB by orchagent.
Orchagent relies on SAI get api to fetch the OID data from syncd for switch default objects.
After swss state restore, the state of each swss processes especially orchagent should be consistent with the state before restart. For now, it is assumed that no configDB change during the whole warm restart window. Then the state of orchagent is mainly driven by APPDB data changes. Following basic pre-restart and post-restore validation could be applied.
A "restart prepare" request is sent to orchagent, if there no pending data in SyncMap (m_toSync) of all application consumers in orchagent, OrchDaemon will set a flag to stop processing any further APPDB data change and return success for the "restart prepare" request. Otherwise failure should be returned for the request to indicate that there is un-fullfilled dependency in orchagent which is not ready to do warm restart.
The existing ProducerStateTable/ConsumerStateTable implementation should be updated so that only consumer side modify the actual table.
After swss state restore, same as that in pre-restart phase, no pending data in SyncMap (m_toSync) of all application consumers should exist. This should be done before swss state sync up.
More exhaustive validation beyond this is to be designed and implemented.
During the restart window, dynamic data like ARP, port state, FDB, LAG and route may be changed. Orchagent needs to sync up with the latest network state.
It is assumed that no configDB data change happened.
At startup of neighsyncd, netlink.dumpRequest(RTM_GETNEIGH) has been called. But there may be stale entries in APPDB NEIGH_TABLE and they should be removed.
For the warm restart of whole system, the Linux arp data is gone too. All the neighbors should be pinged again to get their mac info.
Physical state of port may change during swss restart, orchagent should fetch the latest port state or syncd should queue all the unproccessed port state notification for orchagent to pick up.
FDB entries may change during swss restart, orchagent should fetch the latest FDB entries and remove stale ones, or syncd should queue all the unproccessed FDB notification for orchagent to pick up.
Upon swss restart, teamsyncd should have queued all the LAG changes in APPDB, orchagent may process those changes after restart. In case teamd docker restarted too, it is responsible for getting LAG_TABLE and LAG_MEMBER_TABLE up to date in APPDB.
Upon swss restart, fpmsyncd should have queued all the route changes in APPDB, orchagent may process those changes after restart. In case bgp docker restarted too, it is responsible for getting ROUTE_TABLE up to date in APPDB, which includes removing stale route entries and add new route entries.
Syncd docker restart should not be triggered by swss docker restart. A systemd syncd service should be created independent of swss service.
Upon detecting the existence of certain folder, ex. /etc/sonic/warm_restart, swss docker will assume warm restart has been requested. Otherwise swss will flush db and perform cold start. A CLI and function in configDB enforcer should be implemented for enabling/disabling warm restart.
Mechanism should be available to automatically fall back to system cold restart upon failure of warm start.
dhcp_relay, snmp and radv dockers will be restart too because of their systemd dependency on swss docker when swss docker is restarted with systemctl.
How to restore all APPDB data after system restart, is there tools available for dump and restore whole redis DB?
For swss docker warm restart, keep stateDB. For system restart, since Linux host environment should be restored in this case, the stateDB needs to be flushed and populated again with Ethernet, LAG and VLAN data.