Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[202405][route]: flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND #3270

Open
nazariig opened this issue Aug 12, 2024 · 5 comments

Comments

@nazariig
Copy link
Collaborator

Description

The bug happens when SWSS is busy and task queue accumulates a several route configuration tasks one of which can cause a configuration failure (e.g., duplicate route).
When that happens, the rest of the good routes are being skipped from configuration, thus are missing in HW.
Further removal attempts of failed routes will lead to SAI errors like SAI_STATUS_ITEM_NOT_FOUND.

Configuration is applied:

2024 Aug  9 18:09:10.952721 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101', 'SET', (('NULL', 'NULL'),))'
2024 Aug  9 18:09:10.961216 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101', 'SET', (('NULL', 'NULL'),))'
2024 Aug  9 18:09:10.961216 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('NULL', 'NULL'),))'
2024 Aug  9 18:09:10.962915 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('NULL', 'NULL'),))'
2024 Aug  9 18:09:10.962915 sonic DEBUG bgp#staticroutebfd: get ip from intf key: valid True is_ipv4 True, if_name PortChannel101 ip 30.1.0.1
2024 Aug  9 18:09:10.970778 sonic INFO swss#supervisord: orchagent
2024 Aug  9 18:09:10.971098 sonic INFO swss#supervisord: intfmgrd sysctl: cannot stat /proc/sys/net/mpls/conf/PortChannel101/input: No such file or directory
2024 Aug  9 18:09:10.972285 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101', 'SET', (('vrf', ''),))'
2024 Aug  9 18:09:10.980776 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('state', 'ok'),))'
2024 Aug  9 18:09:11.297308 sonic NOTICE swss#swssconfig: :- main: Loading config from JSON file:/ro.json...
2024 Aug  9 18:09:12.390991 sonic NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.61@PortChannel103,10.0.0.63@PortChannel104
2024 Aug  9 18:09:12.393372 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[5048]- mlnx_remove_next_hop_group_members: Remove NEXT_HOP_GROUP_MEMBER [OID:0x490000002D] [ID:1]
2024 Aug  9 18:09:12.393774 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[5048]- mlnx_remove_next_hop_group_members: Remove NEXT_HOP_GROUP_MEMBER [OID:0x90000002D] [ID:0]
2024 Aug  9 18:09:12.398391 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[1502]- mlnx_remove_next_hop_group: Remove NEXT_HOP_GROUP [OID:0x800000005] [ID:0]
2024 Aug  9 18:09:12.418011 sonic INFO swss#supervisord: orchagent
2024 Aug  9 18:09:12.420640 sonic NOTICE swss#orchagent: :- addIp2MeRoute: Create IP2me route ip:30.1.0.1
2024 Aug  9 18:09:12.424444 sonic NOTICE syncd#SDK: [SAI_NEIGHBOR.NOTICE] ./src/mlnx_sai_neighbor.c[154]- mlnx_create_neighbor_entry: Create NEIGHBOR_ENTRY: #0 DST_MAC_ADDRESS=[54:54:00:ad:48:00]
2024 Aug  9 18:09:12.425279 sonic NOTICE syncd#SDK: [SAI_NEIGHBOR.NOTICE] ./src/mlnx_sai_neighbor.c[240]- mlnx_create_neighbor_entry: Created NEIGHBOR_ENTRY 30.1.0.2RIF:0x4100000006
2024 Aug  9 18:09:12.426282 sonic NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 30.1.0.2, 54:54:00:ad:48:00 on PortChannel101
2024 Aug  9 18:09:12.427578 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP.NOTICE] ./src/mlnx_sai_nexthop.c[1183]- mlnx_create_next_hop: Create NEXT_HOP: #0 TYPE=IP sonic-net/sonic-buildimage#1 IP=30.1.0.2 sonic-net/sonic-buildimage#2 ROUTER_INTERFACE_ID=ROUTER_INTERFACE [OID:0x4100000006] [Type:DEFAULT, ID:1]
2024 Aug  9 18:09:12.428373 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP.NOTICE] ./src/mlnx_sai_nexthop.c[1385]- mlnx_create_next_hop: Created NEXT_HOP [OID:0x900000004] [ecmp:0x9]
2024 Aug  9 18:09:12.429760 sonic NOTICE swss#orchagent: :- addNextHop: Created next hop 30.1.0.2 on PortChannel101
2024 Aug  9 18:09:12.432419 sonic NOTICE swss#orchagent: :- doTask: Get port state change notification id:1000000000029 status:1
2024 Aug  9 18:09:12.432739 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet64 oper state set from down to up
2024 Aug  9 18:09:12.434173 sonic DEBUG lldp#lldpmgrd[35]: Running command: '['lldpcli', 'configure', 'ports', 'Ethernet68', 'lldp', 'portidsubtype', 'local', 'etp18a', 'description', 'ARISTA03T1:Ethernet1']'
2024 Aug  9 18:09:12.434225 sonic DEBUG lldp#lldpmgrd[35]: Running command: '['lldpcli', 'configure', 'ports', 'Ethernet64', 'lldp', 'portidsubtype', 'local', 'etp17a', 'description', 'ARISTA01T1:Ethernet1']'
2024 Aug  9 18:09:12.436481 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status UP to host interface Ethernet64
2024 Aug  9 18:09:12.436816 sonic NOTICE swss#orchagent: :- publish: EVENT_PUBLISHED: {"sonic-events-swss:if-state":{"ifname":"Ethernet64","status":"up","timestamp":"2024-08-09T15:09:12.436579Z"}}
2024 Aug  9 18:09:12.438549 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[1929]- set_dispatch_attrib_handler: Set OPER_STATUS, key:HOSTIF [OID:0x1C0000000D] [hostif_db[28]], val:true
2024 Aug  9 18:09:12.440293 sonic NOTICE swss#orchagent: :- doTask: Ethernet64 oper speed is 50000
2024 Aug  9 18:09:12.448656 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.0.32/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} already exists
2024 Aug  9 18:09:12.449134 sonic ERR swss#orchagent: :- flush_creating_entries: EntityBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_ITEM_ALREADY_EXISTS
2024 Aug  9 18:09:12.449590 sonic ERR swss#orchagent: :- addRoutePost: Failed to create route 10.1.0.32/32 with next hop(s) 30.1.0.2@PortChannel101
2024 Aug  9 18:09:12.450748 sonic ERR swss#orchagent: :- addRoutePost: Failed to create route 30.1.0.0/24 with next hop(s) 0.0.0.0@PortChannel101

Configuration is removed:

2024 Aug  9 18:10:06.257036 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug  9 18:10:06.257523 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug  9 18:10:06.257872 sonic DEBUG bgp#staticroutebfd: get ip from intf key: valid True is_ipv4 True, if_name PortChannel101 ip 30.1.0.1
2024 Aug  9 18:10:06.285108 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug  9 18:10:06.285492 sonic WARNING bgp#bgpcfgd: Delete key 'PortChannel101|30.1.0.1/24' is not supported for 'zebra set src' templates
2024 Aug  9 18:10:06.290529 sonic INFO systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
2024 Aug  9 18:10:06.298396 sonic INFO systemd[1]: Starting logrotate.service - Rotate log files...
2024 Aug  9 18:10:06.305718 sonic NOTICE syncd#SDK: [SAI_ROUTE.NOTICE] ./src/mlnx_sai_route.c[931]- mlnx_remove_route: Remove ROUTE_ENTRY 30.1.0.1 255.255.255.255
2024 Aug  9 18:10:06.314719 sonic NOTICE swss#orchagent: :- removeIp2MeRoute: Remove packet action trap route ip:30.1.0.1
2024 Aug  9 18:10:06.322976 sonic NOTICE bgp#fpmsyncd: message repeated 29 times: [ :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0 ]
2024 Aug  9 18:10:06.322976 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fd00::/80 :: docker0
2024 Aug  9 18:10:06.332575 sonic INFO systemd[1]: sysstat-collect.service: Deactivated successfully.
2024 Aug  9 18:10:06.332707 sonic INFO systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
2024 Aug  9 18:10:06.478214 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"30.1.0.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn't exist
2024 Aug  9 18:10:06.479278 sonic ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND
2024 Aug  9 18:10:06.479278 sonic ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:30.1.0.0/24

Steps to reproduce the issue:

  1. Do config reload
config reload -y -f
  1. Wait for BGP route convergency to start
2024 Aug  9 18:09:04.467477 sonic NOTICE syncd#SDK: :- threadFunction: time span 256 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug  9 18:09:04.755004 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:05.467611 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:05.467707 sonic NOTICE syncd#SDK: :- threadFunction: time span 51 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug  9 18:09:05.493543 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[1929]- set_dispatch_attrib_handler: Set PACKET_ACTION, key:ROUTE_ENTRY 0.0.0.0 0.0.0.0, val:FORWARD
2024 Aug  9 18:09:05.927877 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:06.467724 sonic NOTICE syncd#SDK: :- threadFunction: time span 480 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug  9 18:09:06.577540 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:07.467829 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:07.467915 sonic NOTICE syncd#SDK: :- threadFunction: time span 241 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug  9 18:09:07.844777 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug  9 18:09:08.086475 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 128 Routes: 128 success, 0 not executed, 0 failed
2024 Aug  9 18:09:08.328286 sonic NOTICE swss#orchagent: :- addRoutePost: Update Nexthop Group 10.0.0.61@PortChannel103,10.0.0.63@PortChannel104
  1. Do config
root@sonic:/home/admin# cat config.sh
#!/bin/bash

sudo config interface  ip add PortChannel101 30.1.0.1/24
sudo ip neigh replace 30.1.0.2 lladdr 54:54:00:ad:48:00 dev PortChannel101
docker exec -ti swss swssconfig /ro.json

root@sonic:/# cat ro.json
[
    {
        "ROUTE_TABLE:10.1.0.32": {
            "ifname": "PortChannel101",
            "nexthop": "30.1.0.2"
        },
        "OP": "SET"
    }
]
  1. Do clean up
root@sonic:/home/admin# cat cleanup.sh
#!/bin/bash

sudo ip neigh del 30.1.0.2 dev PortChannel101
sudo config interface  ip remove PortChannel101 30.1.0.1/24

Describe the results you received:

2024 Aug  9 18:10:06.478214 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"30.1.0.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn't exist
2024 Aug  9 18:10:06.479278 sonic ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND

Describe the results you expected:

No errors are expected

Output of show version:

  • N/A

Output of show techsupport:

  • N/A

Additional information you deem important (e.g. issue happens only occasionally):

The bug is reproducible via: sonic-mgmt/tests/route/test_duplicate_route.py

@volodymyrsamotiy
Copy link
Collaborator

@prsunny, could you please help to check?

@gechiang
Copy link
Contributor

@yxieca can you help transfer this to SWSS repo? Thanks!

@lguohan lguohan transferred this issue from sonic-net/sonic-buildimage Aug 28, 2024
@lguohan
Copy link
Contributor

lguohan commented Aug 28, 2024

@dgsudharsan and @prsunny , issue transferred to sonic-swss

@prsunny
Copy link
Collaborator

prsunny commented Sep 23, 2024

AFAIK, this is as per design that if one failure happens on the bulk, the rest may get skipped. @dgsudharsan , @prabhataravind , do you have any comments? This may be an enhancement item.

@nazariig
Copy link
Collaborator Author

nazariig commented Oct 8, 2024

@prsunny can you please suggest on how we should proceed? The issue was captured by sonic-mgmt/tests/route/test_duplicate_route.py. Without a fix, this test will be flaky

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants