Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible issue with overlapping RTNL dump requests #393

Open
svinota opened this issue Jan 26, 2025 · 1 comment
Open

possible issue with overlapping RTNL dump requests #393

svinota opened this issue Jan 26, 2025 · 1 comment

Comments

@svinota
Copy link
Contributor

svinota commented Jan 26, 2025

Rewriting integration tests in order to prepare the coming release of pyroute2, I found a possible issue in the lnst.Agent.InterfaceManager code:

def request_netlink_dump(self):
self._nl_socket.put(
None, RTM_GETLINK, msg_flags=NLM_F_REQUEST | NLM_F_DUMP
)
self._nl_socket.put(
None, RTM_GETADDR, msg_flags=NLM_F_REQUEST | NLM_F_DUMP
)

This method sends two dump requests in a row. I'm not sure about all kernel versions, but at least 6.12.8-200.fc41.x86_64 doesn't allow overlapping dump requests, answering -EBUSY to the second one, until the first one is read to NLMSG_DONE.

It is easy to check with this code:

# run from LNST repo root
# you should have pyroute2 in PYTHONPATH as well
#
from unittest import mock

from lnst.Agent.InterfaceManager import InterfaceManager
from pyroute2 import IPRSocket


im = InterfaceManager(mock.Mock())
im._handle_netlink_msg = mock.Mock()
im.rescan_devices()

assert isinstance(im._nl_socket, IPRSocket)

for call in im._handle_netlink_msg.call_args_list:
    # each call() == (args, kwarg) or (name, args, kwarg)
    msg = call[-2][0]
    print(
        msg.get('event'),
        msg.get(('header', 'sequence_number')),
        msg.get(('header', 'error'), ''),
    )

I get this on my test node:

~/Projects/lnst$ python test.py
RTM_NEWLINK 255
RTM_NEWLINK 255
NLMSG_ERROR 256 (16, 'Device or resource busy')
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
RTM_NEWLINK 255
NLMSG_DONE 255

As you see, all the messages are responses to seq num 255, and the only NLMSG_ERROR is a response to seq num 256. IPRSocket doesn't propagate protocol-level errors, only socket-level ones, thus the NLMSG_ERROR message here triggers no exception:

else:
return

So I believe that at least on this particular node InterfaceManager.request_netlink_dump() doesn't help to load IP address info, if it was one of the method's purposes.

I'm not sure it is a bug, or even an issue, but at least in my integration tests I can not run dumps this way, as the second doesn't work. So far I limited the test with either variant:

https://github.com/svinota/pyroute2/blob/fb8a8c4784336ddd5b003b283c968a7d60ada799/tests/test_integration/test_lnst.py#L29

Let me know, if I you have any ideas on this topic.

Thanks!

@olichtne
Copy link
Collaborator

HMMMMMMMMMM,

this is some good info, i'll try to experiment with this, I don't believe we have any "true" issue here as all of our tests regularly run this code and we don't have a problem with accessing the ip address info etc...

however that may just mean that this particular part of code does nothing and we don't have issues due to a completely different reason.

so from this context it's probably a good idea to at least debug/error log the NLMSG_ERROR messages for better visibility and likely adjust some of the "request" methods in hopes that we can achieve either better efficiency or better accuracy.

i'll experiment and let you know if i find something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants