-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
debug: workaround for sporadic failures of some tests due to unexpected data present in pexpect match #555
debug: workaround for sporadic failures of some tests due to unexpected data present in pexpect match #555
Conversation
…ed data present in pexpect match Problem was observed on UnavailableMultiTest - this test was sporadically failing. When the failure was observed the log of the failing test looked as follows: ``` File "/whatever/RISCVTests/debug/testlib.py", line 504, in <genexpr> if all(targets[hart.id]["State"] == "running" for hart in harts): ~~~~~~~~~~~~~~~~^^^^^^^^^ KeyError: 'State' ``` Adding this modification to testlib.py ``` --- a/debug/testlib.py +++ b/debug/testlib.py @@ -498,6 +498,10 @@ class Openocd: for line in lines[2:]: if line.strip(): data.append(dict(zip(headers, line.split()[1:]))) + str_data = str(data) + sys.stdout.flush() + sys.stdout.write(f"parsed targets:\n{result}\n===\n{str_data}\n---\n") + sys.stdout.flush() return data ``` Allowed me to root cause the issue. Namely we have the following situation: ``` parsed targets: Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7f2dee64d1c0> Traceback (most recent call last): File "/usr/local/lib/python3.11/tempfile.py", line 450, in __del__ self.close() File "/usr/local/lib/python3.11/tempfile.py", line 446, in close unlink(self.name) FileNotFoundError: [Errno 2] No such file or directory: '/tmp/[email protected]' ... TargetName Type Endian TapName State -- ------------------ ---------- ------ ------------------ ------------ 0 riscv.cpu0 riscv little riscv.cpu running 1* riscv.cpu1 riscv little riscv.cpu running === [{'Exception': '"/usr/local/lib/python3.11/tempfile.py",', 'ignored': 'line', 'in:': '450,', ... ``` The only reasonable (to me) explanation for the observed behavior is below. Here is how we connect to TCL-RPC server: ``` self.openocd_cli = pexpect.spawn(f"nc localhost {self.tclrpc_port}") tty.setraw(self.openocd_cli.child_fd) ``` Later we request target list by issuing "targets" command: ``` self.command("targets") ``` Internally, pexpect.spawn implemented as follows: - we fork the process - set up pty and then call execve - all these steps are written in python "Exception ignored" messages are result of exceptions thrown from finalizers of NamedTemporaryFile objects. When exception is thrown from the finalizer - python unconditionally prints a "warning" to stderr. It seems that these messages are polluting our output from "nc" since python GC can be invoked before the execve syscall. The workaround is just to make sure that execve was executed before we rely on the format of command output. To have such a guarantee we just issue a dummy "echo" command and check that we have a proper reply in the output stream. While this explanation looks convincing, the behavior above still looks strange, given that we have https://bugs.python.org/issue14548 which was resolved long ago. However, the proposed workaround fixes the issue.
cdd91da
to
db7e451
Compare
@aap-sc, I think it would be better to use Python's socket API with socket_pexpect instead of netcat. |
Yes, it will. Though I'm not sure that it is easy to to manage socket in python layer. I'm somewhat scared of low-level socket programming. will investigate, though |
@en-sc I've investigated the situation. The thing is that socket_pexpect is available only in pexpect 4.9. This is not a stable version. I'm not sure we should use a non-stable version of packages, given that the existing solution works. I pinky-promise to keep my eyes open and transition to pexpect 4.9 once it is considered stable. |
Problem was observed on UnavailableMultiTest - this test was sporadically failing. When the failure was observed the log of the failing test looked as follows:
Adding this modification to testlib.py
Allowed me to root cause the issue. Namely we have the following situation:
The only reasonable (to me) explanation for the observed behavior is below.
Here is how we connect to TCL-RPC server:
Later we request target list by issuing "targets" command:
Internally, pexpect.spawn implemented as follows:
"Exception ignored" messages are result of exceptions thrown from finalizers of NamedTemporaryFile objects. When exception is thrown from the finalizer - python unconditionally prints a "warning" to stderr. It seems that these messages are polluting our output from "nc" since python GC can be invoked before the execve syscall.
The workaround is just to make sure that execve was executed before we rely on the format of command output. To have such a guarantee we just issue a dummy "echo" command and check that we have a proper reply in the output stream.
While this explanation looks convincing, the behavior above still looks strange, given that we have https://bugs.python.org/issue14548 which was resolved long ago.
However, the proposed workaround fixes the issue.