Flaky serial on sc7180 (Qualcomm chromebooks) #380
Comments
All of the test failures on both limozeen and kingoftown this week returned the exact same LAVA error message: I added a failure_retry to the test to see whether the step that emits the LAVA STARTRUN message could simply be retried when it fails. But some of the commands threw errors, so it seems they're not prepared to be run multiple times: https://lava.collabora.dev/scheduler/job/14242039 One idea that comes to mind is trying the
Here's a good run for comparison: https://lava.collabora.dev/scheduler/job/14278523 In the failed run, there's:
The trailing As for adding an initial delay, I also checked the code that sends the
I tried using LAVA's
Then I tried sleeping for 5 seconds in So it seems this issue has nothing to do with the serial output being busy during boot, and no kind of delay will help. I'm still not sure how this happens, given that kernel messages should be printed atomically to the serial, and why the
Doug Anderson pointed me to a series he recently sent to make the serial more reliable on Qualcomm platforms, including sc7180: https://lore.kernel.org/all/[email protected]/#t The results speak for themselves:
So before the patch, 18 out of 20 runs failed due to the missing message on the serial. After the patch, all 20 runs passed. It's a clear fix. I'll report the results upstream, and then we just have to wait for it to be merged for this issue to finally go away :).
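For anyone who wants to put a number on the before/after comparison, here is a minimal sketch using only the Python standard library. It computes the failure rate of each batch and a one-sided Fisher exact test on the counts quoted above (18 of 20 failing before, 0 of 20 after). The function names are made up for illustration, and the choice of test is my assumption, not something used in the original analysis.

```python
from math import comb


def failure_rate(failed, total):
    """Fraction of runs in a batch that failed."""
    return failed / total


def fisher_one_sided(fail_a, total_a, fail_b, total_b):
    """One-sided Fisher exact test.

    Returns the probability of seeing at least `fail_a` failures in batch A
    if both batches actually shared the same underlying failure rate
    (hypergeometric tail with all margins fixed).
    """
    total_fail = fail_a + fail_b
    n = total_a + total_b
    denom = comb(n, total_fail)
    upper = min(total_a, total_fail)
    tail = sum(
        comb(total_a, k) * comb(total_b, total_fail - k)
        for k in range(fail_a, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Counts taken from the comment above: unpatched vs. patched kernel.
    print("failure rate before:", failure_rate(18, 20))
    print("failure rate after: ", failure_rate(0, 20))
    print("one-sided p-value:  ", fisher_one_sided(18, 20, 0, 20))
```

A very small p-value here just restates what the raw counts already suggest: a difference this large is very unlikely to be run-to-run noise.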
There's a new series on the list to fix the serial: https://lore.kernel.org/all/[email protected]/ I've tested it and it works. Hopefully this one gets merged soon. Results for reference:
Amazing news!!!
The series has been included in tty-next, so it should land in mainline very soon.
The patch series has been merged, and indeed I no longer see incomplete jobs on sc7180 related to the serial. Closing.
The serial on sc7180 has been known to be flaky for a while. Now that the new system gives us results from those platforms that we can analyze, I'm opening this issue to track progress on finally finding the cause and fixing it.
Recent example: https://lava.collabora.dev/scheduler/job/14158411
I've recently opened an issue for the serial on some Mediatek platforms: #366. But it seems different in this case, as the issue happened on the output, not the input. I also feel like I've seen this exact STARTRUN error on sc7180 specifically too many times. If that's the only error that happens here, it could be an issue specific to the first message that is printed. Need to collect more failed runs.
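To help with collecting and sorting failed runs, here is a rough sketch that classifies already-downloaded serial logs by whether the STARTRUN signal arrived intact. It assumes the signal appears in the log as `<LAVA_SIGNAL_STARTRUN <test> <uuid>>` on a single line; that pattern, the category names, and the helper names are my assumptions for illustration and should be checked against real logs from the jobs above.

```python
import re
from collections import Counter

# Assumed shape of an intact signal: "<LAVA_SIGNAL_STARTRUN name uuid>".
STARTRUN_RE = re.compile(r"<LAVA_SIGNAL_STARTRUN \S+ \S+>")


def classify_log(text):
    """Classify one serial log by the state of its STARTRUN signal."""
    if STARTRUN_RE.search(text):
        return "complete"    # the whole signal made it through
    if "STARTRUN" in text:
        return "truncated"   # part of the signal is there, but it is damaged
    return "missing"         # no trace of the signal at all


def tally(logs):
    """Count categories over a {job_id: log_text} mapping."""
    return Counter(classify_log(text) for text in logs.values())
```

Feeding `tally` a dictionary of job IDs to log text gives a quick count of how many runs lost the signal entirely versus received a damaged one, which would show whether STARTRUN really is the only message affected.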