Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: wait for fw loaded before starting any test #1033

Merged
merged 1 commit into from
May 10, 2023

Conversation

fredoh9
Copy link
Collaborator

@fredoh9 fredoh9 commented May 4, 2023

This is going to check whether the sof-test can be started after boot/reboot.

If FW LOADING is not a prerequisite for the test, this can be skipped, by defining NO_POLL_FW_LOADING=true

Another benefit with this change, if FW is not loaded it will be failed instead of TIMEOUT.

@fredoh9 fredoh9 requested a review from a team as a code owner May 4, 2023 22:06
@fredoh9
Copy link
Collaborator Author

fredoh9 commented May 4, 2023

typo in title, staring -> starting.
will fix when next push is required

@fredoh9 fredoh9 changed the title lib: wait for fw loaded before staring any test lib: wait for fw loaded before starting any test May 4, 2023
# Set MAX Polling time to check FW Loading. If FW is already loaded, it will
# return immediately. Default value is set to 60 seconds, because i915 driver
# timeout is 60 seconds.
MAX_WAIT_FW_LOADING=${MAX_WAIT_FW_LOADING:-60}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i915 driver is 60 seconds, shouldn't this be 65 seconds?

Also: can you please add at least one reference for more context? Maybe some of these:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i915 driver is 60 seconds, shouldn't this be 65 seconds?

Good point, I think about this too. But there are some other delays already applied top of this. so it should have plenty of room

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there are some other delays already applied top of this

Once we merge this then we should get rid of those.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: this is only a maximum. The test will start as soon as the firmware has booted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I will update default value to 70. if you plan to remove all extra delays, I need some extra margin

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart should we add a counter and print a warning when waiting more than 10s?

Note we already discussed this in test-specific #862. There we set a 10s maximum 1 year ago and it has never spuriously failed.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of comments below @fredoh9, nice work indeed.

# Set MAX Polling time to check FW Loading. If FW is already loaded, it will
# return immediately. Default value is set to 60 seconds, because i915 driver
# timeout is 60 seconds.
MAX_WAIT_FW_LOADING=${MAX_WAIT_FW_LOADING:-60}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the relationship between DSP firmware load and i915? is this that we first request the i915 modules before downloading the firmware?

and if yes, should we use the same routine as for the kmod remove/modprobe test where we have an iteration that stops on BOOT_COMPLETE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if yes, should we use the same routine as for the kmod remove/modprobe test where we have an iteration that stops on BOOT_COMPLETE.

That's exactly the suggestion I made in internal bug 387 and what @fredoh9 is doing here: he's re-using the functions I wrote in #862

case-lib/lib.sh Outdated Show resolved Hide resolved
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart should we add a counter and print a warning when waiting more than 10s?

I think you missed this question

# Set MAX Polling time to check FW Loading. If FW is already loaded, it will
# return immediately. Default value is set to 60 seconds, because i915 driver
# timeout is 60 seconds.
MAX_WAIT_FW_LOADING=${MAX_WAIT_FW_LOADING:-60}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there are some other delays already applied top of this

Once we merge this then we should get rid of those.

# Set MAX Polling time to check FW Loading. If FW is already loaded, it will
# return immediately. Default value is set to 60 seconds, because i915 driver
# timeout is 60 seconds.
MAX_WAIT_FW_LOADING=${MAX_WAIT_FW_LOADING:-60}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if yes, should we use the same routine as for the kmod remove/modprobe test where we have an iteration that stops on BOOT_COMPLETE.

That's exactly the suggestion I made in internal bug 387 and what @fredoh9 is doing here: he's re-using the functions I wrote in #862

@marc-hb marc-hb requested review from andyross, lyakh, ranj063, paulstelian97, LaurentiuM1234 and kv2019i and removed request for greg-intel May 4, 2023 23:14
@marc-hb
Copy link
Collaborator

marc-hb commented May 4, 2023

Another benefit with this change, if FW is not loaded it will be failed instead of TIMEOUT.

Because it's the "last chance" TIMEOUT in our CI it's very long, how long Fred?.

Moreover we power-cycle the device after such a CI TIMEOUT. When combined with another unrelated issue #979, the whole test run ends up in a very time-consuming tail spin on that device.

@fredoh9
Copy link
Collaborator Author

fredoh9 commented May 5, 2023

Because it's the "last chance" TIMEOUT in our CI it's very long, how long Fred?.

TIMEOUT value is dependent on each test case. we used to set 2-4 times higher value of normal runtime. For an example, check-suspend-resume-50.sh has 6000 second. TIMEOUT has become an expensive way to waste resources.

This is going to check whether the sof-test can be started
after boot/reboot.

If FW LOADING is not a prerequisite for the test, this can be skipped,
by defining NO_POLL_FW_LOADING=true

Another benefit with this change, if FW is not loaded it will be failed
instead of TIMEOUT.

Signed-off-by: Fred Oh <[email protected]>
@fredoh9 fredoh9 force-pushed the fix/wait_for_fw_load branch from 00e538a to 6a6b7a2 Compare May 5, 2023 18:10
@fredoh9
Copy link
Collaborator Author

fredoh9 commented May 5, 2023

@plbossart should we add a counter and print a warning when waiting more than 10s?

I think you missed this question

@plbossart @marc-hb I can print out warning every 10 sec. instead of zero output for MAX wait time

Copy link
Contributor

@keqiaozhang keqiaozhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution.

Copy link

@paulstelian97 paulstelian97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels good but incomplete. Before merging this I would like to see a good analysis -- do some tests actually need that variable right now or can it be done later?

@marc-hb
Copy link
Collaborator

marc-hb commented May 8, 2023

@paulstelian97 this will help a lot in scenarios where the GPU driver is missing or failing and audio it timing out and where other issues make the situation much worse. I just unresolved the comment where I asked to provide some links for more context; you could start with #979.

@marc-hb
Copy link
Collaborator

marc-hb commented May 9, 2023

@plbossart @marc-hb I can print out warning every 10 sec. instead of zero output for MAX wait time

That would be great. I think we could merge this and then add these warnings later?

@fredoh9
Copy link
Collaborator Author

fredoh9 commented May 9, 2023

That would be great. I think we could merge this and then add these warnings later?

yes, let's merge this and see. If we need anything improvement, I will address them together

@fredoh9 fredoh9 merged commit fceeee0 into thesofproject:main May 10, 2023
@fredoh9
Copy link
Collaborator Author

fredoh9 commented May 10, 2023

I will monitor upcoming the test results

@marc-hb
Copy link
Collaborator

marc-hb commented May 15, 2023

I think @lyakh found an issue with this new feature: kernel logs seem to be missing when the firmware does not boot. Example: https://sof-ci.01.org/sofpr/PR7584/build7585/devicetest/index.html

In jenkins logs generic_test/13429/

10:58:40  cmd: scp -r [email protected]:~/sof-test/test-case/../logs/verify-firmware-presence/last/* results/2023-05-12-03:55:59/sh-adlp-rvp-sdw-01/verify-firmware-presence
10:58:40  return code: 1
10:58:40  cmd output: scp: /home/ubuntu/sof-test/test-case/../logs/verify-firmware-presence/last/*: No such file or directory

This is very likely because the new code runs "too early".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants