Catch firmware errors #1075

Merged
merged 2 commits into from
Apr 23, 2024

Conversation

@marc-hb marc-hb commented Jun 30, 2023

marc-hb commented Jul 1, 2023

There are a LOT of ERRORs in stable-v2.2 (https://sof-ci.01.org/softestpr/PR1075/build537/devicetest/index.html): about half of the tests have some.

They all seem to be this timeout:

    [    60561676.864329] (        1924.374878) c0 wait    src/lib/wait.c:53   ERROR poll timeout reg 488520 mask 63 val 0 us 937

Should it really be at the ERROR level?

In one sample context:

[    58851889.953103] (         114.739578) c0 ll-schedule        ./schedule/ll_schedule.c:186  INFO task 0xbe0a2280 pipe-task  avg 1763, max 1796
[    59389729.931731] (      537840.000000) c0 ll-schedule        ./schedule/ll_schedule.c:186  INFO task 0x9e08c858 agent-work  avg 114, max 119
[    60558956.656103] (     1169226.750000) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
[    60558979.833186] (          23.177082) c0 pipe         9.50  ....../pipeline-stream.c:270  INFO pipe trigger cmd 0
[    60559752.489405] (         772.656189) c0 ssp-dai      1.2   /drivers/intel/ssp/ssp.c:1122 INFO ssp_trigger() cmd 0
[    60561676.864329] (        1924.374878) c0 wait                         src/lib/wait.c:53   ERROR poll timeout reg 488520 mask 63 val 0 us 937
[    60561693.478911] (          16.614582) c0 ssp-dai      1.2   /drivers/intel/ssp/ssp.c:61   WARN ssp_empty_tx_fifo() warning: timeout
[    60561711.655994] (          18.177082) c0 ssp-dai      1.2   /drivers/intel/ssp/ssp.c:1084 INFO ssp_stop(), TX stop
[    60561733.374743] (          21.718750) c0 dw-dma                 src/drivers/dw/dma.c:413  INFO dw_dma_stop(): dma 0 channel 0 stop
[    60561767.437242] (          34.062500) c0 ll-schedule        ./schedule/ll_schedule.c:159  INFO task complete 0xbe0a2200 pipe-task 
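
To pull lines like this out of a long trace, a minimal sketch along these lines could help. It is not the check this PR adds; the field layout is only inferred from the excerpt above, and the names `LEGACY_LINE` and `legacy_errors` are made up for illustration.

```python
import re

# Assumed layout, inferred only from the excerpt above (not from the logger source):
# [ timestamp] ( delta) cN component [optional ids] file:line LEVEL message
LEGACY_LINE = re.compile(
    r"^\[\s*(?P<timestamp>[\d.]+)\]\s+"
    r"\(\s*(?P<delta>[\d.]+)\)\s+"
    r"c(?P<core>\d+)\s+"
    r"(?P<component>\S+)\s+"
    r".*?"                              # optional ids such as "1.2"
    r"(?P<location>\S+:\d+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


def legacy_errors(lines):
    """Return one dict per line whose level field is ERROR."""
    errors = []
    for line in lines:
        match = LEGACY_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(match.groupdict())
    return errors


# Three lines copied from the excerpt above.
SAMPLE = [
    "[    60559752.489405] (         772.656189) c0 ssp-dai      1.2   /drivers/intel/ssp/ssp.c:1122 INFO ssp_trigger() cmd 0",
    "[    60561676.864329] (        1924.374878) c0 wait                         src/lib/wait.c:53   ERROR poll timeout reg 488520 mask 63 val 0 us 937",
    "[    60561693.478911] (          16.614582) c0 ssp-dai      1.2   /drivers/intel/ssp/ssp.c:61   WARN ssp_empty_tx_fifo() warning: timeout",
]

for error in legacy_errors(SAMPLE):
    print(error["location"], error["message"])
```

Only the `wait.c:53` line is reported; the WARN that follows it is deliberately left out because only the ERROR level is being counted here.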

marc-hb commented Jul 1, 2023

cavs25 https://sof-ci.01.org/softestpr/PR1075/build536/devicetest/index.html and MTL https://sof-ci.ostc.intel.com/#/result/planresultdetail/28422 both show the same interesting ERROR pattern: the nocodec configurations have firmware <err> in practically every test. The configurations with a codec have no ERROR message at all (except in a very few other, unrelated and known failures).

Most error messages seem to be this timeout:

[    0.037243] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power en_ssp_power index 0
[    0.037381] <err> dai_intel_ssp: dai_ssp_poll_for_register_delay: dai_ssp_poll_for_register_delay poll timeout reg 465924 mask 256 val 0 us 125
[    0.037413] <wrn> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power warning: timeout
[    0.037421] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power I2SLCTL
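
For the Zephyr-style logs, the same kind of filter can key on the `<err>` tag. Again this is only a rough sketch under the assumption that every line looks like the excerpt above (`[ timestamp] <level> module: message`); `ZEPHYR_LINE` and `zephyr_errors` are hypothetical names, not part of sof-test.

```python
import re

# Assumed layout, inferred only from the excerpt above:
# [ timestamp] <level> module: message
ZEPHYR_LINE = re.compile(
    r"^\[\s*(?P<timestamp>[\d.]+)\]\s+"
    r"<(?P<level>err|wrn|inf|dbg)>\s+"
    r"(?P<module>[^:\s]+):\s+"
    r"(?P<message>.*)$"
)


def zephyr_errors(lines):
    """Return (timestamp, module, message) for every <err> line."""
    found = []
    for line in lines:
        match = ZEPHYR_LINE.match(line.strip())
        if match and match.group("level") == "err":
            found.append(
                (
                    float(match.group("timestamp")),
                    match.group("module"),
                    match.group("message"),
                )
            )
    return found


# The four lines from the excerpt above.
SAMPLE = [
    "[    0.037243] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power en_ssp_power index 0",
    "[    0.037381] <err> dai_intel_ssp: dai_ssp_poll_for_register_delay: dai_ssp_poll_for_register_delay poll timeout reg 465924 mask 256 val 0 us 125",
    "[    0.037413] <wrn> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power warning: timeout",
    "[    0.037421] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: dai_ssp_pm_runtime_en_ssp_power I2SLCTL",
]

for timestamp, module, message in zephyr_errors(SAMPLE):
    print(f"{timestamp:.6f} {module}: {message}")
```

On this sample only the `dai_ssp_poll_for_register_delay` line is returned; the `<wrn>` timeout right after it is ignored.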

@marc-hb marc-hb force-pushed the catch-fw-errors branch from 24d96e2 to 2f8bc05 Compare July 1, 2023 00:56
@marc-hb marc-hb changed the title Catch firmware errors [SKIP SOF-TEST] Catch firmware errors Jul 1, 2023
@marc-hb marc-hb force-pushed the catch-fw-errors branch from 2f8bc05 to 4a1e6da Compare July 1, 2023 00:59
@marc-hb marc-hb changed the title [SKIP SOF-TEST] Catch firmware errors Catch firmware errors Jul 1, 2023
marc-hb commented Jul 6, 2023

New run, same results.

marc-hb commented Dec 13, 2023

SOFCI TEST

marc-hb commented Apr 9, 2024

cavs https://sof-ci.01.org/softestpr/PR1075/build129/devicetest/index.html has a new type of error:

[    0.598035] <inf> dai_intel_dmic_nhlt: dai_dmic_configure_coeff: fir_length_a = 101, fir_length_b = 247, packed = 0
[    0.598055] <inf> dai_intel_dmic_nhlt: dai_nhlt_dmic_dai_params_get: set 4ch pdm0 and pdm1
[    0.598065] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: pdm = 0, FIR_CONFIG = 0x00010064
[    0.598078] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: dai_index = 0, rate_div = 800, p_clkdiv = 16, p_mcic = 25, p_mfir = 2
[    0.598090] <inf> dai_intel_dmic_nhlt: dai_nhlt_update_rate: rate = 48000, channels = 4, format = 2
[    0.598100] <inf> dai_intel_dmic_nhlt: dai_nhlt_update_rate: io_clk 38400000, rate_div 800
[    0.598111] <inf> dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: dmic_set_config_nhlt(): enable0 3, enable1 3
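
One way to make a new kind of error stand out between runs is to group `<err>` lines by where they come from and compare with what was seen before. The sketch below assumes that the first two colon-separated fields after the level tag are the module and the function name, which is only inferred from the excerpts in this thread; `error_sources` and `new_sources` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumption: "<err> module: function: message", as in the excerpts in this thread.
ERR_SOURCE = re.compile(
    r"<err>\s+(?P<module>[^:\s]+):\s+(?P<function>[^:\s]+):"
)


def error_sources(lines):
    """Count <err> lines per (module, function) pair."""
    counts = Counter()
    for line in lines:
        match = ERR_SOURCE.search(line)
        if match:
            counts[(match.group("module"), match.group("function"))] += 1
    return counts


def new_sources(current, known):
    """Return the (module, function) pairs present in current but not in known."""
    return sorted(set(current) - set(known))


# One <err> line from the earlier SSP excerpt, standing in for an older run.
OLDER_RUN = [
    "[    0.037381] <err> dai_intel_ssp: dai_ssp_poll_for_register_delay: dai_ssp_poll_for_register_delay poll timeout reg 465924 mask 256 val 0 us 125",
]

# Three lines from the DMIC excerpt above.
NEWER_RUN = [
    "[    0.598055] <inf> dai_intel_dmic_nhlt: dai_nhlt_dmic_dai_params_get: set 4ch pdm0 and pdm1",
    "[    0.598065] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: pdm = 0, FIR_CONFIG = 0x00010064",
    "[    0.598078] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: dai_index = 0, rate_div = 800, p_clkdiv = 16, p_mcic = 25, p_mfir = 2",
]

known = error_sources(OLDER_RUN)
current = error_sources(NEWER_RUN)
for module, function in new_sources(current, known):
    print(f"new error source: {module}: {function} ({current[(module, function)]} lines)")
```

Grouping by function rather than by full message keeps lines that differ only in their numbers (register values, dividers) in the same bucket.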

cavs25 https://sof-ci.01.org/softestpr/PR1075/build753/devicetest/index.html is from the old build service and is very old (Sept 2023).

MTL https://sof-ci.01.org/softestpr/PR1075/build128/devicetest/index.html has plenty of FW errors but unfortunately no logs, dunno why.

stable-v2.2 has underruns https://sof-ci.01.org/softestpr/PR1075/build130/devicetest/index.html and suspend/resume failures, otherwise mostly OK.

marc-hb commented Apr 9, 2024

SOFCI TEST

EDIT, now blocked by (among others):

marc-hb commented Apr 10, 2024

cavs https://sof-ci.01.org/softestpr/PR1075/build269/devicetest/index.html has the "ignoring blob" errors in NOCODEC and not much else I could find.

Same thing with ACE https://sof-ci.01.org/softestpr/PR1075/build270/devicetest/index.html

stable-v2.2 https://sof-ci.01.org/softestpr/PR1075/build271/devicetest/index.html seems to have many underruns and that's it

ujfalusi commented Apr 11, 2024

@marc-hb, the underruns look like they are coming from pause/resume tests only, right?

All the logs that I can see via CI contain that (2023-09-04 is the oldest).

marc-hb commented Apr 11, 2024

I think you might be right. The rest in https://sof-ci.01.org/softestpr/PR1075/build269/devicetest/index.html seems to be ERROR dtrace_add_event() which could probably be lowered to a WARN and that's it?

In any case stable-v2.2 is obviously not the priority. I'd really like to enable this for Zephyr.

marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 11, 2024
Dropping logs is bad but it should not be fatal in itself.

This ERROR is one of the last remaining ones, see
thesofproject/sof-test#1075 (comment)

Signed-off-by: Marc Herbert <[email protected]>
marc-hb commented Apr 23, 2024

SOFCI TEST

EDIT: MTL https://sof-ci.01.org/softestpr/PR1075/build327/devicetest/index.html is 100% green!

CAVS https://sof-ci.01.org/softestpr/PR1075/build328/devicetest/index.html has only one device available but it's all green.

LNL https://sof-ci.01.org/softestpr/PR1075/build326/devicetest/index.html has one unrelated device PM runtime misconfiguration.

stable-v2.2 still has a lot of dtrace dropped logs errors https://sof-ci.01.org/softestpr/PR1075/build325/devicetest/index.html
thesofproject/sof#9036

Internal Intel issue 448.

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb marc-hb marked this pull request as ready for review April 23, 2024 23:20
@marc-hb marc-hb requested a review from a team as a code owner April 23, 2024 23:20
@fredoh9 fredoh9 left a comment

Looks good, let's merge this now

@fredoh9 fredoh9 merged commit d10020d into thesofproject:main Apr 23, 2024
3 of 7 checks passed
@marc-hb marc-hb deleted the catch-fw-errors branch April 23, 2024 23:35
marc-hb commented Apr 24, 2024

MTL https://sof-ci.01.org/softestpr/PR1075/build331/devicetest/index.html 100% green.

stable-v2.2 is 100% green (because it is not catching errors yet) https://sof-ci.01.org/softestpr/PR1075/build332/devicetest/index.html

LNL has a couple of known issues already listed above
https://sof-ci.01.org/softestpr/PR1075/build329/devicetest/index.html

CAVS https://sof-ci.01.org/softestpr/PR1075/build330/devicetest/index.html had only one MODEL available but it's all green.

marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 24, 2024
Dropping logs is bad but it should not be fatal in itself.

This ERROR is one of the last remaining ones, see
thesofproject/sof-test#1075 (comment)

Also switch to the "etrace" and stop using the DMA trace when it's
already saturated.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 25, 2024
Dropping logs is bad but it should not be fatal in itself.

This ERROR is one of the last remaining errors in stable-v2.2, see
thesofproject/sof-test#1075 (comment)

Also switch to the "etrace": drop the dubious recursion and stop using
the DMA trace when it's already saturated.

Disclaimer: this was (successfully) tested only on stable-v2.2, see

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 25, 2024
Dropping logs is bad but it should not be fatal in itself.

This ERROR is one of the last remaining errors in stable-v2.2, see
thesofproject/sof-test#1075 (comment)

Also switch to the "etrace": drop the dubious recursion and stop using
the DMA trace when it's already saturated.

Disclaimer: this was (successfully) tested only on stable-v2.2, see thesofproject#9036

Signed-off-by: Marc Herbert <[email protected]>
lgirdwood pushed a commit to thesofproject/sof that referenced this pull request Apr 30, 2024
marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 30, 2024
(cherry picked from commit bb31696)
kv2019i pushed a commit to thesofproject/sof that referenced this pull request May 2, 2024
(cherry picked from commit bb31696)
eddy1021 pushed a commit to eddy1021/sof that referenced this pull request Jul 15, 2024
Labels
P1 Blocker bugs or important features