Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LNL NOCODEC alsa-bat playback failure #9123

Closed
plbossart opened this issue Apr 30, 2024 · 50 comments · Fixed by #9276
Closed

LNL NOCODEC alsa-bat playback failure #9123

plbossart opened this issue Apr 30, 2024 · 50 comments · Fixed by #9276
Labels
bug Something isn't working as expected LNL Applies to Lunar Lake platform P2 Critical bugs or normal features
Milestone

Comments

@plbossart
Copy link
Member

plbossart commented Apr 30, 2024

Failure reported in
https://sof-ci.01.org/linuxpr/PR4899/build2589/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821

This could be the consequence of a xrun.

cc:

@plbossart plbossart added the LNL Applies to Lunar Lake platform label Apr 30, 2024
@plbossart
Copy link
Member Author

@plbossart
Copy link
Member Author

plbossart commented May 13, 2024

https://sof-ci.01.org/linuxpr/PR4899/build2589/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821 has an xrun reported by userspace but nothing in the firmware trace.

https://sof-ci.01.org/sofpr/PR9119/build4654/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-599 has no xrun reported by userspace, but the trace has tons of

[ 4138.879225] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192
[ 4138.883155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 56 < period bytes 384, possible glitch
[ 4138.891155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 56 < period bytes 384, possible glitch
[ 4138.892155] <wrn> dai_comp: dai_common_copy: comp:1 0x10004 dai_common_copy(): Copy_bytes 192 + avail bytes 8 < period bytes 384, possible glitch

@plbossart
Copy link
Member Author

@ujfalusi @kv2019i what do you think of this one? The topology is supposedly the same as on MTL, so the DMA/firmware changes are probably the reason for these test failures.

@kv2019i
Copy link
Collaborator

kv2019i commented May 13, 2024

@plbossart Both above test runs have "Broken pipe":

2024-04-30 18:02:26 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1
2024-04-30 18:02:27 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1
Overrun: Broken pipe(-32)

I think this is related. The capture audio file has flawless audio for first 33000 samples and then it breaks down.

@marc-hb
Copy link
Collaborator

marc-hb commented May 14, 2024

Another, very recent alsabat regression, this time on capture and MTLP_RVP_HDA ba-mtlp-rvp-hda-03. Yet another, different issue?

If different then it would be a very strange timing coincidence because both test cases have been very consistently passing over the last few weeks.

https://sof-ci.01.org/softestpr/PR1193/build422/devicetest/index.html?model=MTLP_RVP_HDA&testcase=check-alsabat-headset-capture-997

@plbossart plbossart transferred this issue from thesofproject/linux May 14, 2024
@marc-hb marc-hb changed the title LNL NOCODEC alsa-bat failure LNL NOCODEC alsa-bat playback failure May 17, 2024
@plbossart
Copy link
Member Author

Now we have a broken test with NOCODEC alsa-bat capture

https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-capture-821

I must admit I don't know if what the difference is between alsa-bat-headset-playback and alsa-bat-headset-capture, we need to play and capture on the same devices and test the same loopback

playback:
https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821
2024-05-20 19:59:03 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1
2024-05-20 19:59:04 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1

capture:
https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-capture-821
2024-05-20 20:00:49 UTC [REMOTE_COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 821 -k 2.1
2024-05-20 20:00:51 UTC [REMOTE_COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -f S16_LE -F 821 -k 2.1

Looks the same to me? What am I missing @fredoh9 @marc-hb @kv2019i ?

@abonislawski abonislawski added the P1 Blocker bugs or important features label May 21, 2024
@kv2019i
Copy link
Collaborator

kv2019i commented May 21, 2024

@plbossart At least in the capture case, there is again glitches in capture audio. In specific case of
https://sof-ci.01.org/linuxpr/PR5007/build3042/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-playback-821
... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

image

@marc-hb
Copy link
Collaborator

marc-hb commented May 22, 2024

Looks the same to me? What am I missing @fredoh9 @marc-hb @kv2019i ?

@fredoh9 just explained this to me: NOCODEC is special.

Outside NOCODEC:

  • alsabat-playback: SOF playback, USB capture
  • alsabat-capture: USB playback, SOF capture

NOCODEC is special, it has no USB. So:

  • SOF playback, SOF capture.

So both tests do more or less the same thing on NOCODEC.

@plbossart
Copy link
Member Author

Right, so can we skip the capture part to avoid unnecessary confusions?

@lgirdwood
Copy link
Member

... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

and its also suspiciously close to 0.5 seconds of audio. 0.5 second USB audio buffer size ???

@plbossart
Copy link
Member Author

... it's smaller than before and seems to be exactly one 1ms tick of audio missing:

and its also suspiciously close to 0.5 seconds of audio. 0.5 second USB audio buffer size ???

There's no USB involved in Nocodec mode, it's an SSP loopback.

@marc-hb
Copy link
Collaborator

marc-hb commented May 22, 2024

Right, so can we skip the capture part to avoid unnecessary confusions?

"send patches"

@marc-hb

This comment was marked as off-topic.

@plbossart
Copy link
Member Author

Just spotted a capture failure on HDA

different interface, different issue please

@wszypelt wszypelt added the bug Something isn't working as expected label May 24, 2024
@wszypelt
Copy link

@plbossart can I ask for the logs with ipc payload?

@plbossart
Copy link
Member Author

In theory the dmesg log should already have the entire payload provided, an option must be missing in the CI setup.
@fredoh9 @marc-hb @ssavati do we turn the BIT(11) on for sof_debug?

#define SOF_DBG_DUMP_IPC_MESSAGE_PAYLOAD	BIT(11) /* On top of the IPC message header
							 * dump the message payload also

@fredoh9
Copy link
Contributor

fredoh9 commented May 28, 2024

All LNL RVPs in SOF CI have the IPC dump enabled.

@marc-hb
Copy link
Collaborator

marc-hb commented May 28, 2024

do we turn the BIT(11) on for sof_debug?

It's hit-and-miss because:

  • We haven't productized Ansible yet, so it breaks often when someone reserves a device or when setting up a new one.
  • We can't set bits one by one and we don't have "official" sof_debug values documented anywhere yet.

@marc-hb
Copy link
Collaborator

marc-hb commented May 29, 2024

According to @kv2019i :

sof_debug=1 for others, and 3073/0x0c01 for nocodec

Correct?

@fredoh9
Copy link
Contributor

fredoh9 commented May 29, 2024

need to keep bit(11) enabled.

  • nocodec mode 0xC01
  • Others 0x801

@ssavati
Copy link

ssavati commented May 29, 2024

All LNL RVPs in SOF CI have the IPC dump enabled.

i have checked all BA devices also for IPC dump. its enabled with above values

@ujfalusi
Copy link
Contributor

@fredoh9, if we want to have payload dump enabled on all configurations then 0xc01 for nocodec and 0x801 for others.
It really depends what we want.
As a sidenote: the payload printing deemed to be 'heavy' and kept disabled. It also alters timings (adds delay) which can hide races when we have less kernel print activity.

It would be better if we could get bpftrace working to get the IPC message dumps, but they will be without the kernel print provided contexts...
I have this semi working implementation:

#!/usr/bin/bpftrace
   
struct sof_ipc4_msg {
	union {
		unsigned long header_u64;
		struct {
			unsigned int primary;
			unsigned int extension;
		};
	};

	unsigned long data_size;
	void *data_ptr;
};
 
BEGIN
{
	printf("SOF IPC4 message logging. Ctrl-C to end.\n");
}

kprobe:sof_ipc4_log_header {
	$msg = (struct sof_ipc4_msg *)arg2;

	if (arg3 == 1 && $msg->data_size != 0) {
		printf("%s : 0x%x|0x%x [data size:: %llu]\n", str(arg1),
		       $msg->primary, $msg->extension, $msg->data_size);
		if (!strcontains(str(arg1), "done")) {
			printf("Message payload:\n%rh\n", buf($msg->data_ptr,
			       $msg->data_size));
		}
	} else {
		printf("%s : 0x%x|0x%x\n", str(arg1), $msg->primary,
		       $msg->extension);
	}	
}

it prints all IPC messages but the payload printing will cut the print at BPFTRACE_MAX_STRLEN and so far I was not able to work that around to split up the payload print. There are limitations what we can do with bpftrace, if anyone can make this work then we can probably introduce an IPC log alongside of dmesg and the firmware log?

softwarecki added a commit to softwarecki/sof that referenced this issue Jul 2, 2024
Setting the source/sink buffers parameters in the copier_update_params
function is not sufficient. If a sink buffer is attached during copier
operation, the module will not set buffers parameters. Move the buffer
parameter configurations to the new copier_bind bind function. Delete
the sink buffers configurations from the copier_module_copy function.
There is no need to configure sink buffers parameters on each copy.
We are assured that they were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
softwarecki added a commit to softwarecki/sof that referenced this issue Jul 2, 2024
Setting the source/sink buffers parameters in the copier_update_params
function is not sufficient. If a sink buffer is attached during copier
operation, the module will not set buffers parameters. Move the buffer
parameter configurations to the new copier_bind bind function. Delete
the sink buffers configurations from the copier_module_copy function.
There is no need to configure sink buffers parameters on each copy.
We are assured that they were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
softwarecki added a commit to softwarecki/sof that referenced this issue Jul 2, 2024
Setting the source/sink buffers parameters in the copier_update_params
function is not sufficient. If a sink buffer is attached during copier
operation, the module will not set buffers parameters. Move the buffer
parameter configurations to the new copier_bind bind function. Delete
the sink buffers configurations from the copier_module_copy function.
There is no need to configure sink buffers parameters on each copy.
We are assured that they were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 2, 2024

Reproduction rate low in past week, continue with P2 and assign to v2.11.

This just happened again in CI:

https://sof-ci.01.org/softestpr/PR1213/build576/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-headset-capture-821

Have you made sure the kernel, firmware and system configuration is 100% identical? Even better: have you just downloaded and installed packages from CI directly? On the same system(s).

softwarecki added a commit to softwarecki/sof that referenced this issue Jul 3, 2024
Setting the source/sink buffers parameters in the copier_update_params
function is not sufficient. If a sink buffer is attached during copier
operation, the module will not set buffers parameters. Move the buffer
parameter configurations to the new copier_bind bind function. Delete
the sink buffers configurations from the copier_module_copy function.
There is no need to configure sink buffers parameters on each copy.
We are assured that they were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
softwarecki added a commit to softwarecki/sof that referenced this issue Jul 4, 2024
Setting the sink buffers parameters in the copier_update_params function is
not sufficient. If a sink buffer is attached during copier operation, the
module will not set buffers parameters. Move the buffer parameter
configuration to the new copier_bind bind function. Delete the sink buffers
configuration from the copier_module_copy function. There is no need to
configure sink buffers parameters on each copy. We are assured that they
were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
softwarecki added a commit to softwarecki/sof that referenced this issue Jul 5, 2024
Setting the sink buffers parameters in the copier_update_params function is
not sufficient. If a sink buffer is attached during copier operation, the
module will not set buffers parameters. Move the buffer parameter
configuration to the new copier_bind bind function. Delete the sink buffers
configuration from the copier_module_copy function. There is no need to
configure sink buffers parameters on each copy. We are assured that they
were configured at the time of bind.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
softwarecki added a commit to softwarecki/sof that referenced this issue Jul 8, 2024
Setting the sink buffers parameters in the copier_update_params function is
not sufficient. If a sink buffer is attached during copier operation, the
module will not set buffers parameters. Add bind function to configure sink
buffers parameters.

There is no need to configure sink buffers parameters on each copy. We are
assured that they were configured at the time of bind. Remove
ipc4_update_buffer_format from the copier_module_copy function.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
@softwarecki
Copy link
Collaborator

Fixed in PR #9276

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 15, 2024

To be closed after one round of daily tests.

@marc-hb marc-hb reopened this Jul 15, 2024
eddy1021 pushed a commit to eddy1021/sof that referenced this issue Jul 15, 2024
Setting the sink buffers parameters in the copier_update_params function is
not sufficient. If a sink buffer is attached during copier operation, the
module will not set buffers parameters. Add bind function to configure sink
buffers parameters.

There is no need to configure sink buffers parameters on each copy. We are
assured that they were configured at the time of bind. Remove
ipc4_update_buffer_format from the copier_module_copy function.

Fixes: thesofproject#9123

Signed-off-by: Adrian Warecki <[email protected]>
@marc-hb marc-hb closed this as completed Jul 18, 2024
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 18, 2024

@softwarecki
Copy link
Collaborator

softwarecki commented Jul 19, 2024

@marc-hb I don't see any glitches in the attached wave file
bat

only the peak at 999 Hz is visible in fft
image

I think this is a different issue than the one I fixed.

@wszypelt
Copy link

@marc-hb Could you please take a look at the @softwarecki respone?

@lgirdwood
Copy link
Member

@ssavati @fredoh9 do we have the non USB loopback connectors enabled yet ?

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 23, 2024

I don't see any glitches in the attached wave file

I don't know anything about the details sorry but I remember that in the past every time a "human eye" and alsabat disagreed, alsabat won eventually. I have no idea what caused this particular failure: it could be a LNL-specific USB bug, or a LNL-specific audio bug, a bug in sof-test or even something else but all the evidence so far has shown alsabat itself to be the least suspect.

@fredoh9
Copy link
Contributor

fredoh9 commented Aug 24, 2024

@ssavati @fredoh9 do we have the non USB loopback connectors enabled yet ?

I haven't purchased the loopback cables yet.

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 9, 2024

Reviewing a week worth of daily test plans with this configurationg (starting 2024-09-02), this test is passing all the time. It is likely zephyrproject-rtos/zephyr#77805 fix was responsible for the occasional 1ms glitches seen in this test and those are now gone. There is one failure left in the alsa-bat-capture test, I'll file a separate bug to track that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected LNL Applies to Lunar Lake platform P2 Critical bugs or normal features
Projects
None yet
Development

Successfully merging a pull request may close this issue.