Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file-storage process hangs device #904

Open
aneisch opened this issue Dec 21, 2024 · 16 comments
Open

file-storage process hangs device #904

aneisch opened this issue Dec 21, 2024 · 16 comments

Comments

@aneisch
Copy link

aneisch commented Dec 21, 2024

Describe the problem

When attempting to copy files to device the file-storage process hogs the CPU and soon thereafter, the usb device unexpectedly disconnects from my Mac or the vehicle.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                           
 1192 root      20   0       0      0      0 R  97.0   0.0   3:52.22 file-storage 

Device

Raspberry Pi Zero 2 W

OS Image

Prebuilt TeslaUSB image

Car Model

Model S

USB connection

Glove box

Logs

Script simply hangs and does not output results. Interesting output from dmesg after a file transfer to the mounted drive is initiated from laptop.

[Fri Dec 20 21:03:14 2024] loop0: detected capacity change from 0 to 62914560
[Fri Dec 20 21:03:14 2024]  loop0: p1
[Fri Dec 20 21:03:15 2024] loop0: detected capacity change from 0 to 62912512
[Fri Dec 20 21:03:16 2024] ------------[ cut here ]------------
[Fri Dec 20 21:03:16 2024] WARNING: CPU: 1 PID: 1083 at fs/iomap/buffered-io.c:1167 iomap_file_buffered_write_punch_delalloc+0x564/0x668
[Fri Dec 20 21:03:16 2024] Modules linked in: usb_f_mass_storage overlay binfmt_misc brcmfmac_wcc brcmfmac xfs brcmutil vc4 snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper cfg80211 drm_kms_helper hci_uart btbcm bluetooth snd_soc_core bcm2835_v4l2(C) bcm2835_codec(C) snd_compress bcm2835_isp(C) snd_pcm_dmaengine v4l2_mem2mem bcm2835_mmal_vchiq(C) raspberrypi_hwmon snd_bcm2835(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops ecdh_generic snd_pcm videobuf2_v4l2 ecc rfkill videodev i2c_bcm2835 snd_timer videobuf2_common snd raspberrypi_gpiomem mc vc_sm_cma(C) usb_f_ecm u_ether libcomposite dwc2 roles drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 uio_pdrv_genirq fixed uio [last unloaded: usb_f_rndis]
[Fri Dec 20 21:03:16 2024] CPU: 1 PID: 1083 Comm: file-storage Tainted: G         C         6.6.62+rpt-rpi-v7 #1  Raspbian 1:6.6.62-1+rpt1
[Fri Dec 20 21:03:16 2024] Hardware name: BCM2835
[Fri Dec 20 21:03:16 2024]  unwind_backtrace from show_stack+0x18/0x1c
[Fri Dec 20 21:03:16 2024]  show_stack from dump_stack_lvl+0x50/0x68
[Fri Dec 20 21:03:16 2024]  dump_stack_lvl from __warn+0x80/0x11c
[Fri Dec 20 21:03:16 2024]  __warn from warn_slowpath_fmt+0x190/0x198
[Fri Dec 20 21:03:16 2024]  warn_slowpath_fmt from iomap_file_buffered_write_punch_delalloc+0x564/0x668
[Fri Dec 20 21:03:16 2024]  iomap_file_buffered_write_punch_delalloc from xfs_buffered_write_iomap_end+0x40/0x98 [xfs]
[Fri Dec 20 21:03:16 2024]  xfs_buffered_write_iomap_end [xfs] from iomap_iter+0xd0/0x498
[Fri Dec 20 21:03:16 2024]  iomap_iter from iomap_file_buffered_write+0x88/0x3a0
[Fri Dec 20 21:03:16 2024]  iomap_file_buffered_write from xfs_file_buffered_write+0x94/0x2b8 [xfs]
[Fri Dec 20 21:03:16 2024]  xfs_file_buffered_write [xfs] from __kernel_write_iter+0xbc/0x274
[Fri Dec 20 21:03:16 2024]  __kernel_write_iter from kernel_write+0xd4/0x1cc
[Fri Dec 20 21:03:16 2024]  kernel_write from do_write+0x290/0x4e8 [usb_f_mass_storage]
[Fri Dec 20 21:03:16 2024]  do_write [usb_f_mass_storage] from fsg_main_thread+0xc5c/0x1da0 [usb_f_mass_storage]
[Fri Dec 20 21:03:16 2024]  fsg_main_thread [usb_f_mass_storage] from kthread+0xe8/0x104
[Fri Dec 20 21:03:16 2024]  kthread from ret_from_fork+0x14/0x38
[Fri Dec 20 21:03:16 2024] Exception stack(0x9ccf9fb0 to 0x9ccf9ff8)
[Fri Dec 20 21:03:16 2024] 9fa0:                                     00000000 00000000 00000000 00000000
[Fri Dec 20 21:03:16 2024] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[Fri Dec 20 21:03:16 2024] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[Fri Dec 20 21:03:16 2024] ---[ end trace 0000000000000000 ]---
[Fri Dec 20 21:03:50 2024] dwc2 3f980000.usb: new device is high-speed
[Fri Dec 20 21:03:50 2024] dwc2 3f980000.usb: new address 28

Additional information

I have tried recreating the microsd a few times but am still seeing the same issue.

@aneisch
Copy link
Author

aneisch commented Dec 21, 2024

I suspect this is just a kernel bug. I rebuilt the device using https://github.com/marcone/teslausb/releases/tag/v3.2 and it works as expected.

@marcone
Copy link
Owner

marcone commented Dec 21, 2024

I suspect this is just a kernel bug. I rebuilt the device using https://github.com/marcone/teslausb/releases/tag/v3.2 and it works as expected.

which one did you use before?

@aneisch
Copy link
Author

aneisch commented Dec 21, 2024

@marcone
Copy link
Owner

marcone commented Dec 25, 2024

I was able to reproduce this. First time I copied about 3GB worth of files to the CAM drive and that went OK; low CPU usage of the file-storage process, no messages in the kernel log, copy completed without error.
Then I tried another batch of files, and after a little while this started misbehaving:
started off with the same message in the kernel log you posted above, followed by this repeated over and over again:

[  698.621634] dwc2 3f980000.usb: new device is high-speed
[  698.690631] dwc2 3f980000.usb: new device is high-speed
[  698.742345] dwc2 3f980000.usb: new address 10
[  703.794460] dwc2 3f980000.usb: new device is high-speed
[  703.863481] dwc2 3f980000.usb: new device is high-speed
[  703.914857] dwc2 3f980000.usb: new address 10
[  708.959307] dwc2 3f980000.usb: new device is high-speed
[  709.031303] dwc2 3f980000.usb: new device is high-speed
[  709.082597] dwc2 3f980000.usb: new address 10
[  714.137180] dwc2 3f980000.usb: new device is high-speed
[  714.208152] dwc2 3f980000.usb: new device is high-speed
[  714.259205] dwc2 3f980000.usb: new address 10

The file-storage process was also using most of the CPU cycles at this point.

I canceled the copy on the Windows machine, but that didn't seem to work as the copy progress window just stayed opened.
I then logged in to the Pi and ran /root/bin/disable_gadget.sh
This disconnected the drive from the PC (which canceled the copy and dismissed the copy progress window), but it ended up hanging the script on the Pi. I suspect it was hanging on the echo > "$gadget_root/UDC since any access to UDC blocked at that point.

I haven't had a chance yet to try if it also fails like this with an official Raspberry Pi OS build.

kern.log

@marcone
Copy link
Owner

marcone commented Dec 26, 2024

Same behavior with the official Raspberry Pi OS Lite 32-bit image.

@marcone
Copy link
Owner

marcone commented Dec 26, 2024

So far I've been unable to reproduce the behavior with the official Raspberry Pi OS Lite 64-bit image

@marcone
Copy link
Owner

marcone commented Dec 26, 2024

It looks to me like the problem occurs when the disk image gets near 4 GB in actual allocated size.
I can copy a little over 3.9 GB to the drive, but after that any attempts to copy additional files to the drive will fail in the manner described above, even after a reboot of the Pi.

I'm guessing the fact that it happens close to 4 GB is related to the fact that it only happens with the 32-bit image, though it's interesting that the problem doesn't appear to happen exactly at the 4 GB boundary, but slightly before.

@marcone
Copy link
Owner

marcone commented Dec 26, 2024

Unfortunately, kernel version 6.12, which is supposed to be the next stable kernel for Raspberry Pi OS, has the same problem.
I have a sneaking suspicion that this code is wrong, since there was a previous change that changed that same code construct because of a possible 32-bit overflow.

@marcone
Copy link
Owner

marcone commented Dec 28, 2024

It appears that the code I suspected is wrong is not actually the cause of the problem, since fixing it doesn't make the problem go away.
I have a fairly easy repro case using a standard Raspberry Pi OS image and not involving TeslaUSB or mass storage emulation, but I think the problem is an upstream linux kernel problem, not Raspberry Pi OS specific. Not sure where/how to report it though.

@aneisch
Copy link
Author

aneisch commented Dec 28, 2024

I appreciate and admire your deligance in tracking this down. I'm not sure where the best place to report would be either.

@marcone
Copy link
Owner

marcone commented Dec 30, 2024

It looks like there's already a discussion about the same/similar problem on the relevant kernel mailing list, so hopefully there will be a fix, and hopefully that fix will be backported to older affected kernels as well.

@marcone
Copy link
Owner

marcone commented Jan 2, 2025

I've sent a patch for the issue to the relevant Linux kernel mailing lists. We'll see if they take it, or if I get yelled at :)

@marcone
Copy link
Owner

marcone commented Jan 7, 2025

The patch is now in the upstream "linux-next" branch, from where it should eventually make its way into an upstream release.
It has also been merged into the 6.6, 6.12 and 6.13 branches for the official Raspberry Pi kernel, but it might take a while to see an actual new release from there.

Note again that this is only a problem for 32-bit kernels, so for devices with a 64-bit capable CPU you can use a 64-bit OS build. For Raspberry Pi Zero W (which is 32-bit only), please continue to use the TeslaUSB prebuilt image for now.

@marcone
Copy link
Owner

marcone commented Jan 9, 2025

Turns out that questionable code I mentioned in an earlier comment is in fact also broken and can lead to a different 32-bit overflow and infinite loop. I've submitted a patch for that to upstream as well.

@nowherenearaninja
Copy link

nowherenearaninja commented Jan 11, 2025

Yep, had this issue too.
First time user of teslausb, this is one of the coolest opensource projects I have come across. Thanks for your commitment.
The install process is awesome.

I had what i thought was usecase 1A, i.e. raspberry zero 2w, sd card only. Worked on first try, copied files to my NAS, but only bits. Thought it might be the speed of USB. But then eventually (a lot of headache) found this thread/issue. Reinstalled using raspian 64bit and then the full script instead of of prebuilt teslausb image. Et voila.

Suggestion if I may.
Add a temporary heads-up in the OneStepSetup doc that warns about using the prebuilt image, and instead point toward the full install process using 64 bit OS.

@marcone
Copy link
Owner

marcone commented Jan 11, 2025

Suggestion if I may.
Add a temporary heads-up in the OneStepSetup doc that warns about using the prebuilt image

I removed the prebuilt images that had the bug 2 weeks ago, so there is only one prebuilt image right now, which works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants