Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add critsec to rp2040 xfer, check endpoint status #2474

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gemarcano
Copy link

Describe the PR
hw_endpoint_lock_update is unimplemented for the rp2040 port. This PR takes a stab at implementing it. Without it, if the USB port is disconnected, say with tud_disconnect, a race condition can happen where a transfer is in progress and the USB IRQ can fire, calling reset_non_control_endpoints and leaving the ongoing transfer in a bad state, as it tries to access endpoint data structures that are now zero'd.

While debugging that main issue, I also found some cases of the rp2040 port assuming that an endpoint data structure exists for all endpoints, which is not the case as endpoint 0 has none. I've also included a few simple fixes for these.

Additional context
I've documented a lot of my debugging attempts in this discussion: #1764

I'm not super familiar with the tinyusb coding style, and I'm not sure my lock implementation is right (very naive, likely bad reference counting, might need atomic_int instead of int?).

@Slion
Copy link
Contributor

Slion commented Jul 12, 2024

This did not work so well for me as already mentioned there: #1764 (comment)
Still, very interesting stuff, thanks for sharing.

@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch from 5ee7ae9 to dcf472c Compare July 16, 2024 17:49
@Slion
Copy link
Contributor

Slion commented Oct 7, 2024

A lot has change in my project since I first came across this issue and now somehow even with the workarounds I had built-in I found a use case where I would always get PANIC ep 0 out was already available

So I eventually decided to checkout your branch gemarcano:rp2040_irq_lock_fix and sure enough it worked 🥳

Then I decided to remove the workaround I had in place to take care of the worst cases. It's basically filtering out some HID reports causing multiple USB remounts. Without it still did not panic but the USB connection seems to be broken somehow, not responding. I'll have to leave that workaround for now but it looks like I'll keep using your patch for the time being.

@Slion
Copy link
Contributor

Slion commented Oct 7, 2024

As I was trying to rebase that on the tip of master I realised the tip of master works just fine too. Same symptom as described above. So I'm afraid I can't tell if that patch helps in any way after all. What's sure is that it works just as well as the tip of master apparently. I could rebase it with no conflict too.

@Slion
Copy link
Contributor

Slion commented Oct 19, 2024

Tip of master still gave me that Panic after all, so I've started trying that patch again. Sure enough, I have not seen that Panic since. A couple of crashes and reboots maybe but possibly unrelated.

@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch 2 times, most recently from 5441fb3 to 2318575 Compare November 13, 2024 06:16
 - Implemented a critical section for the rp2040 port, which now
   prevents an IRQ from clearing the endpoint data structures while a
   transfer was in progress, which would then lead to a null pointer
   derefence.
 - Fixed a null-pointer dereference regarding ep->endpoint_control for
   endpoint 0, which does not have a control register.
@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch from 1eb62ba to f5b2459 Compare December 6, 2024 20:25
@gemarcano
Copy link
Author

There were some recent changes in the rp2040 code adding ISO endpoints, so I had to tweak this PR a little bit to carry over the null-pointer dereference checks. The main change is that ep->configured = true; in dcd_rp2040.c is now set in the hw_endpoint_init function so it is set when either type of endpoint is initialized. This is similar to what happens in hcd_rp2040.c.

Thinking about it more, I should probably split the locking implementation from the null-pointer deref checks/fixes. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants