Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32-S3 USB serial output stops working #9454

Closed
samblenny opened this issue Jul 24, 2024 · 28 comments · Fixed by #9590
Closed

ESP32-S3 USB serial output stops working #9454

samblenny opened this issue Jul 24, 2024 · 28 comments · Fixed by #9590
Assignees
Labels
bug espressif applies to multiple Espressif chips usb
Milestone

Comments

@samblenny
Copy link

samblenny commented Jul 24, 2024

CircuitPython version

Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit QT Py ESP32-S3 no psram with ESP32S3

Code/REPL

from time import monotonic, sleep
from board import NEOPIXEL, NEOPIXEL_POWER
from neopixel_write import neopixel_write
from digitalio import DigitalInOut, Direction


np = DigitalInOut(NEOPIXEL)
pwr = DigitalInOut(NEOPIXEL_POWER)
np.direction = Direction.OUTPUT
pwr.direction = Direction.OUTPUT
pwr.value = True
epoch = monotonic()
colors = [bytearray([0, 0, 0]), bytearray([5, 0, 5])]
i = 0
while True:
    neopixel_write(np, colors[i])
    print(monotonic() - epoch)
    i ^= 1
    sleep(1)

Behavior

Expected: At 1 second intervals, Neopixel should toggle and a timestamp should print. This should continue indefinitely.

Actual: With the host computer's screen unlocked (macOS Sonoma 14.5 + M1 Mac mini), the code runs as expected. But, if I lock the macOS screen (apple menu > Lock Screen), wait about 5 minutes, then log back in, the Neopixel continues blinking but nothing prints to the serial console. (no timestamps, no supervisor messages if I do Ctrl-C, etc).

Description

The USB serial output of my QT Py ESP32-S3 often becomes unresponsive when I leave it plugged into an M1 Mac mini with macOS Sonoma 14.5. This happens both with CircuitPython 9.1.0 and 9.1.1.

Everything is normally fine with the screen unlocked. But, if I lock the screen, then come back a bit later (5 minutes is usually enough), I will often have to reset the board before I can see serial output again (reset button or unplug/replug both work).

The weird thing is that when the USB serial output glitch is happening, my CIRCUITPY drive is still available, my code still runs (LED blinks), and USB serial input (from mac to the ESP32-S3) appears to still work. I say that because, if I do a screen -fn /dev/tty.usbmodem..., then type Ctrl-C, it causes the S3's LED to do an intermittent double-red blink in the same manner
as Ctrl-C under normal circumstances. But, when I do the Ctrl-C, none of the usual output text appears in screen's terminal window. If I reset or eject the CIRCUITPY drive, unplug the board, then plug it back in, the serial output resumes working normally.

In macOS System Settings > Energy Saver, my settings are:

  • Prevent automatic sleeping when the display is off: OFF
  • Put hard disks to sleep when possible: ON
  • Wake for network access: OFF
  • Start up automatically after a power failure: OFF

In System Settings > Lock Screen, my settings are:

  • Start Screen Saver when Inactive: Never
  • Turn display off when inactive: For 20 Minutes
  • Require password after screen saver begins or display is turned off:
    After 5 seconds

Additional information

No response

@samblenny samblenny added the bug label Jul 24, 2024
@dhalbert
Copy link
Collaborator

Do you have boards with othert chip families (e.g. RP2040), or do you have an ESP32-S2? Do they have the same behavior?

@samblenny
Copy link
Author

I have not noticed this happening on my RP2040 board or my SAMD21 board

@dhalbert dhalbert added this to the 9.1.x milestone Jul 24, 2024
@samblenny
Copy link
Author

the ESP32 boards I have are all ESP32-S3

@samblenny
Copy link
Author

samblenny commented Jul 26, 2024

Another data point... I tried a similar reproducer on an Adafruit Metro ESP32-S3 with an attached Adafruit 2.8" TFT Touch Shield. But, this time I set the display up to echo the serial console. In this case, the TFT console continues working (taking keyboard input from the USB serial connection!), even after the USB serial output to screen -fn /dev/tty.usbmodem... stops working.

CircuitPython version

Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit Metro ESP32S3 with ESP32S3

Code/REPL

from board import D9, D10, SPI
from digitalio import DigitalInOut, Direction
from displayio import release_displays
from fourwire import FourWire
from time import monotonic, sleep
from adafruit_ili9341 import ILI9341

release_displays()
bus = FourWire(SPI(), command=D9, chip_select=D10)
display = ILI9341(bus, width=320, height=240)
display.rotation = 180  # landscape with Metro S3 USB port on the right
epoch = monotonic()
while True:
    print(monotonic() - epoch)
    sleep(1)

Behavior

Expected: At 1 second intervals, timestamp prints to 2.8" TFT shield and USB serial console.

Actual: With the host computer's screen unlocked, the code runs as expected. But, if I lock the macOS screen (apple menu > Lock Screen), wait a few minutes, then log back in, the USB serial output to screen stops working. But, input from typing in the screen window still gets sent to the Metro S3, and the REPL on the TFT display shield still works.

This is a webcam shot of the REPL on the TFT shield next to my Terminal window running screen. The output to screen stopped at about 99 seconds after I locked my mac, but the timestamps on the TFT shield kept going until I unlocked the mac and did a Ctrl-C using screen (at about 160 seconds after initially locking the mac).

tft-REPL-with-screen

If things were working normally, I would expect the contents of my screen Terminal window to match the text shown on the TFT console, including the KeyboardInterrupt message and following REPL stuff.

@RetiredWizard
Copy link

Is there any chance flow control is getting confused? Does a Ctrl-q on the frozen screen do anything. It looks like there is a MacOS command defflow off to set the default flow control for "screens" to off.

@samblenny
Copy link
Author

Yeah, I agree it does seem a lot like some kind of weird flow control issue. I'm under the impression that when I invoke screen with screen -fn ..., the -fn should turn screen's flow control off. The man screen section about the -f, -fn, and -fa switches mentions 'This can also be defined through the "defflow" .screenrc command'. So, maybe -fn and defflow off do the same thing?

@samblenny
Copy link
Author

I tried Ctrl-q, but that did not appear to have any effect. It's still the same situation: no local echo and no console output in the screen window, but text output on the TFT console works fine (taking its input from what I type in the screen window)

@RetiredWizard
Copy link

RetiredWizard commented Jul 26, 2024

I did stumble on one post that suggested typing ctrl-A followed by a Q on the frozen screen (I think the caps are significant but 🤷)

BTW, I tried reproducing on Linux by minimizing a terminal running tio but no luck...

@todbot
Copy link

todbot commented Jul 26, 2024

Two things I'd try:

  • What happens if you detach your terminal from screen and lock your Mac? (ctrl-a d to detach, then later, screen --list or screen -r to re-attach)
  • What happens if you use tio instead of screen? Maybe it has different flow-control behavior.

@samblenny
Copy link
Author

samblenny commented Jul 26, 2024

When I do a detach-screen, lock-mac, wait, unlock-mac, then reattach-screen sequence, the result is the same (output stopped during the time mac was locked, no further output shows in screen window, but typing into screen still works as expected). I don't have tio on my mac, but I'll see if I can find some other OS and/or terminal emulator combination that reproduces the same problem.

@samblenny
Copy link
Author

I tried using the serial monitor of the Mu editor on an old MacBook Air (13-inch Mid 2013) with macOS Big Sur 11.7. The same problem happens (no console output on the mac, but console input works, and TFT console output works). The same problem also happens when using screen on the old MacBook Air.

@samblenny
Copy link
Author

FWIW, seeing that Mu and screen act the same way across significantly different hardware/OS combinations, my bet is that this has to do with an edge case on the CircuitPython board that is triggered by a USB power saving feature or perhaps by a full TX buffer. I'm assuming that the Macs are entering a standby or suspend mode that slows or stops the CPU. Maybe some combination of TinyUSB, ESP-IDF USB Device Stack, and CircuitPython aren't on the same page about what should happen if the host computer does not read USB serial data at its normal rate.

@RetiredWizard
Copy link

I have run your first test script on an esp32-s3 in a "screen" window on linux with the output paused using ctrl-s for more than 30 minutes and a ctrl-q resumes the output without issue. I wonder if the Mac might be doing something weird to the USB data lines when the screen locks (maybe security to thwart Rubber Ducky or similar attacks?)

@samblenny
Copy link
Author

I have run your first test script on an esp32-s3 in a "screen" window on linux with the output paused using ctrl-s for more than 30 minutes...

As long as I don't lock the screen, nothing weird happens for me. But, once the screen has been locked for approximately 100 seconds (sometimes less, sometimes more), something breaks. The screen locking is the trigger.

An important point to note is that I have my systems set so that they can automatically go to sleep when the display is off.

I'm unclear of the details of how Apple implements "automatic sleeping" as they call it in the system settings, but presumably it involves CPU scaling and possibly USB power saving modes. The Linux kernel equivalent of what's likely happening might be something like ramp from Suspend-to-Idle, to Standby, to Suspend-to-RAM, possibly also with some USB power saving stuff too (see System Sleep States and Power Management for USB docs at kernel.org).

If I'm able to come up with a reproduction recipe for Linux, I'll post it here.

@RetiredWizard
Copy link

Okay, I used the Ubuntu Suspend option and reproduced what you're seeing on Linux. The Neopixel continues to blink but the output is frozen in the screen or tio terminal.

@RetiredWizard
Copy link

RetiredWizard commented Jul 26, 2024

This may be relevant https://www.kernel.org/doc/html/v4.13/driver-api/usb/persist.html#:~:text=By%20default%2C%20Linux%20behaves%20according,officially%20correct%E2%80%9D%20thing%20to%20do.

Edit: I just re-read the linked document and I don't think it applies after all.

I was actually able to bring a frozen screen back to life using the following script

#!/bin/bash
echo "2-1.1" > /sys/bus/usb/drivers/usb/unbind
echo "2-1.1" > /sys/bus/usb/drivers/usb/bind

I ran dmesg after plugging the controller in to identify the device ID (2-1.1)

You have to jump through a few hoops to get the script to run on Ubuntu but when it runs it seems to unlock the screen. I tried for a bit to enable the persist option mentioned in the article but the root object 2-1.1 already had it set and I kept getting permission denied when I tried to create the persist file for the serial usb object.

@RetiredWizard
Copy link

One more observation, I don't seem to be able to reproduce the issue using a Feather RP2040 so it may be limited to ESP boards.

@dhalbert
Copy link
Collaborator

If you folks have time, trying some earlier versions of Python would be interesting, since the underlying TinyUSB software undergoes updates fairly frequently.

@samblenny
Copy link
Author

9.0.0 has the same problem on:

Adafruit CircuitPython 9.0.0 on 2024-03-19; Adafruit QT Py ESP32-S3 no psram with ESP32S3

@samblenny
Copy link
Author

8.1.0 also has the problem on:

Adafruit CircuitPython 8.1.0 on 2023-05-22; Adafruit QT Py ESP32-S3 no psram with ESP32S3

@tannewt
Copy link
Member

tannewt commented Jul 26, 2024

I suspect tud_cdc_connected() is returning false here:

if (tud_cdc_connected()) {
while (count < length) {
count += tud_cdc_write(text + count, length - count);
// If we're in an interrupt, then don't wait for more room. Queue up what we can.
if (cpu_interrupt_active()) {
break;
}
usb_background();
}
}

This is based on the sent line state:
https://github.com/hathach/tinyusb/blob/cfbdc44a8d099240ad5ef208bd639487c2f28153/src/class/cdc/cdc_device.c#L127-L130

I don't know why this would only break on an S3. ESP does run tinyusb processing in a different task. Most run it in the CircuitPython background tasks.

@samblenny
Copy link
Author

samblenny commented Jul 26, 2024

So far, I don't think we know anything one way or the other about ESP32-S2. The ESP32 boards I have are all S3. If anybody wants to try S2, I'd be curious to hear what happens.

These are some other boards I tried, none of which have the problem (timestamps continue printing after screen is unlocked):

Adafruit CircuitPython 9.0.5 on 2024-05-22; Raspberry Pi Pico with rp2040
Adafruit CircuitPython 9.0.5 on 2024-05-22; Adafruit Trinket M0 with samd21e18
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit Feather M4 Express with samd51j19

Also, this is an even simpler reproducer that you can run from the REPL:

from time import monotonic, sleep
a = monotonic()
while True:
    print(monotonic() - a)
    sleep(1)

fail condition: timestamps stop printing after 2-3 minutes of locked screen with automatic sleeping enabled (mac) or suspend-mode (Ubuntu)

@RetiredWizard
Copy link

I reproduced this using an ESP32-S2 board.

@RetiredWizard
Copy link

I can't seem to cause the issue using a QTPY ESP32-pico

@dhalbert
Copy link
Collaborator

I can't seem to cause the issue using a QTPY ESP32-pico

That makes sense because that's strictly a USB-serial converter chip, no native USB involved: TinyUSB and its associated code like the tud_cdc_connected() call above are not used.

@dhalbert dhalbert removed the esp32-s3 label Jul 26, 2024
@dhalbert dhalbert added the espressif applies to multiple Espressif chips label Jul 26, 2024
@RetiredWizard
Copy link

RetiredWizard commented Jul 26, 2024

Modifying this function to always return "false" seems to prevent the issue from occurring.

https://github.com/hathach/tinyusb/blob/ccc7a36043e055ded1f478a979a303e694123187/src/device/usbd.c#L370-L372

I'm guessing that suggests that either the OS isn't marking the device as no longer suspended or the device isn't recovering properly from the suspended state.

Without this function properly operating the ESP32-S3 board continues to output while the computer is suspended, ie there are no skipped counts when scrolling back through the tio window after the computer comes back from suspended standby and the current count corresponds to the number of actual seconds elapsed during the suspension.

Edit: tud_cdc_connected() is essentially = (tud_mounted() && !tud_suspended())

@samblenny
Copy link
Author

another data point... I was able to prevent the serial glitch on ESP32-S3 with storage.disable_usb_drive(). If I put this in my boot.py, then do a hard reset, the USB serial output does not glitch:

import usb_hid
import storage
usb_hid.disable()
storage.disable_usb_drive()  # CAUTION: THIS WILL HIDE YOUR CIRCUITPY DRIVE!!!

For anybody reading this who isn't already familiar with recovering from a disabled CIRCUITPY drive...

To fix your boot.py so CIRCUITPY comes back: connect to the USB serial console, get a REPL prompt, run this stuff to overwrite the file with disable_usb_drive, then do a hard reset:

f = open('boot.py', 'w')
f.write('import usb_hid\nusb_hid.disable()\n')
f.close()

You can also use safe mode to fix it.

@tannewt
Copy link
Member

tannewt commented Aug 29, 2024

I've debugged this down into TinyUSB. The ESP usb code wasn't detecting a resume correctly. It is fixed in hathach/tinyusb#2784. We'll update CP as soon as that is merged in.

@tannewt tannewt modified the milestones: 9.1.x, 9.2.0 Sep 3, 2024
dhalbert pushed a commit that referenced this issue Sep 3, 2024
TinyUSB didn't turn on the interrupt it needed.

Fixes #9454
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug espressif applies to multiple Espressif chips usb
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants