Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT #23761

Open
rrakso opened this issue Aug 28, 2024 · 35 comments
Labels
problem Something isn't working

Comments

@rrakso
Copy link

rrakso commented Aug 28, 2024

What happened?

My ZigBee network just stopped working. And I got this error message: zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.

EDIT:

It seems that this is the main error message zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.


It is also worth mentioning, that when I'm restarting the Home Assistant, then somehow Z2M after reboot causes HA to reboot again, but this time Z2M is not running. I had to switch autostart of Z2M to false, because in other case I will end in reboot loop.

To fix this all I must do is:
Unplug adapter > reboot Home Assistant > Start Zigbee2MQTT (it will fail, ok, there is no adapter plugged in) > Start Zigbee2MQTT again > wait some time (2 to 4 s) > plug in adapter.
From now on everything is working perfectly!

What did you expect to happen?

I expected that everything should work smoothly as always 😁

How to reproduce it (minimal and precise)

I don't know how to reproduce it. It has happend second time.

Zigbee2MQTT version

1.39.1

Adapter firmware version

7.4.3

Adapter

Sonoff ZBDongle-E with ember

Setup

HA on a Raspberry Pi 3 (rpi3-64)

Debug log

Expand to see
[2024-08-28 09:42:23] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:42:23+02:00","linkquality":200,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:42:24] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=135 Len=26].
[2024-08-28 09:42:24] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=135 Len=26].
[2024-08-28 09:42:24] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:42:24+02:00","linkquality":196,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:42:51] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:42:51+02:00","linkquality":200,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:42:53] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=139 Len=7].
[2024-08-28 09:42:53] error: 	zh:ember: !!! ADAPTER FATAL ERROR reason=ASH_ERROR_TIMEOUTS. !!!
[2024-08-28 09:42:53] info: 	zh:ember: Attempting adapter reset...
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Total frames: RX=2291, TX=2766
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   DATA frames : RX=2237, TX=673
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   DATA bytes  : RX=52490, TX=17515
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Retry frames: RX=41, TX=5
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   ACK frames  : RX=13, TX=2088
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Retry dupes     : RX=41
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=5
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash: Port closed.
[2024-08-28 09:42:53] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-08-28 09:42:53] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-08-28 09:42:53] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-08-28 09:42:54] info: 	zh:ember: ======== Ember Adapter Starting ========
[2024-08-28 09:42:54] info: 	zh:ember:ezsp: ======== EZSP starting ========
[2024-08-28 09:42:54] info: 	zh:ember:uart:ash: ======== ASH Adapter reset ========
[2024-08-28 09:42:55] info: 	zh:ember:uart:ash: RTS/CTS config is off, enabling software flow control.
[2024-08-28 09:42:55] info: 	zh:ember:uart:ash: Serial port opened
[2024-08-28 09:42:55] info: 	zh:ember:uart:ash: ======== ASH starting ========
[2024-08-28 09:42:56] info: 	zh:ember:uart:ash: ======== ASH connected ========
[2024-08-28 09:42:56] info: 	zh:ember:uart:ash: ======== ASH started ========
[2024-08-28 09:42:56] info: 	zh:ember:ezsp: ======== EZSP started ========
[2024-08-28 09:42:57] info: 	zh:ember: Adapter EZSP protocol version (13) lower than Host. Switched.
[2024-08-28 09:42:57] info: 	zh:ember: Adapter version info: {"ezsp":13,"revision":"7.4.3 [GA]","build":0,"major":7,"minor":4,"patch":3,"special":0,"type":170}
[2024-08-28 09:42:57] info: 	zh:ember: [STACK STATUS] Network up.
[2024-08-28 09:42:57] info: 	zh:ember: [INIT TC] Adapter network matches config.
[2024-08-28 09:42:57] info: 	zh:ember: [CONCENTRATOR] Started source route discovery. 1248ms until next broadcast.
[2024-08-28 09:42:58] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:42:58+02:00","linkquality":136,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:43:25] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:43:25+02:00","linkquality":200,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:43:25] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":233,"last_seen":"2024-08-28T09:43:25+02:00","linkquality":200,"maximum_range":5,"minimum_range":0,"presence":false,"radar_sensitivity":4,"target_distance":0}'
[2024-08-28 09:43:26] error: 	zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-28 09:43:26] error: 	zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-28 09:43:26] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-28 09:43:27] error: 	zh:ember: !!! ADAPTER FATAL ERROR reason=HOST_FATAL_ERROR. !!!
[2024-08-28 09:43:27] info: 	zh:ember: Attempting adapter reset...
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Total frames: RX=97, TX=109
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   DATA frames : RX=77, TX=42
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   DATA bytes  : RX=1512, TX=569
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Retry frames: RX=17, TX=3
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   ACK frames  : RX=1, TX=63
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Retry dupes     : RX=17
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-08-28 09:43:27] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=3
[2024-08-28 09:43:27] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-28 09:43:29] error: 	zh:ember:uart:ash: ASH disconnected: ASH_ERROR_RESET_FAIL | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-28 09:43:29] error: 	zh:ember:uart:ash: ASH disconnected: ASH_ERROR_RESET_FAIL | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-28 09:43:33] info: 	zh:ember:uart:ash: Port closed.
[2024-08-28 09:43:35] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-08-28 09:43:36] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-08-28 09:43:36] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-08-28 09:43:37] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=0:"VERSION" Seq=0 Len=0].
[2024-08-28 09:43:37] error: 	zh:ember: !!! ADAPTER FATAL ERROR reason=ASH_ERROR_TIMEOUTS. !!!
[2024-08-28 09:43:37] info: 	zh:ember: Attempting adapter reset...
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Total frames: RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   DATA frames : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   DATA bytes  : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Retry frames: RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   ACK frames  : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Retry dupes     : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=0
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-08-28 09:43:37] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-08-28 09:43:37] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-08-28 09:43:37] info: 	zh:ember: ======== Ember Adapter Starting ========
[2024-08-28 09:43:37] info: 	zh:ember:ezsp: ======== EZSP starting ========
[2024-08-28 09:43:37] info: 	zh:ember:uart:ash: ======== ASH Adapter reset ========
[2024-08-28 09:43:38] info: 	zh:ember:uart:ash: RTS/CTS config is off, enabling software flow control.
[2024-08-28 09:43:38] info: 	zh:ember:uart:ash: Serial port opened
[2024-08-28 09:43:38] info: 	zh:ember:uart:ash: ======== ASH starting ========
[2024-08-28 09:43:38] info: 	zh:ember: ======== Ember Adapter Starting ========
[2024-08-28 09:43:38] info: 	zh:ember:ezsp: ======== EZSP starting ========
[2024-08-28 09:43:38] info: 	zh:ember:uart:ash: ======== ASH Adapter reset ========
[2024-08-28 09:43:38] info: 	zh:ember:uart:ash: ======== ASH starting ========
[2024-08-28 09:43:39] info: 	zh:ember:uart:ash: ======== ASH connected ========
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash: ======== ASH started ========
[2024-08-28 09:43:40] info: 	zh:ember:ezsp: ======== EZSP started ========
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash: ======== ASH started ========
[2024-08-28 09:43:40] info: 	zh:ember:ezsp: ======== EZSP started ========
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] info: 	zh:ember: Adapter EZSP protocol version (13) lower than Host. Switched.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] info: 	zh:ember: Adapter EZSP protocol version (13) lower than Host. Switched.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] info: 	zh:ember: Adapter version info: {"ezsp":13,"revision":"7.4.3 [GA]","build":0,"major":7,"minor":4,"patch":3,"special":0,"type":170}
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] info: 	zh:ember: Adapter version info: {"ezsp":13,"revision":"7.4.3 [GA]","build":0,"major":7,"minor":4,"patch":3,"special":0,"type":170}
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available.
[2024-08-28 09:43:40] error: 	zh:ember: Failed to reset and init adapter. Error: Failed to register endpoint '1' with status=ZIGBEE_EZSP_ERROR.
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Total frames: RX=37, TX=75
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   DATA frames : RX=36, TX=37
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   DATA bytes  : RX=246, TX=394
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Retry frames: RX=0, TX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   ACK frames  : RX=0, TX=37
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Retry dupes     : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=0
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash: Port closed.
[2024-08-28 09:43:40] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-08-28 09:43:40] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-08-28 09:43:41] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-08-28 09:43:41] error: 	z2m: Adapter disconnected, stopping
[2024-08-28 09:43:41] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-08-28 09:43:41] info: 	z2m: Disconnecting from MQTT server
[2024-08-28 09:43:41] info: 	z2m: Stopping zigbee-herdsman...
[2024-08-28 09:43:41] info: 	z2m: Stopped zigbee-herdsman
[2024-08-28 09:43:41] info: 	z2m: Stopped Zigbee2MQTT
@rrakso rrakso added the problem Something isn't working label Aug 28, 2024
@rrakso
Copy link
Author

rrakso commented Aug 29, 2024

It has happened today twice :c

I see some of

[2024-08-29 20:57:01] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=83:"SET_CONFIGURATION_VALUE" Seq=12 Len=6].

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 30, 2024

You are getting different errors here, so, at first glance, it would point to the adapter being unstable. Did you look into removing possible sources of interference for the USB? (I see you are on a Pi, so, powered USB hub, extension cable, etc...)
Since you are also getting troubles with HA<>Z2M, make sure your Pi isn't having issues (check system logs, power supply, hardware, etc).

Note: September 1st release should also help on some edge-cases interference issues (2.4GHz ones).

@rrakso
Copy link
Author

rrakso commented Aug 30, 2024

Thank you very much @Nerivec! I will make sure that power supply isn't the problem - I need to find somewhere a 5V 3A power supply, currently I'm using strong phone charger.

The dongle is connected directly to the RPi. Also I'll be waiting for the September 1st release to check it will help somehow :D Thank you so much for your reply @Nerivec.

PS
For me these error messages like ASH_ERROR / ERROR_WRONG_DIRECTION says nothing :/ I mean I don't know how to fix them on my own. This is why I appreciate your help so much.

PPS
The RPi + Zigbee coordinator are very close to my router 2.4GHz/5GHz router.

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 30, 2024

Interference can result is various errors that don't actually have any meaning because they are randomly triggered. Once this is improved/fixed, then we can see what errors remain, if any.

The Pi is known for causing USB troubles. Make sure to use a USB extension cord to connect the adapter to it, so you can place it farther away from the Pi. A powered USB hub (a USB hub with it's own power supply, so it doesn't draw on the Pi) also fixed several problems for other users in the past.

As for your router, if you haven't already, make sure your 2.4GHz WiFi and your ZigBee use very different channels. Usually, channel 20 or 25 for ZigBee is your best bet. You can also check channels usage around you with ember-zli.

zigbee-wifi-channels
Note: ember supports changing channel, so you shouldn't need to re-pair many devices (only a few capricious ones, if any).

@rrakso
Copy link
Author

rrakso commented Sep 3, 2024

Now I'm using the v1.40.0-1 of zigbee2mqtt and the problem has happened again... :/.

In a period of 2 hours (before the fatal error occurred) I got five times error: zh:ember:ezsp: ERROR transaction failure; status=ERROR_WRONG_DIRECTION. Last frame: [FRAME: ID=52:"SEND_UNICAST" Seq=<number> Len=<number>].

And these were the last errors:

[2024-09-03 10:09:28] error: zh:ember:uart:ash: Received ERROR from adapter with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-09-03 10:09:30] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-09-03 10:09:31] error: zh:ember:uart:ash: Error parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-03 10:09:32] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR

Before I try to test the USB cable extension / powered USB hub, perhaps I should reflash my adapter with firmware with a different baud rate? Currently I'm using the 230400 version, should I use the 115200 version? What do you think @Nerivec?

The rest of the log ("duplicates" removed)
[2024-09-03 08:08:43] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":128,"last_seen":"2024-09-03T08:08:43+02:00","linkquality":208,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 08:09:20] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":128,"last_seen":"2024-09-03T08:09:20+02:00","linkquality":208,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 08:09:20] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=220 Len=19].
[2024-09-03 08:09:20] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=220 Len=19].
[2024-09-03 08:09:20] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":128,"last_seen":"2024-09-03T08:09:20+02:00","linkquality":208,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:21] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:21+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:22] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:22+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:22] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=11 Len=26].
[2024-09-03 10:02:22] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=11 Len=26].
[2024-09-03 10:02:22] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:22+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:22] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:22+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:55] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:55+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:55] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:55+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:55] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=33 Len=16].
[2024-09-03 10:02:55] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=33 Len=16].
[2024-09-03 10:02:55] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:55+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:02:55] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:02:55+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:03:19] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:03:19+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:03:19] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=49 Len=26].
[2024-09-03 10:03:19] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=49 Len=26].
[2024-09-03 10:03:20] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:03:20+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:07:19] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:07:19+02:00","linkquality":216,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:07:19] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:07:19+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:07:19] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=210 Len=26].
[2024-09-03 10:07:19] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=210 Len=26].
[2024-09-03 10:07:19] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:07:19+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:17] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:17+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:17] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:17+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:17] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:17+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:17] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=33 Len=9].
[2024-09-03 10:09:19] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_WRONG_DIRECTION. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=33 Len=9].
[2024-09-03 10:09:28] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:27+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:28] error: 	zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-09-03 10:09:30] error: 	zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-09-03 10:09:31] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-03 10:09:32] error: 	zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Total frames: RX=3450, TX=4125
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   DATA frames : RX=3327, TX=1017
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   DATA bytes  : RX=77404, TX=26462
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Retry frames: RX=97, TX=9
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   ACK frames  : RX=25, TX=3099
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Retry dupes     : RX=97
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-09-03 10:09:32] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=9
[2024-09-03 10:09:36] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:34+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:38] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:37+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:39] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:39+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:39] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":220,"last_seen":"2024-09-03T10:09:39+02:00","linkquality":212,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":2,"target_distance":0}'
[2024-09-03 10:09:40] info: 	zh:ember:uart:ash: Port closed.
[2024-09-03 10:09:40] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-09-03 10:09:40] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-09-03 10:09:40] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-09-03 10:09:40] error: 	z2m: Adapter disconnected, stopping
[2024-09-03 10:09:40] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-09-03 10:09:40] info: 	z2m: Disconnecting from MQTT server
[2024-09-03 10:09:40] info: 	z2m: Stopping zigbee-herdsman...
[2024-09-03 10:09:40] info: 	z2m: Stopped zigbee-herdsman
[2024-09-03 10:09:40] info: 	z2m: Stopped Zigbee2MQTT

@rrakso rrakso changed the title zh:ember:ezsp: Found no buffer in queue but ASH layer sent signal that one was available. zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT Sep 3, 2024
@Nerivec
Copy link
Collaborator

Nerivec commented Sep 3, 2024

I don't think the baudrate would create this particular issue, but 115200 is definitely more tested than any other.
For ERROR_WRONG_DIRECTION, there is more info/tests going on in #22863

@rrakso
Copy link
Author

rrakso commented Sep 4, 2024

@Nerivec could you please tell me what these errors mean? Maybe I can find out how to fix this too, haha

[2024-09-04 19:38:42] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_TRUNCATED. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=108 Len=7].
[2024-09-04 19:38:42] error: 	zh:ember: Adapter fatal error: ERROR_TRUNCATED

After these errors the zigbee2mqtt stopped working.

Now I am using a better power supply with a better cable (but I still don't have the you know the strong 5V 3A real one - but it worked so long with the old type :c really strange), but the baud is still 230400. Today I will reflash the dongle to 115200 to see if that helps.

The rest of the log
[2024-09-04 19:38:41] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/sensor', payload '{"detection_delay":0,"fading_time":2,"illuminance_lux":15,"last_seen":"2024-09-04T19:38:41+02:00","linkquality":176,"maximum_range":4,"minimum_range":0,"presence":false,"radar_sensitivity":3,"target_distance":0}'
[2024-09-04 19:38:42] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ERROR_TRUNCATED. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=108 Len=7].
[2024-09-04 19:38:42] error: 	zh:ember: Adapter fatal error: ERROR_TRUNCATED
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Total frames: RX=9429, TX=11492
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   DATA frames : RX=9155, TX=2843
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   DATA bytes  : RX=211252, TX=73968
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Retry frames: RX=226, TX=34
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   ACK frames  : RX=48, TX=8615
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Retry dupes     : RX=226
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=34
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash: Port closed.
[2024-09-04 19:38:42] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-09-04 19:38:42] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-09-04 19:38:42] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-09-04 19:38:42] error: 	z2m: Adapter disconnected, stopping
[2024-09-04 19:38:43] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-09-04 19:38:43] info: 	z2m: Disconnecting from MQTT server
[2024-09-04 19:38:43] info: 	z2m: Stopping zigbee-herdsman...
[2024-09-04 19:38:43] info: 	z2m: Stopped zigbee-herdsman
[2024-09-04 19:38:43] info: 	z2m: Stopped Zigbee2MQTT

PS

Regarding #22863 - I have one of these 😅

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 5, 2024

    /** The direction flag in the frame control field was incorrect. */
    ERROR_WRONG_DIRECTION = 0x32,
    /**
     * The truncated flag in the frame control field was set, indicating there was not enough memory available to
     * complete the response or that the response would have exceeded the maximum EZSP frame length.
     */
    ERROR_TRUNCATED = 0x33,
    /**
     * The overflow flag in the frame control field was set, indicating one or more callbacks occurred since the previous
     * response and there was not enough memory available to report them to the Host.
     */
    ERROR_OVERFLOW = 0x34,

From https://github.com/Koenkk/zigbee-herdsman/blob/master/src/adapter/ember/enums.ts#L755

But as mentioned before, with the varying errors, it looks like interference, which may result in errors that would not necessarily mean anything (except something interfered 😛).

I have one of these

Try to temporarily remove that device from the network if you can, see if it works better after that. That device creates some strange behaviors for sure.

@rrakso
Copy link
Author

rrakso commented Sep 12, 2024

So after a week of testing I have unplugged the ZY-M100-S_2 and everything seems to work pretty good!

I still have the problem that I have to disconnect and reconnect the adapter after rebooting the Raspberry Pi1, but I was able to switch to the previously used power supply two days ago and everything still works fine!

In the next two weeks I plan to temporarily run my HA with Z2M on an old laptop (with a better processor than RPI, and with 4GB of RAM instead of the 1GB it currently has). And then I will try again to use the occupancy sensor.

Footnotes

  1. When I try to start Z2M after the RPI reboot, all I see is the addon starting, some logs: [10:37:16] INFO: Preparing to start... [10:37:16] INFO: Socat not enabled [10:37:16] INFO: Starting Zigbee2MQTT... Starting Zigbee2MQTT without watchdog. and then I got an error message from HA saying "connection lost, trying to reconnect". (The HA reboots? Can I see logs before the reboot crash?

@Nerivec
Copy link
Collaborator

Nerivec commented Oct 1, 2024

Quick note, not sure what step you are on, but make sure to update to 7.4.4 too, it seems to have fixed several problems for a few users.

@mafnaa
Copy link

mafnaa commented Oct 27, 2024

I was getting code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT every time during firmware update of Sonoff's TRVZB valve. Tried multiple firmwares with same negative result until finally was able to complete update on 7.4.4, specifically this one https://github.com/darkxst/silabs-firmware-builder/blob/main/firmware_builds/zbdonglee/ncp-uart-hw-v7.4.4.0-zbdonglee-115200.gbl

@ortofan
Copy link

ortofan commented Dec 5, 2024

I get the following error everytime I restart or start z2m.

zh:ember:uart:ash: Received ERROR from adapter while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.

If I rebbot the coordinators before starting the z2m instance no error.

Running latest firmware, latest z2m on 3 separate instances. there are 0 interference issues all 3 installs are SLZB06M.

@szwacz
Copy link

szwacz commented Dec 6, 2024

Stumbling upon the same issue all the time and troubleshooting. When this error happens the only way to put network back online is to restart Home Assistant, restarting Z2M just gives me same error instantly.
So far did:

  • Upgraded to coordinator SLZB06M, flashed with 7.4.5 firmware.
  • Removed from network one device that is not on the official support list of Z2M.

Fortunately I have two locations with completely same zigbee configuration and after recent changes on one of them the error happens once every 1-2 days. And the other location is stable for 4 days already. So maybe some externat interferences are in play. Will report about new developments in my networks.

EDIT: ok one mistake about sameness of the 2 locations setup. In the "stable" location I had SLZB06M coordinator firmware from .dev branch and in the unstable it was v2.3.6. Upgraded to latest v2.6.8.dev16 and will see.

@microneer
Copy link

For me, this was solved by changing my WiFi channel.

I was having lots of ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT errors, which were causing Zigbee2MQTT to restart. I could reproduce it almost always by recreating the map.

I found i had the Ember coordinator on Zigbee channel 25 and my 2.4GHz WiFi on Channel 11. These channels overlap according to the diagram above. Moving my WiFi to channel 4 mostly solved the problem. Significantly less frequent errors and restarts.

However, what I'm unclear on, @Nerivec , is why a timeout should require Zigbee2MQTT to restart? There are people on HA's forum having problems too: https://community.home-assistant.io/t/error-messages-trying-to-ota-update-devices-using-z2m-on-a-ha-yellow/788913/2

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 7, 2024

Whenever an ERROR frame is received from the adapter, it means the adapter has entered a "failed state", after which, the only recourse is to restart the stack. In a lot of cases, it's only about keeping proper synchronization between the adapter and the application.
About restarting only the adapter internally, or restarting Z2M completely, both are pretty much equivalent, since the adapter restart makes for most of the Z2M restart, and it avoids many potential issues with higher-levels of Z2M, tl;dr it's much cleaner.

As for the error itself, most of the reports here look like interference indeed (you may not realize how bad it can get in some situations, and not necessarily from your environment, but neighbors too; the SLZB06M also seems particularly susceptible), but the adapter should handle this better nonetheless. I suspect a firmware issue when the radio comes under stress, but we'll have to wait for silabs on that one...

@szwacz
Copy link

szwacz commented Dec 9, 2024

@Nerivec so restart of just Z2M zhould be enough to bring network back online? Why this issue is so annoying for me is that Home Assistant restarts Z2M plugin, but it just keeps crashing in a loop with same error, what exhausts watchdog restart counter and the network fails permanently. Manual restarts of Z2M by me, after few hours when I realized the network is down, still leads to the error. The only way is to restart Home Assistant as a whole. This is the strange part for me.

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 9, 2024

@szwacz And that happens with SLZB06M (TCP-based)? If it were with a USB device, I'd say something goes wrong with the USB on HA, but I assume the TCP adapter is not connected directly to the HA machine?
You are not using the WiFi option in the SLZB06M, correct? (Because that's definitely not stable in itself.)

@szwacz
Copy link

szwacz commented Dec 10, 2024

@Nerivec you are correct, I use it via ethernet cable.
After firmware updates and OTA updates the network seems to be more stable and doesn't fail completely (so far). That being said I still get ERROR_WRONG_DIRECTION for some reason and my network consists of just 3 types of devices all of which are on the "supported devices" list. May it start the cascade that ends in ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT?

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 10, 2024

NOUS devices are known to cause ERROR_WRONG_DIRECTION for some unknown reason, do you have any?
This happens when Z2M receives a command from the adapter (it's only supposed to receive responses). Since it appears to not create issues (it's only a passive detection mechanism, it does not create an "error state") it was downgraded from an event that previously forced the adapter to restart (per silabs), to an event that is just logged (previous implementations, including current ZHA, don't even have this check at all). It shouldn't have a link with the other error, except if both are caused by interferences, then the specifics of the error become irrelevant, it only means "something is affecting the flow of information, the adapter is not happy".

It is strange that it requires a full HA restart to fix it though.

Since you upgraded your network, I assume you have the old adapter laying around somewhere? Can you use it with Ember ZLI to scan for networks + channels, to see what the environment is like around you for reference (preferably closer to the new adapter)? https://github.com/Nerivec/ember-zli/wiki/Stack#scan-network

Can you also take a look at the CPU usage on your machine around the time these errors happen?

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 10, 2024

@ortofan what core firmware version do you have on the SLZB06M? Versions around 2.5.6 are known to have a bug that slows down the ESP side. That is reflected in the Zigbee side with timeouts.

@ortofan
Copy link

ortofan commented Dec 10, 2024

v2.6.8.dev16 and Zigbee:
Dev firmware
Revision: 20241127
For 'adapter: ember' adapter settings in Z2M!
SDK 7.4.5

FWIW I have 5-8 NOUS A1Z Smart plugs on all my networks (I run 3 z2m instances all with the same coordinator and firmware). This is an invaluble plug for me becouse it has very unique feature. Over voltage protection. I frequntly have over 260v AC Mains on the phase of the house where large number of solar installations are present in my hood. INSANE I know but the power company wont fix it for a while. Till then NOUS and a few UPS saves me important gear as it turns off instantly when over 258v.

@szwacz
Copy link

szwacz commented Dec 10, 2024

@Nerivec Thank you for that explanation. Indeed I have NOUS A1Z in my network. So this issue should be separate of the one here.

Thankfully failure of the network just happenedn, so I had the chance to observe it better. The hunch about CPU usage is intresting one, because I use Raspberry PI 3B+.

First attaching logs of when the fail happened.
network fail happened.log
(sorry that log files are not trimmed, but I wanted to present the whole vicious loop).

I've discovered it after half an hour, and clicked just "start" in the Z2M addon options again.
Here is log of that (we are still in the vicious loop).
z2m restart.log

This is memory and cpu usage recorded during events from the log file above.
Screenshot 2024-12-10 at 17 36 18
Memory too high?
CPU doesn't seem to be that affected, but when I was clicking through home assistant UI it felt more and more laggy with every next restart. Very interesting.

After restarting whole HA everything went back to normal. Indeed maybe I should try upgrading my raspberry....

@CloudMass
Copy link

CloudMass commented Dec 10, 2024

This happens to me as well. Once or twice a day my Zigbee network crashes.

As far as I can see it often when it's under "high" load (As in a lot of devices are asked to do things at the same time). I can also reproduce the issue by trying to push an update to one of my Hue lamps. It will fail because of the crash.

Adapter: Sonoff ZBDongle-E
Zigbee2MQTT version: 1.42.0
Coordinator type: EmberZNet
Coordinator revision: 8.0.2
Zigbee channel: 20
WiFi channel: 1

Logs
[2024-12-10 17:58:42] info: 	z2m: Update of 'Hallway - Ceiling' at 50.56%, ≈ 0 minutes remaining
[2024-12-10 17:58:42] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Hallway - Ceiling', payload '{"brightness":79,"color_mode":"color_temp","color_temp":406,"linkquality":164,"power_on_behavior":null,"state":"OFF","update":{"installed_version":16786182,"latest_version":16786434,"progress":50.56,"remaining":30,"state":"updating"},"update_available":null}'
[2024-12-10 17:59:03] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Kitchen - Motion', payload '{"battery":100,"device_temperature":23,"illuminance":8,"illuminance_lux":8,"linkquality":255,"occupancy":true,"power_outage_count":65,"voltage":3065}'
[2024-12-10 17:59:03] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Kitchen - Motion', payload '{"battery":100,"device_temperature":23,"illuminance":8,"illuminance_lux":8,"linkquality":255,"occupancy":true,"power_outage_count":65,"voltage":3065}'
[2024-12-10 17:59:09] error: 	zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-12-10 17:59:09] error: 	zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-12-10 17:59:09] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-12-10 17:59:09] error: 	zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Total frames: RX=1494, TX=1981
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   DATA frames : RX=1486, TX=484
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   DATA bytes  : RX=43213, TX=33526
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Retry frames: RX=6, TX=3
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   ACK frames  : RX=0, TX=1493
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Retry dupes     : RX=6
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-12-10 17:59:09] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=3
[2024-12-10 17:59:17] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=226 Len=10].
[2024-12-10 17:59:17] info: 	z2m: Update of 'Hallway - Ceiling' at 52.08%, ≈ 1 minutes remaining
[2024-12-10 17:59:17] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Hallway - Ceiling', payload '{"brightness":79,"color_mode":"color_temp","color_temp":406,"linkquality":168,"power_on_behavior":null,"state":"OFF","update":{"installed_version":16786182,"latest_version":16786434,"progress":52.08,"remaining":60,"state":"updating"},"update_available":null}'
[2024-12-10 17:59:40] info: 	zh:ember:uart:ash: Port closed.
[2024-12-10 17:59:40] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-12-10 17:59:40] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-12-10 17:59:40] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-12-10 17:59:40] error: 	z2m: Adapter disconnected, stopping
[2024-12-10 17:59:40] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-12-10 17:59:40] info: 	z2m: Disconnecting from MQTT server
[2024-12-10 17:59:40] info: 	z2m: Stopping zigbee-herdsman...
[2024-12-10 17:59:40] info: 	z2m: Stopped zigbee-herdsman
[2024-12-10 17:59:40] info: 	z2m: Stopped Zigbee2MQTT

@szwacz
Copy link

szwacz commented Dec 11, 2024

@Nerivec I also did channel scan as you asked (man this is nice tool! :) )

Channel 11: •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-29 dBm]
Channel 12: ••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-40 dBm]
Channel 13: ••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-40 dBm]
Channel 14: •••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-51 dBm]
Channel 15: •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-21 dBm]
Channel 16: •••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-35 dBm]
Channel 17: ••••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-34 dBm]
Channel 18: •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-29 dBm]
Channel 19: ••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-36 dBm]
Channel 20: ••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-54 dBm]
Channel 21: •••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-51 dBm]
Channel 22: ••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-46 dBm]
Channel 23: ••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-42 dBm]
Channel 24: •••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-35 dBm]
Channel 25: •••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-55 dBm]
Channel 26: •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ [-17 dBm]

My network is on channel 26 (moved there few weeks ago out of desperation).

Didn't detect different zigbee networks than mine.

I also ordered more powerful raspberry and will try run HA with more RAM.

@Kuchiru
Copy link

Kuchiru commented Dec 16, 2024

Im getting this error daily now, switched from a slae.sh stick to the 06M, using channel 25, my own wifi networks are on 1 and 6. Some Neighbors are on 11 but it is not particularly crowded here and these houses are well isolated.

Unfortunately im at a loss on what i can do to improve this, running the latest firmware on the 06M as well.

@szwacz
Copy link

szwacz commented Dec 16, 2024

Ok, I can report major improvements. Upgraded from RaspberryPI 3B+ 1GB RAM, to RaspberryPI 4B 4GB RAM.
I saved last 10 000 lines of logs before upgrade and waited 24h for new logs. Did string-search for erros in both files and here are the results.

On old setup (pi3):
ERROR_WRONG_DIRECTION - 130 occurances
ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT - 19 occurances
Stopped Zigbee2MQTT - 18 occurances

New setup (pi4):
ERROR_WRONG_DIRECTION - 0 occurances (for some reason this error vanished as well?)
ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT - 3 occurances
Stopped Zigbee2MQTT - 0 occurances

This is night and day. I don't know if processing power or RAM was the issue, but since I had two identical installations and after upgrade both improved this way I can pretty safely say that if you use Home Assistant with Zigbe2MQTT aim at Pi4 with at least 2 gigs of RAM.

The ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT still occures sometimes, but is completely managable now.

@microneer
Copy link

@szwacz this is very good to know. I have the same RPi Model 3B+ 1GB hardware and ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT errors every couple of minutes, leading to Zigbee2MQTT restarts.

It's also running openHAB 4.2, which has become unresponsive for sometimes a minute at a time since installing Z2M and connecting a handful of very talkative routers. Like you, I figured I needed to upgrade.

I considered an RPi 4B or 5B, but for just over half the money I've bought an off-lease HP EliteDesk 800 G3 Mini with 16GB RAM and 256 NVMe. Based on your investigation, I think my hunch that this will at least improve the timeouts might pay off!

@Kuchiru
Copy link

Kuchiru commented Dec 18, 2024

With this in mind i increased the resources available to my z2m VM, it sits in a low resource alpine container.
It had 1 core and 256mb ram which is more than enough according to the resource monitor.

i increased it to 2 cores and 1GB ram, however it still crashed about a day later just like before.

what i’ve done now is increased the ACK timeout delay to 2000, disabled OTA for some devices (these were a bit spammy) and made some changes in the tcp keep alive options in my vm.
As of right now it has exceeded a day in uptime so i am carefully optimistic.

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 18, 2024

Just a sidenote, but from HA specs sheet, they seem to recommended Pi 4 or above. That would be for home assistant alone, you add add-ons and integrations, every time you increase the requirements a little (not so little for anything AI related).
openHAB being Java based, I'd expect the requirements to be higher still.

Z2M alone should use around 100MB of RAM (likely ~70 in most cases), and a small amount of CPU. It's not quite clear yet what could be essentially resulting in serial interference when the resources become low, but we've had enough reports to confirm it appears to be (HA). Some work is being done to reduce the requirements of Z2M a bit further, but it is already pretty low, so in most cases, the other "resource takers" should be looked at more closely (or the hardware specs indeed).

@Kuchiru can you expand on the exact changes you applied to ACK & TCP? (If you have some before/after logs, that would be great too!) Also, are you running the original core firmware on the 06m, or a esphome firmware?

@Kuchiru
Copy link

Kuchiru commented Dec 18, 2024

Hi @Nerivec,

Im running 2.6.8 dev21 for both logs with the latest dev firmware available for zigbee, 202411xx i believe.

Changes in sysctl for tcp keepalive:

net.ipv4.tcp_keepalive_time = 60      # 1 minute of idle time
net.ipv4.tcp_keepalive_intvl = 10     # 10-second interval between probes
net.ipv4.tcp_keepalive_probes = 5     # 5 probes before declaring dead

I've attached the logs as well, for the before log i had debug enabled in the hopes of catching something extra, unfortunately i did not.
Not sure if it matters if im using ember drivers but worth a mention.

after.log
before.log

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 18, 2024

Changes in Z2M config (ACK):

That setting does not exist (i.e. this does nothing).

There is a rather large spike in "stale neighbor" around midnight in your after log (https://nerivec.github.io/z2m-ember-helper/).
NEIGHBOR_STALE: A neighbor table entry became stale because it had not been heard from.
Do you see any reason that might be (device going offline for an extended period or changing routes because of low LQI, that sort of thing)?

@Kuchiru
Copy link

Kuchiru commented Dec 18, 2024

Changes in Z2M config (ACK):

That setting does not exist (i.e. this does nothing).

I'll take a look at the logs. Thanks 😉

Good to know, I’ll remove it!

There is a rather large spike in "stale neighbor" around midnight in your after log (https://nerivec.github.io/z2m-ember-helper/).
NEIGHBOR_STALE: A neighbor table entry became stale because it had not been heard from.
Do you see any reason that might be (device going offline for an extended period or changing routes because of low LQI, that sort of thing)?

I've looked through the log but did not find any mention of NEIGHBOR_STALE, nothing is set specifically to change after midnight though so im not sure what is causing this. did you see any specific devices that were stale?

@szwacz
Copy link

szwacz commented Dec 19, 2024

Just a sidenote, but from HA specs sheet, they seem to recommended Pi 4 or above. That would be for home assistant alone, you add add-ons and integrations, every time you increase the requirements a little (not so little for anything AI related). openHAB being Java based, I'd expect the requirements to be higher still.

Yes. I red those guidelines, but I had Pi3 already laying around so just gave it a try. The thing is that everything was working fine. Except once in a while cascading crash due to error discussed in this thread happened. Since all troubleshooting materials about zigbee always focus on wifi or other signal interference and firmware updates this is what I was troubleshooting first few weeks. Nothing was telling me that performance bottleneck might be the culprit.

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 19, 2024

@Kuchiru NEIGHBOR_STALE is reported in the NCP_COUNTERS log line (which is what the tool I linked uses to build statistics). Unfortunately, it is just a number, no associated devices (would need firmware debugging capabilities for that). The number goes up to almost 30 between 11pm-1am, while the rest of the day, it is near 0. I've seen much worse numbers in some networks, but I guess the sudden spike bothered me 😉

@Kuchiru
Copy link

Kuchiru commented Dec 20, 2024

Yeah it bothers me too, housely activity obviously comes to a halt around midnight but all routers remain powered so im not sure whats happening here. Z2M did crash every morning shortly after we wake up, however, since the TCP changes and disabling OTA requests for some devices it has not crashed and has remained up for about 3 days now.

I've had chatgpt analyze my logs to find any offenders that are spamming the network and reconfigured some reporting as well for good measure.

edit: z2m started crashing again, i've created a home assistant automation to restart my z2m instance in proxmox when it disconnects, this works around the issue but is sub-optimal to say the least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
problem Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants