Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT does not start up properly (sometimes) #115958

Closed
umrath opened this issue Apr 22, 2024 · 63 comments
Closed

MQTT does not start up properly (sometimes) #115958

umrath opened this issue Apr 22, 2024 · 63 comments

Comments

@umrath
Copy link

umrath commented Apr 22, 2024

The problem

When restarting HA (to apply new settings, updates, etc.) I have a fair chance that MQTT is seemingly starting up properly - but missing certain functionalities.
Most prominently, button events are not registered, even though they are clearly visible in Zigbee2MQTT.

The usual workaround: Restarting MQTT until it works again.
Sometimes it works after restarting once, sometimes it need 2 or more restarts until all events are registered again.

In the logfile I cannot find anything remotely related. The events seem to just "vanish" silently.

What version of Home Assistant Core has the issue?

core-2024.4.3

What was the last working version of Home Assistant Core?

core-2024.1 (maybe)

What type of installation are you running?

Home Assistant OS

Integration causing the issue

MQTT

Link to integration documentation on our website

No response

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

nothing I can remotely pin point to the issue

Additional information

This issue is fairly new. It probably started in Release 2024.3 or so.

@home-assistant
Copy link

Hey there @emontnemery, @jbouwh, @bdraco, mind taking a look at this issue as it has been labeled with an integration (mqtt) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of mqtt can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign mqtt Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


mqtt documentation
mqtt source
(message by IssueLinks)

@bdraco
Copy link
Member

bdraco commented Apr 25, 2024

There were some improvements to MQTT startup in 2024.5.0b0

Is there a chance you can try beta?

@umrath
Copy link
Author

umrath commented Apr 26, 2024

This would jeopardize the wife acceptance factor.
So, I rather wait for 2024.5 to drop as stable.

@seblu
Copy link

seblu commented Apr 27, 2024

On my prod setup I have a similar issue since several releases. I tried with 2024.5.0b0 but it's not better.

So, restart HA will result in some entities unavailable (e.g lights, z2m bridge state) and others not (e.g z2m bridge version).

  • Reloading the MQTT (client) integration doesn't help.
  • Restarting the mosquitto broker addon doesn't help.
  • There is no WARNING or ERROR about that in the log.
  • HA sends birth messages (online) on the topic homeassistant/status when it restarts, but no will (offline). Despite it's configured in the mqtt integration configuration.
  • Push offline on homeassistant/status before restart HA doesn't help.
  • Unavailable entity devices MQTT Info tabs are empty. Like if they don't have received the MQTT autodiscovery payload.
  • config are available with mosquitto_sub -h core-mosquitto -v -t "homeassistant/+/+/+/config"

Restart Z2M after HA starts restore missing entities.

@seblu
Copy link

seblu commented Apr 29, 2024

The sending of will messages was fixed in #116319, a few hours after the previous report.
I can confirm they are sent with HA 2024.5.0b2

➜  ~ mosquitto_sub -h core-mosquitto -v -t "homeassistant/status"                 
homeassistant/status online
homeassistant/status offline
homeassistant/status online

Original issue is not fixed. Here is an example with default entities exported by Z2M.

Screenshot_20240429_163301

@Skeletorjus
Copy link

I don't know if this is helpful or not, but a way to "fix" this is to rename the button(s) to something else in Zigbee2MQTT (be sure to select "Update Home Assistant entity ID") and then back again. The action subtype isn't populated in the automation editor before doing this.

@seblu
Copy link

seblu commented Apr 30, 2024

Please note that when all entities are successfully started, restarting the broker does not cause the issue.

I enabled more logging on the moquitto broker and here is outputs focused on a light device soberly named Room3 Light with layer address 0x0017880103367d01. Lights are known to becoming unavailable after HA restart.

MQTT broker logs when HA restart

2024-04-30 11:44:01: Client hass closed its connection.
2024-04-30 11:44:07: New connection from 172.30.32.1:59111 on port 1883.
2024-04-30 11:44:07: New client connected from 172.30.32.1:59111 as hass (p5, c1, k30, u'homeassistant').
2024-04-30 11:44:07: Will message specified (7 bytes) (r1, q0).
2024-04-30 11:44:07: 	homeassistant/status
2024-04-30 11:44:07: Sending CONNACK to hass (0, 0)
2024-04-30 11:44:10: Received SUBSCRIBE from hass
2024-04-30 11:44:10: 	homeassistant/+/+/config (QoS 0)
2024-04-30 11:44:10: hass 0 homeassistant/+/+/config
2024-04-30 11:44:10: 	homeassistant/+/+/+/config (QoS 0)
2024-04-30 11:44:10: hass 0 homeassistant/+/+/+/config
2024-04-30 11:44:10: 	drop_connect/discovery/# (QoS 0)
2024-04-30 11:44:10: hass 0 drop_connect/discovery/#
2024-04-30 11:44:10: 	dsmr/# (QoS 0)
2024-04-30 11:44:10: hass 0 dsmr/#
2024-04-30 11:44:10: 	fully/deviceInfo/+ (QoS 0)
2024-04-30 11:44:10: hass 0 fully/deviceInfo/+
2024-04-30 11:44:10: 	tasmota/discovery/# (QoS 0)
2024-04-30 11:44:10: hass 0 tasmota/discovery/#
2024-04-30 11:44:10: Sending SUBACK to hass
...
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/power_on_behavior/config', ... (847 bytes))
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/linkquality/config', ... (871 bytes))
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/last_seen/config', ... (833 bytes))
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/update_state/config', ... (825 bytes))
...
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/binary_sensor/0x0017880103367d01/update_available/config', ... (882 bytes))
...
2024-04-30 11:44:10: Outgoing messages are being dropped for client hass.
...
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/update/0x0017880103367d01/update/config', ... (1283 bytes))
...
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/select/0x0017880103367d01/effect/config', ... (845 bytes))
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/select/0x0017880103367d01/power_on_behavior/config', ... (949 bytes))
...
2024-04-30 11:44:10: Sending PUBLISH to hass (d0, q0, r1, m0, 'homeassistant/light/0x0017880103367d01/light/config', ... (985 bytes))
...
2024-04-30 11:44:13: Received SUBSCRIBE from hass
...
2024-04-30 11:44:13: Sending SUBACK to hass
...
2024-04-30 11:44:17: Received SUBSCRIBE from hass
...
2024-04-30 11:44:17: Sending SUBACK to hass
...
2024-04-30 11:44:49: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 11:44:49: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 11:44:49: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 11:44:49: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))

Note:

  • HA client doesn't subscribe to the topic zigbee2mqtt/Room3 Light published in homeassistant/light/0x0017880103367d01/light/config at 2024-04-30 11:44:10.
  • There is a line Outgoing messages are being dropped for client hass.

MQTT auto-discovery payload of the light requested from external client

mosquitto_sub -h core-mosquitto -v -t "homeassistant/+/0x0017880103367d01/light/config"       
homeassistant/light/0x0017880103367d01/light/config {"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"},{"topic":"zigbee2mqtt/Room3 Light/availability","value_template":"{{ value_json.state }}"}],"availability_mode":"all","brightness":true,"brightness_scale":254,"color_mode":true,"command_topic":"zigbee2mqtt/Room3 Light/set","device":{"identifiers":["zigbee2mqtt_0x0017880103367d01"],"manufacturer":"Philips","model":"Hue Fair (4034031P7)","name":"Room3 Light","suggested_area":"Room3","sw_version":"1.108.5","via_device":"zigbee2mqtt_bridge_0x00124b0025e1e6ec"},"effect":true,"effect_list":["blink","breathe","okay","channel_change","candle","finish_effect","stop_effect","stop_hue_effect"],"max_mireds":500,"min_mireds":150,"name":null,"object_id":"room3_light","origin":{"name":"Zigbee2MQTT","sw":"1.36.1","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/Room3 Light","supported_color_modes":["color_temp"],"unique_id":"0x0017880103367d01_light_zigbee2mqtt"}

MQTT broker logs when Z2M restart

2024-04-30 12:00:30: Received DISCONNECT from z2m
2024-04-30 12:00:30: Client z2m disconnected.
2024-04-30 12:00:42: New connection from 172.30.33.3:51256 on port 1883.
2024-04-30 12:00:42: New client connected from 172.30.33.3:51256 as z2m (p5, c1, k30, u'addons').
2024-04-30 12:00:42: Will message specified (19 bytes) (r1, q1).
2024-04-30 12:00:42: 	zigbee2mqtt/bridge/state
2024-04-30 12:00:42: Sending CONNACK to z2m (0, 0)
2024-04-30 12:00:42: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/bridge/state', ... (18 bytes))
2024-04-30 12:00:42: Sending PUBLISH to hass (d0, q0, r0, m0, 'zigbee2mqtt/bridge/state', ... (18 bytes))
2024-04-30 12:00:42: Received SUBSCRIBE from z2m
2024-04-30 12:00:42: 	zigbee2mqtt/# (QoS 0)
2024-04-30 12:00:42: z2m 0 zigbee2mqtt/#
...
2024-04-30 12:00:42: Sending PUBLISH to z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:00:42: Sending PUBLISH to z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light/availability', ... (18 bytes))
...
2024-04-30 12:00:43: Received PUBLISH from z2m (d0, q1, r1, m7217, 'zigbee2mqtt/Room3 Light/availability', ... (18 bytes))
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light/availability', ... (18 bytes))
...
2024-04-30 12:00:43: Received SUBSCRIBE from z2m
2024-04-30 12:00:43: 	homeassistant/status (QoS 0)
2024-04-30 12:00:43: z2m 0 homeassistant/status
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/status', ... (6 bytes))
2024-04-30 12:00:43: Received SUBSCRIBE from z2m
2024-04-30 12:00:43: 	homeassistant/# (QoS 0)
2024-04-30 12:00:43: z2m 0 homeassistant/#
...
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/power_on_behavior/config', ... (847 bytes))
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/linkquality/config', ... (871 bytes))
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/last_seen/config', ... (833 bytes))
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/sensor/0x0017880103367d01/update_state/config', ... (825 bytes))
...
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/binary_sensor/0x0017880103367d01/update_available/config', ... (882 bytes))
...
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/select/0x0017880103367d01/effect/config', ... (845 bytes))
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/select/0x0017880103367d01/power_on_behavior/config', ... (949 bytes))
...
2024-04-30 12:00:43: Sending PUBLISH to z2m (d0, q0, r1, m0, 'homeassistant/light/0x0017880103367d01/light/config', ... (985 bytes))
...
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7898, 'homeassistant/light/0x0017880103367d01/light/config', ... (985 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/light/0x0017880103367d01/light/config', ... (985 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/light/0x0017880103367d01/light/config', ... (985 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7898, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7899, 'homeassistant/sensor/0x0017880103367d01/power_on_behavior/config', ... (847 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/power_on_behavior/config', ... (847 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/power_on_behavior/config', ... (847 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7899, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7900, 'homeassistant/select/0x0017880103367d01/power_on_behavior/config', ... (949 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/select/0x0017880103367d01/power_on_behavior/config', ... (949 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/select/0x0017880103367d01/power_on_behavior/config', ... (949 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7900, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7901, 'homeassistant/select/0x0017880103367d01/effect/config', ... (845 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/select/0x0017880103367d01/effect/config', ... (845 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/select/0x0017880103367d01/effect/config', ... (845 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7901, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7902, 'homeassistant/sensor/0x0017880103367d01/linkquality/config', ... (871 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/linkquality/config', ... (871 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/linkquality/config', ... (871 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7902, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7903, 'homeassistant/sensor/0x0017880103367d01/last_seen/config', ... (833 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/last_seen/config', ... (833 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/last_seen/config', ... (833 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7903, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7904, 'homeassistant/sensor/0x0017880103367d01/update_state/config', ... (825 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/update_state/config', ... (825 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/sensor/0x0017880103367d01/update_state/config', ... (825 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7904, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7905, 'homeassistant/binary_sensor/0x0017880103367d01/update_available/config', ... (882 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/binary_sensor/0x0017880103367d01/update_available/config', ... (882 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/binary_sensor/0x0017880103367d01/update_available/config', ... (882 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7905, rc0)
2024-04-30 12:00:44: Received PUBLISH from z2m (d0, q1, r1, m7906, 'homeassistant/update/0x0017880103367d01/update/config', ... (1283 bytes))
2024-04-30 12:00:44: Sending PUBLISH to hass (d0, q0, r0, m0, 'homeassistant/update/0x0017880103367d01/update/config', ... (1283 bytes))
2024-04-30 12:00:44: Sending PUBLISH to z2m (d0, q0, r0, m0, 'homeassistant/update/0x0017880103367d01/update/config', ... (1283 bytes))
2024-04-30 12:00:44: Sending PUBACK to z2m (m7906, rc0)
...
2024-04-30 12:00:45: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:00:45: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
...
2024-04-30 12:00:46: Received SUBSCRIBE from hass
...
2024-04-30 12:00:46: 	zigbee2mqtt/Room3 Light/availability (QoS 0)
2024-04-30 12:00:46: hass 0 zigbee2mqtt/Room3 Light/availability
2024-04-30 12:00:46: 	zigbee2mqtt/Room3 Light (QoS 0)
2024-04-30 12:00:46: hass 0 zigbee2mqtt/Room3 Light
...
2024-04-30 12:00:46: Sending PUBLISH to hass (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light/availability', ... (18 bytes))
2024-04-30 12:00:46: Sending PUBLISH to hass (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
...
2024-04-30 12:00:47: Received SUBSCRIBE from hass
...
2024-04-30 12:01:13: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:01:13: Sending PUBLISH to hass (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:01:13: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
...
2024-04-30 12:01:15: Received PUBLISH from z2m (d0, q0, r1, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:01:15: Sending PUBLISH to hass (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))
2024-04-30 12:01:15: Sending PUBLISH to z2m (d0, q0, r0, m0, 'zigbee2mqtt/Room3 Light', ... (638 bytes))

In this case,

  • HA client subscribe to Room3 state topic (zigbee2mqtt/Room3 Light) after Z2M re-published mqtt autoconfig payload at config homeassistant/light/0x0017880103367d01/light/config.
  • HA client issue 2 batches of subscribe requests at 12:00:46 and 12:00:47`.

In both cases, payload of homeassistant/light/0x0017880103367d01/light/config is 985 bytes long.
For some reason, when HA restarts, either it does not parse correctly the payload correctly or it fails to subscribe to the state topic.

@jbouwh
Copy link
Contributor

jbouwh commented Apr 30, 2024

There was a change in 2024.5.b0 causing MQTT not to publish a will. That is offline to topic homeassistant/status. This could cause that Z2M is not triggered to publish configs and states. It was fixed with 2024.5.0b2. May be you can check if that solves the issue?

@seblu
Copy link

seblu commented Apr 30, 2024

There was a change in 2024.5.b0 causing MQTT not to publish a will. That is offline to topic homeassistant/status. This could cause that Z2M is not triggered to publish configs and states. It was fixed with 2024.5.0b2. May be you can check if that solves the issue?

I tested. It's mentioned here
#115958 (comment)

@jbouwh
Copy link
Contributor

jbouwh commented Apr 30, 2024

So what is it that is not working. After MQTT starts, it should replay any retained messages. That should bring up the entities in HA. When the birth message is published, the that should trigger Z2M to start publishing stuff.
If that is not happening, then it could also be an issue with Z2M.

@seblu
Copy link

seblu commented Apr 30, 2024

Well, looks like my previous comments are not clear enough.

So what is it that is not working.

  1. HA starts, Z2M starts : OK. All devices and entities are discovered via MQTT.
  2. Z2M starts, HA starts : KO. Some devices and entities are unavailable.

After MQTT starts, it should replay any retained ages. That should bring up the entities in HA.

MQTT broker logs show that MQTT discovery topics previously published by Z2M are sent to HA MQTT client.
For some reason, HA does not parse them and does not subscribe to state topics.

When the birth message is published, the that should trigger Z2M to start publishing stuff. If that is not happening, then it could also be an issue with Z2M.p

MQTT Discovery topics are retained, so they are published as soon as a new client subscribe. Why is Z2M has to publish stuff after birth messages?

@bdraco
Copy link
Member

bdraco commented Apr 30, 2024

That does sound like the exact problem we just fixed in #116471

@jbouwh
Copy link
Contributor

jbouwh commented May 1, 2024

This is what I think happened. Z2M is triggered by the birth message online to homeassistant/status. This message should be sent after the Home Assistant MQTT integration is connected AND has subscribed, so that it already has processed all retained config messages.

The birth message changes the availability and triggers Z2M to send status updates for all devices. If the birth message is sent to soon, and subscription has not been completed yet, status updated will be missed until subscription has been completed.

So what should happen is:

  1. HA MQTT is connecting to the broker
  2. HA MQTT is processing set up of all YAML config setups
  3. HA MQTT is subscribing for discovery updates, this triggers setup of all items that are set up through discovery.
  4. HA MQTT sends a birth message so availability for all devices toggle to online and Z2M starts sending status updates.

@seblu
Copy link

seblu commented May 1, 2024

I just give a try to 2024.5.0b6 which ship #116471 but some devices are still unavailable.

Regarding point 4., I don't think Z2M wait for HA to be online to send status update. Z2M sends them when they occurs and the last version is delivered by the broker when HA subscribe.

From my previous mosquito logs studies, I know that the broker is sending MQTT discovery topics when a client connects, so I should start looking into the HA mqtt debug logs to see if something may hint you.

@jbouwh
Copy link
Contributor

jbouwh commented May 1, 2024

The configs are fine if they are retained, but devices become unavailable if they miss a status update. The online birth message is to be used for Z2M to no know HA is available. From that point entities should be updated and ZHA should start publishing the actual states.

@umrath
Copy link
Author

umrath commented May 2, 2024

It feels like 2024.5 made it worse.
I'm currently struggling to get it running for half an hour now.

@jbouwh
Copy link
Contributor

jbouwh commented May 2, 2024

You have to wait till 2025.5.1

@bdraco
Copy link
Member

bdraco commented May 2, 2024

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d mqtt -p 116550

If you are adventurous and want to test as a custom component

@umrath
Copy link
Author

umrath commented May 2, 2024

After 45 I got it to work, I think.
Next time it fails, I will test it.

@andrejpavlovic
Copy link

andrejpavlovic commented May 3, 2024

Just went from 2024.04.4 to 2024.5.1, but now I can't get button automations to work regardless of how many restarts. Using Zigbee2MQTT 1.37.0-1.

Automations with button action just show "Unknown trigger". Seems to also be reported in #114660

@umrath
Copy link
Author

umrath commented May 3, 2024

That's what I had yesterday.
After 45 minutes of restarting mqtt and z2m randomly, it worked again.
Real pain in the rear.

@umrath
Copy link
Author

umrath commented May 6, 2024

2025.5.1 does not solve the problem - but reverts back to the originally reported state
That means, I can solve the issue by restarting MQTT.
(At least that's the way it was yesterday evening.)

Update: After another restart this morning I could replicate the original behaviour:

  1. After a restart, certain funcationalities were missing.
  2. A restart of MQTT fixes the issue.

Until a solution is available, I am tempted to solve this by triggering an automation about 5 minutes after restarting HA and if certain entities are not available by restarting MQTT.

(update2, translated to english. Sorry!)

@jbouwh
Copy link
Contributor

jbouwh commented May 11, 2024

@umrath did you notice there ware some issues wit z2m as well? What version do you have installled?

@umrath
Copy link
Author

umrath commented May 11, 2024

@umrath did you notice there ware some issues wit z2m as well? What version do you have installled?

I have just upgraded to 1.37.1-1 a day or two ago. I didn't change anything and is unrelated to the evcc problems because the devices are not connected via zigbee.

The issue with evcc is my sungrow system.
It is nicely available in HA itself but making it available in evcc via mqtt does not work anymore since today.

@bdraco
Copy link
Member

bdraco commented May 11, 2024

Another thing we haven't explored is if the event loop is being blocked for an extended period of time? Do you have any custom integrations installed?

https://community.home-assistant.io/t/2024-5-tracking-down-instability-issues-caused-by-integrations/724441

@bdraco
Copy link
Member

bdraco commented May 11, 2024

2024-04-30 11:44:10: Outgoing messages are being dropped for client hass.

If you want to give 117267 a try, it can be installed as a custom component

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d mqtt -p 117267

@umrath
Copy link
Author

umrath commented May 12, 2024

Another thing we haven't explored is if the event loop is being blocked for an extended period of time? Do you have any custom integrations installed?

https://community.home-assistant.io/t/2024-5-tracking-down-instability-issues-caused-by-integrations/724441

Of course I do.
But I do not think that this is too relevant, as this issue is a new phenomenon.
Additionally, there are quite a few others with a similar problem.

Regardless, I will look into the tool you described to see, whether this will yield any interesting results.

@jbouwh
Copy link
Contributor

jbouwh commented May 12, 2024

@umrath Based on the test I have done, I think #117267 will finally restore functionallity. In a larger setup it seems the default socket buffers are not big enough. Also we reverted increasing the initial subcribe cooldown, so we should wait less long before actually start subscribing to the broker.

@alexdelprete
Copy link

alexdelprete commented May 17, 2024

I think #117267 will finally restore functionallity.

.4 actually broke my HA->MQTT connection, going back to .3 works again. Here's the error in the log:

2024-05-17 21:19:38.118 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback MQTT._async_on_socket_open(<paho.mqtt.cl...x785546020e00>, None, <paho.mqtt.cl...x785546023dd0>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 582, in _async_on_socket_open
    self._increase_socket_buffer_size(sock)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 550, in _increase_socket_buffer_size
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, new_buffer_size)
    ^^^^^^^^^^^^^^^
AttributeError: 'WebsocketWrapper' object has no attribute 'setsockopt'

@umrath
Copy link
Author

umrath commented May 17, 2024

.4 did not change anything for me.
Same effects as in .3

@alexdelprete
Copy link

alexdelprete commented May 17, 2024

.4 did not change anything for me. Same effects as in .3

I don't think we all use the same broker connection configuration (I use websockets):

      {
        "entry_id": "056ee5d2891aa662a13407eea72747a0",
        "version": 1,
        "minor_version": 1,
        "domain": "mqtt",
        "title": "mqtt.domain.dom",
        "data": {
          "broker": "mqtt.domain.dom",
          "port": 8083,
          "username": "*******",
          "password": "******",
          "client_id": "hass_homelab",
          "keepalive": 30,
          "tls_insecure": false,
          "protocol": "5",
          "transport": "websockets",
          "ws_path": "/mqtt",
          "ws_headers": {},
          "discovery": true,
          "discovery_prefix": "homeassistant",
          "birth_message": {
            "topic": "homeassistant/status",
            "payload": "online",
            "qos": 0,
            "retain": false
          },
          "will_message": {
            "topic": "homeassistant/status",
            "payload": "offline",
            "qos": 0,
            "retain": false
          }
        },
        "options": {},
        "pref_disable_new_entities": false,
        "pref_disable_polling": false,
        "source": "user",
        "unique_id": null,
        "disabled_by": null
      },

@seblu
Copy link

seblu commented May 17, 2024

Based on the test I have done, I think #117267 will finally restore functionallity. In a larger setup it seems the default socket buffers are not big enough. Also we reverted increasing the initial subcribe cooldown, so we should wait less long before actually start subscribing to the broker.

2024.5.4 solved the issue described in #115958 (comment). I can now restart HA without have to restart Z2M.

Thanks @jbouwh and @bdraco !

@bdraco
Copy link
Member

bdraco commented May 18, 2024

I think #117267 will finally restore functionallity.

.4 actually broke my HA->MQTT connection, going back to .3 works again. Here's the error in the log:

2024-05-17 21:19:38.118 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback MQTT._async_on_socket_open(<paho.mqtt.cl...x785546020e00>, None, <paho.mqtt.cl...x785546023dd0>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 582, in _async_on_socket_open
    self._increase_socket_buffer_size(sock)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 550, in _increase_socket_buffer_size
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, new_buffer_size)
    ^^^^^^^^^^^^^^^
AttributeError: 'WebsocketWrapper' object has no attribute 'setsockopt'

It looks like the paho passes the wrapper when using websockets and setsockopt method isn't exposed on the WebsocketWrapper. That's unfortunate. It looks like their example also has the same issue https://github.com/eclipse/paho.mqtt.python/blob/d45de3737879cfe7a6acc361631fa5cb1ef584bb/examples/loop_asyncio.py#L92

Let me see if I can come up with a workaround

@bdraco
Copy link
Member

bdraco commented May 18, 2024

@umrath

.4 did not change anything for me. Same effects as in .3

I'd focus on solving your other issue as they may be related, and there is a good chance whatever is causing the crash is also blocking your event loop for long enough that even increasing the buffer is not enough (unless the fault log there is fresh and its not a thread safety crash).

@bdraco
Copy link
Member

bdraco commented May 18, 2024

@alexdelprete

I opened #117672 which should workaround the issue with websockets. Unfortunately I cannot test it myself at the moment because I am on limited emergency power due to the storms in Houston.

If you would like to try it as a custom component you can run the following:

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d mqtt -p 117672

If that doesn't solve it delete /config/custom_components/mqtt to get back to the core version

@alexdelprete
Copy link

If you would like to try it as a custom component you can run the following:

It worked, thanks.

@bdraco
Copy link
Member

bdraco commented May 19, 2024

If you would like to try it as a custom component you can run the following:

It worked, thanks.

Thanks for testing. I opened a PR to the upstream library as well so we won't have to use the workaround long term.
eclipse-paho/paho.mqtt.python#843

@alexdelprete
Copy link

alexdelprete commented May 19, 2024

Thanks. If this will be in .5 can I simply delete the custom_components/mqtt folder and restart HA to go back to core MQTT client?

@bdraco
Copy link
Member

bdraco commented May 19, 2024

Thanks. If this will be in .5 can I simply delete the custom_components/mqtt folder and restart HA to go back to core MQTT client?

Yes exactly

@alexdelprete
Copy link

Yes exactly

Thanks Nick. I didn't know you could override native integrations through custom_components. :)

@umrath
Copy link
Author

umrath commented May 23, 2024

@alexdelprete

I opened #117672 which should workaround the issue with websockets. Unfortunately I cannot test it myself at the moment because I am on limited emergency power due to the storms in Houston.

If you would like to try it as a custom component you can run the following:

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d mqtt -p 117672

If that doesn't solve it delete /config/custom_components/mqtt to get back to the core version

This doesn't seem to do change anything here.
Actually, after moving to a dedicated mqtt broker, things got even worse for me.
I'm currently not able to get all devices to work.

What's most frustrating: mqtt does obviously get a message as the "last seen" flag is updated whenever I press a button. But the event itself is not registered.

@umrath
Copy link
Author

umrath commented May 23, 2024

Does the configuration of mqtt itself matter, e.g. the protocol version?
For some reason it defaults to 3.1.1, which I haven't changed so far.
Zigbee2mqtt defaults to v4. Also unchanged.

I didn't have any issues with those setting prior to mqtt@ha ceasing to do its job a while ago.

Also interesting:
Out of desperation I have tried to enable and disable certain options.
When I enable "legacy triggers" in z2m, I do in fact see the events as "action" in the device page.
But I cannot use the events in Homeassistant. They simply do not show as available option.

The sad part: If I look into the mqtt info for the respective device, I even see the event being received:

Received 12:34:02

    QoS: 0
    Payload:

    action: off_press_release
    action_duration: null
    battery: 100
    device:
      applicationVersion: 2
      dateCode: '20190410'
      friendlyName: Hue Fernbedienung Tresen
      hardwareVersion: 1
      ieeeAddr: '0x00178801106f1f07'
      manufacturerID: 4107
      manufacturerName: Philips
      model: '324131092621'
      networkAddress: 23073
      powerSource: Battery
      softwareBuildID: 6.1.1.28573
      stackVersion: 1
      type: EndDevice
      zclVersion: 1
    last_seen: '2024-05-23T10:34:02.902Z'
    linkquality: 83
    update:
      installed_version: 1107324829
      latest_version: 1107324829
      state: idle
    update_available: null

But HA does not "get it" as it seems.

@jbouwh
Copy link
Contributor

jbouwh commented May 23, 2024

Does the configuration of mqtt itself matter, e.g. the protocol version?
For some reason it defaults to 3.1.1, which I haven't changed so far.

That should be fine

@umrath
Copy link
Author

umrath commented May 28, 2024

After moving from Homeassistant blue to a Proxmox vm, the problem seems to have vanished (for me).
Completely.
Restarting the instance is no problem anymore and all entities are back in the blink of an eye.

@jbouwh
Copy link
Contributor

jbouwh commented May 29, 2024

I believe we can close this issue at short term.

@bdraco bdraco closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests