MQTT does not start up properly (sometimes) #115958
Comments
Hey there @emontnemery, @jbouwh, @bdraco, mind taking a look at this issue as it has been labeled with an integration (mqtt) you are listed as a code owner for? (message by CodeOwnersMention) |
There were some improvements to MQTT startup in 2024.5.0b0. Is there a chance you can try the beta? |
This would jeopardize the wife acceptance factor. |
On my prod setup I have had a similar issue for several releases. I tried with So, restarting HA results in some entities being unavailable (e.g. lights, z2m bridge state) and others not (e.g. z2m bridge version).
Restarting Z2M after HA starts restores the missing entities. |
The sending of will messages was fixed in #116319, a few hours after the previous report.
Original issue is not fixed. Here is an example with default entities exported by Z2M. |
I don't know if this is helpful or not, but a way to "fix" this is to rename the button(s) to something else in Zigbee2MQTT (be sure to select "Update Home Assistant entity ID") and then back again. The action subtype isn't populated in the automation editor before doing this. |
Please note that when all entities are successfully started, restarting the broker does not cause the issue. I enabled more logging on the Mosquitto broker and here are the outputs, focused on a light device soberly named
MQTT broker logs when HA restart
Note:
MQTT auto-discovery payload of the light requested from external client
MQTT broker logs when Z2M restart
In this case,
In both cases, payload of |
There was a change in 2024.5.0b0 causing MQTT not to publish a |
I tested. It's mentioned here |
So what is it that is not working? After MQTT starts, the broker should replay any retained messages. That should bring up the entities in HA. When the birth message is published, that should trigger Z2M to start publishing stuff. |
Well, looks like my previous comments are not clear enough.
MQTT broker logs show that MQTT discovery topics previously published by Z2M are sent to HA MQTT client.
MQTT discovery topics are retained, so they are delivered as soon as a new client subscribes. Why does Z2M have to publish stuff after the birth message? |
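The retained-message behaviour discussed here can be illustrated with a toy in-memory broker. This is only a sketch of MQTT retain semantics, not Mosquitto or Home Assistant code: `ToyBroker`, the topic names, and the payloads are made up for illustration, topics are matched exactly rather than with wildcards, and whether a given Z2M topic is actually retained depends on its configuration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class ToyBroker:
    """Very small model of MQTT retain semantics (exact-match topics only)."""

    def __init__(self) -> None:
        self._retained: Dict[str, str] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # The broker keeps the last retained payload per topic.
            self._retained[topic] = payload
        for callback in self._subscribers[topic]:
            callback(topic, payload)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)
        # A new subscription immediately receives the retained payload, if any.
        if topic in self._retained:
            callback(topic, self._retained[topic])


def late_subscriber_demo() -> Dict[str, str]:
    """Publish before anyone subscribes, then see what a late subscriber gets."""
    broker = ToyBroker()
    # Hypothetical topics, published before the subscriber exists.
    broker.publish("homeassistant/light/demo/config", '{"name": "demo"}', retain=True)
    broker.publish("zigbee2mqtt/demo/availability", "online", retain=False)

    received: Dict[str, str] = {}
    for topic in ("homeassistant/light/demo/config", "zigbee2mqtt/demo/availability"):
        broker.subscribe(topic, lambda t, p: received.__setitem__(t, p))
    return received
```

In this model the late subscriber gets the retained discovery config but never sees the non-retained update that was published before it subscribed.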
That does sound like the exact problem we just fixed in #116471 |
This is what I think happened. Z2M is triggered by the The birth message changes the availability and triggers Z2M to send status updates for all devices. If the birth message is sent too soon, and the subscription has not been completed yet, status updates will be missed until the subscription has been completed. So what should happen is:
|
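The ordering concern raised in the comment above (finish subscribing before announcing birth) could be sketched roughly as follows. This is not the Home Assistant implementation; `start_mqtt_client`, `fake_subscribe`, and `fake_publish_birth` are hypothetical names, and each `subscribe` coroutine is assumed to return only once the broker has acknowledged the subscription.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def start_mqtt_client(
    topics: Iterable[str],
    subscribe: Callable[[str], Awaitable[None]],
    publish_birth: Callable[[], Awaitable[None]],
) -> None:
    """Announce birth only after every subscription has been acknowledged."""
    # Assumption: each `subscribe` call completes when the broker acked it.
    await asyncio.gather(*(subscribe(topic) for topic in topics))
    await publish_birth()


def ordering_demo() -> List[str]:
    """Run the startup sequence against fakes and return the event order."""
    events: List[str] = []

    async def fake_subscribe(topic: str) -> None:
        await asyncio.sleep(0.01)  # pretend to wait for the SUBACK
        events.append(f"subscribed:{topic}")

    async def fake_publish_birth() -> None:
        events.append("birth:online")

    asyncio.run(
        start_mqtt_client(
            ["zigbee2mqtt/#", "homeassistant/#"], fake_subscribe, fake_publish_birth
        )
    )
    return events
```

With this ordering, anything a peer publishes in response to the birth message arrives after the subscriptions are in place, so it cannot be missed for that reason.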
I just gave a try to Regarding point 4, I don't think Z2M waits for HA to be online to send status updates. Z2M sends them when they occur, and the last version is delivered by the broker when HA subscribes. From my previous Mosquitto log studies, I know that the broker sends MQTT discovery topics when a client connects, so I should start looking into the HA MQTT debug logs to see if something may give you a hint. |
The configs are fine if they are retained, but devices become unavailable if they miss a status update. The |
It feels like 2024.5 made it worse. |
You have to wait till 2024.5.1 |
If you are adventurous and want to test as a custom component |
After 45 I got it to work, I think. |
Just went from Automations with button action just show "Unknown trigger". Seems to also be reported in #114660 |
That's what I had yesterday. |
2024.5.1 does not solve the problem, but reverts back to the originally reported state. Update: After another restart this morning I could replicate the original behaviour:
Until a solution is available, I am tempted to work around this with an automation that runs about 5 minutes after restarting HA and restarts MQTT if certain entities are not available. (update2, translated to English. Sorry!) |
@umrath did you notice there were some issues with z2m as well? What version do you have installed? |
I just upgraded to 1.37.1-1 a day or two ago. I didn't change anything, and this is unrelated to the evcc problems because the devices are not connected via Zigbee. The issue with evcc is my sungrow system. |
Another thing we haven't explored is if the event loop is being blocked for an extended period of time? Do you have any custom integrations installed? |
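One generic way to check whether an asyncio event loop is being blocked is to measure how late a short sleep wakes up. The sketch below uses only the standard library and is not a Home Assistant tool; `measure_loop_lag`, `_blocking_task`, and `lag_demo` are illustrative names, and the 0.2 s blocking call merely stands in for a misbehaving integration.

```python
import asyncio
import time


async def measure_loop_lag(interval: float = 0.05, samples: int = 3) -> float:
    """Return the worst observed delay (seconds) beyond the requested sleep."""
    worst = 0.0
    for _ in range(samples):
        started = time.monotonic()
        await asyncio.sleep(interval)
        overshoot = time.monotonic() - started - interval
        worst = max(worst, overshoot)
    return worst


async def _blocking_task() -> None:
    await asyncio.sleep(0.01)
    time.sleep(0.2)  # deliberately blocks the loop instead of awaiting


async def lag_demo() -> float:
    """Measure loop lag while a blocking task runs on the same loop."""
    blocker = asyncio.create_task(_blocking_task())
    lag = await measure_loop_lag()
    await blocker
    return lag
```

Running `asyncio.run(lag_demo())` should report a lag well above zero, because the blocking call keeps the sleeping coroutine from waking on time. asyncio's debug mode (`asyncio.run(main(), debug=True)`) also logs callbacks that run longer than `loop.slow_callback_duration`.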
If you want to give #117267 a try, it can be installed as a custom component
|
Of course I do. Regardless, I will look into the tool you described to see, whether this will yield any interesting results. |
@umrath Based on the tests I have done, I think #117267 will finally restore functionality. In a larger setup it seems the default socket buffers are not big enough. We also reverted the increase of the initial subscribe cooldown, so we should wait less before actually starting to subscribe to the broker. |
.4 actually broke my HA->MQTT connection, going back to .3 works again. Here's the error in the log:
2024-05-17 21:19:38.118 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback MQTT._async_on_socket_open(<paho.mqtt.cl...x785546020e00>, None, <paho.mqtt.cl...x785546023dd0>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 582, in _async_on_socket_open
    self._increase_socket_buffer_size(sock)
  File "/usr/src/homeassistant/homeassistant/components/mqtt/client.py", line 550, in _increase_socket_buffer_size
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, new_buffer_size)
    ^^^^^^^^^^^^^^^
AttributeError: 'WebsocketWrapper' object has no attribute 'setsockopt'
|
.4 did not change anything for me. |
I don't think we all use the same broker connection configuration (I use websockets):
{
  "entry_id": "056ee5d2891aa662a13407eea72747a0",
  "version": 1,
  "minor_version": 1,
  "domain": "mqtt",
  "title": "mqtt.domain.dom",
  "data": {
    "broker": "mqtt.domain.dom",
    "port": 8083,
    "username": "*******",
    "password": "******",
    "client_id": "hass_homelab",
    "keepalive": 30,
    "tls_insecure": false,
    "protocol": "5",
    "transport": "websockets",
    "ws_path": "/mqtt",
    "ws_headers": {},
    "discovery": true,
    "discovery_prefix": "homeassistant",
    "birth_message": {
      "topic": "homeassistant/status",
      "payload": "online",
      "qos": 0,
      "retain": false
    },
    "will_message": {
      "topic": "homeassistant/status",
      "payload": "offline",
      "qos": 0,
      "retain": false
    }
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "user",
  "unique_id": null,
  "disabled_by": null
},
|
|
It looks like the paho passes the wrapper when using websockets and Let me see if I can come up with a workaround |
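The traceback earlier in the thread shows `setsockopt` being called on an object that does not provide it when the websockets transport is used. A defensive version of such a buffer tweak might look like the sketch below. It is not the code from #117672 or from Home Assistant; the function name, the 256 KiB default, and `FakeWebsocketWrapper` are illustrative assumptions.

```python
import socket
from typing import Any


def try_increase_receive_buffer(sock: Any, new_size: int = 256 * 1024) -> bool:
    """Request a larger receive buffer; return False if it cannot be set."""
    setsockopt = getattr(sock, "setsockopt", None)
    if setsockopt is None:
        # Wrapper objects may not expose the raw socket API at all.
        return False
    try:
        setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, new_size)
    except OSError:
        # The OS refused the request; keep the default buffer size.
        return False
    return True


class FakeWebsocketWrapper:
    """Stand-in for a wrapper object that does not expose setsockopt."""
```

Returning a boolean instead of raising keeps the connection usable with the default buffer when the option cannot be applied.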
I'd focus on solving your other issue as they may be related, and there is a good chance whatever is causing the crash is also blocking your event loop for long enough that even increasing the buffer is not enough (unless the fault log there is fresh and it's not a thread safety crash). |
I opened #117672, which should work around the issue with websockets. Unfortunately I cannot test it myself at the moment because I am on limited emergency power due to the storms in Houston. If you would like to try it as a custom component you can run the following:
If that doesn't solve it delete |
It worked, thanks. |
Thanks for testing. I opened a PR to the upstream library as well so we won't have to use the workaround long term. |
Thanks. If this will be in .5 can I simply delete the custom_components/mqtt folder and restart HA to go back to core MQTT client? |
Yes exactly |
Thanks Nick. I didn't know you could override native integrations through custom_components. :) |
This doesn't seem to change anything here. What's most frustrating: MQTT obviously does get a message, as the "last seen" flag is updated whenever I press a button. But the event itself is not registered. |
Does the configuration of MQTT itself matter, e.g. the protocol version? I didn't have any issues with those settings prior to mqtt@ha ceasing to do its job a while ago. Also interesting: The sad part: If I look into the MQTT info for the respective device, I even see the event being received:
But HA does not "get it" as it seems. |
That should be fine |
After moving from Home Assistant Blue to a Proxmox VM, the problem seems to have vanished (for me). |
I believe we can close this issue in the short term. |
The problem
When restarting HA (to apply new settings, updates, etc.) I have a fair chance that MQTT is seemingly starting up properly - but missing certain functionalities.
Most prominently, button events are not registered, even though they are clearly visible in Zigbee2MQTT.
The usual workaround: Restarting MQTT until it works again.
Sometimes it works after restarting once, sometimes it needs 2 or more restarts until all events are registered again.
In the logfile I cannot find anything remotely related. The events seem to just "vanish" silently.
What version of Home Assistant Core has the issue?
core-2024.4.3
What was the last working version of Home Assistant Core?
core-2024.1 (maybe)
What type of installation are you running?
Home Assistant OS
Integration causing the issue
MQTT
Link to integration documentation on our website
No response
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
This issue is fairly new. It probably started in Release 2024.3 or so.