deCONZ 2.05.85 crashes with Segmentation fault #3427

Closed
oywino opened this issue Oct 16, 2020 · 36 comments

Comments

@oywino

oywino commented Oct 16, 2020

Describe the bug

Steps to reproduce the behavior

Expected behavior

Screenshots

Environment

  • Host system: QNAP 453Be NAS
  • Running method: Marthoc Docker container
  • Firmware version: 26580700
  • deCONZ version: 2.05.85
  • Device: ConBee II
  • Do you use a USB extension cable: yes

deCONZ Logs

Additional context

@arnerek

arnerek commented Oct 16, 2020

I have the same behaviour. Do you have an Elko Super TR as well? If you don't I will create my own issue.

Describe the bug

I just upgraded to 2.05.85. deCONZ crashes after a few minutes. I suspect the bug is related to the following PR: #3329

Steps to reproduce the behavior

The bug is reproduced at each launch. If the Elko Super TR is removed from the network, deCONZ runs fine.

Environment

  • Host system: Raspberry Pi 3B+
  • Running method: Raspbian
  • Firmware version: 26390500
  • deCONZ version: 2.05.85
  • Device: Raspbee I

deCONZ Logs

20:48:53:923 ZCL configure reporting rsp seq: 51 0x000B57FFFE8CC7F2 for ep: 0x01 cluster: 0x0008 attr: 0x0000 status: 0x00
20:48:54:078 Bind response success for 0xd0cf5efffe11cfcd ep: 0x01 cluster: 0x0006
20:48:54:079 configure reporting rq seq 52 for 0xD0CF5EFFFE11CFCD, attribute 0x0006/0x0000
20:48:54:126 Incr. ZDP retry count 2 on item 7
20:48:54:209 ZCL configure reporting rsp seq: 52 0xD0CF5EFFFE11CFCD for ep: 0x01 cluster: 0x0006 attr: 0x0000 status: 0x00
20:48:54:379 rule event /config/localtime: 20:48:53.377 -> 20:48:54.377 (1000ms)
20:48:54:807 poll node 00:0d:6f:00:15:61:31:2d-01-0201
20:48:54:807 Poll ZHAThermostat sensor node Super TR
Segmentation fault

Full log:
deconz_log.txt

@SwoopX
Collaborator

SwoopX commented Oct 16, 2020

@arnerek You saved this one, otherwise I would have closed it. So far, I've spotted two errors (usage of the wrong function), but those shouldn't lead to a segfault. For another candidate related to polling, I'm not too sure. Could you make a core dump as described in the wiki? That would presumably help speed up the search.

@arnerek

arnerek commented Oct 17, 2020

Here is a core dump:
core-deCONZ-sig11-user1000-group1000-pid2192-time1602915330.gz

@oywino
Author

oywino commented Oct 17, 2020

I discovered that by completely deleting all content in the config folder (/root/.local/share/dresden-elektronik/deCONZ) instead of carrying it over from my previous version, and then reinstalling deCONZ, I no longer got the segfault.
But of course, I lost all my devices instead, and the prospect of re-pairing isn't fun because it also breaks all my automations and scripts in Home Assistant, since all entity names are renewed (changed).

@arnerek

arnerek commented Oct 17, 2020

@oywino Does the Segmentation Fault reappear when adding the Super TR?

@oywino
Author

oywino commented Oct 17, 2020

The funny thing is that the Super TR appeared in deCONZ without me doing anything, but with a new (0xYYYY) name-code. I haven't tried to add it to Phoscon. Is that what you mean?

@arnerek

arnerek commented Oct 17, 2020

Is the Super TR available in the REST API?

After deleting my Super TR nodes, deCONZ .85 runs fine. Immediately after adding one through Phoscon I get a segfault, and again on every subsequent startup attempt.

@oywino
Author

oywino commented Oct 17, 2020

Yes, you are right. At first it was not available in the API. After adding it through Phoscon, I too immediately got a segfault.

@SwoopX
Collaborator

SwoopX commented Oct 17, 2020

@arnerek I asked @manup to have a look, but apparently he also couldn't spot anything causing the segfault. You may want to give compiling my latest PR a try, but I have little hope that it solves it.

@arnerek

arnerek commented Oct 17, 2020

@SwoopX Thanks! I compiled your code. I had to add VENDOR_EMBER in two of the function calls. Unfortunately, the segmentation fault is still there.

By mistake I copied the .84 libde_rest_plugin.so. It runs flawlessly with deCONZ .85 (this may be obvious to you, but I thought I should mention it).

@SwoopX
Collaborator

SwoopX commented Oct 17, 2020

I had to add VENDOR_EMBER in two of the function calls. Unfortunately, the segmentation fault is still there.

Crap. Where did you need to add that and why?

@arnerek

arnerek commented Oct 17, 2020

In the two calls: addTaskThermostatReadWriteAttribute

The function call did not match the definition, so compilation failed. I assumed a manufacturer code was missing. Do you agree?

@SwoopX
Collaborator

SwoopX commented Oct 17, 2020

Ah, as I remembered it, the mfc was optional and defaulted to 0...

It should have been 0 or 0x0000 btw, since you mentioned that the attributes are not manufacturer specific. Anyway, that shouldn't have made any difference...
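
For readers unfamiliar with the point about an optional manufacturer code, here is a minimal C++ sketch of a default argument. The names AttributeRequest and makeAttributeRequest are hypothetical and are not the plugin's API; the actual signature of addTaskThermostatReadWriteAttribute is not shown in this thread. The sketch only illustrates that a parameter declared with a default of 0 may be omitted by callers, while a parameter without a default must always be passed, which would explain a compile failure like the one described above.

```cpp
#include <cstdint>

// Hypothetical request type, for illustration only.
struct AttributeRequest {
    std::uint16_t attributeId;
    std::uint16_t manufacturerCode;
};

// The manufacturer code defaults to 0, i.e. "not manufacturer specific".
// If the "= 0" were removed, every caller would have to pass a code explicitly.
AttributeRequest makeAttributeRequest(std::uint16_t attributeId,
                                      std::uint16_t manufacturerCode = 0)
{
    return AttributeRequest{attributeId, manufacturerCode};
}
```

Calling makeAttributeRequest(0x0403) yields a manufacturer code of 0, while makeAttributeRequest(0x0403, 0x1234) carries the (arbitrary, illustrative) code 0x1234.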

@Smanar
Collaborator

Smanar commented Oct 18, 2020

I have never used one, but is there a tool to read the core dump and get more information? Or is it platform dependent too?
Because at the moment I really don't see the problem, and I don't know how to use the core dump.

@arnerek

arnerek commented Oct 18, 2020

Since the rest_plugin from .84 works, maybe I could compile different commits between .84 and .85 and see at which one the segfault appears? Would this help with debugging? I could also provide a new core dump to see if it has changed (I'm not sure whether the previous dump indicated which call caused the segfault).

I got this from gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x72ce2438 in ResourceItem::toString() const () from /usr/share/deCONZ/plugins/libde_rest_plugin.so

@Smanar
Collaborator

Smanar commented Oct 18, 2020

Ha, yep, this error message can be useful, yes. We know the command now.

@SwoopX


                    item = sensor->item(RConfigTemperatureMeasurement);
                    if (item && item->toString() != mode_set)

Is there a default value?

@arnerek

arnerek commented Oct 18, 2020

Sorry, I am new to gdb, but this backtrace should be consistent with @SwoopX's fixes branch.

#0 0x72ce2af8 in Resource::item(char const*) const () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#1 0x72cd354c in LightNode::manufacturer() const () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#2 0x72cd8434 in PollManager::pollTimerFired() () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#3 0x760cabfc in QMetaObject::activate(QObject*, int, int, void**) () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
#4 0x760d7d04 in QTimer::timerEvent(QTimerEvent*) () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
#5 0x760cb9c4 in QObject::event(QEvent*) () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
#6 0x76930408 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/arm-linux-gnueabihf/libQt5Widgets.so.5
#7 0x7693860c in QApplication::notify(QObject*, QEvent*) () from /usr/lib/arm-linux-gnueabihf/libQt5Widgets.so.5
#8 0x02047c80 in ?? ()

@SwoopX
Collaborator

SwoopX commented Oct 18, 2020

Cool, that should help. I'll check it out.

@SwoopX
Collaborator

SwoopX commented Oct 18, 2020

@arnerek Could you please try replacing the current code in thermostat.cpp with this one here?

case 0x0403: // Temperature measurement
{
    if (zclFrame.manufacturerCode() == VENDOR_EMBER && sensor->modelId().startsWith(QLatin1String("Super TR"))) // ELKO
    {
        quint8 mode = attr.numericValue().u8;
        QString mode_set;

        if (mode == 0x00) { mode_set = QString("air sensor"); }
        else if (mode == 0x01) { mode_set = QString("floor sensor"); }
        else if (mode == 0x03) { mode_set = QString("floor protection"); }
        else { mode_set = QString("unknown"); }

        item = sensor->item(RConfigTemperatureMeasurement);

        if (item && item->toString() != mode_set)
        {
            item->setValue(mode_set);
            enqueueEvent(Event(RSensors, RConfigTemperatureMeasurement, sensor->id(), item));
            configUpdated = true;
        }
    }
    sensor->setZclValue(updateType, ind.srcEndpoint(), THERMOSTAT_CLUSTER_ID, attrId, attr.numericValue());
}
    break;

@arnerek

arnerek commented Oct 18, 2020

Sorry! Still same Segmentation fault!

I made a dirty trick:

  • In void PollManager::poll(RestNodeBase *restNode, const QDateTime &tStart) I disabled the ZHAThermostat sensors with:
    else if (r->prefix() == RSensors)
    {
        sensor = dynamic_cast<Sensor*>(restNode);
        DBG_Assert(sensor);
        if (!sensor || sensor->deletedState() != Sensor::StateNormal)
        {
            return;
        }
        // Workaround: skip polling for thermostat sensors entirely
        if (qPrintable(sensor->type()) == QLatin1String("ZHAThermostat"))
        {
            return;
        }

I no longer get the segmentation fault, so at least that is some progress.

I expect this has something to do with the LightNode manufacturer name. I notice that the manufacturer name from the REST API is Heiman.

@SwoopX
Collaborator

SwoopX commented Oct 18, 2020

Thanks. The Heiman thing is interesting, as it shouldn't be. Is a light resource created for it? That should also not happen...

@arnerek

arnerek commented Oct 18, 2020

No light resource created. Maybe I should remove it and add it again in the new version?

{
  "config": {
    "heatsetpoint": 2500,
    "locked": null,
    "offset": 0,
    "on": true,
    "reachable": true,
    "schedule": {},
    "schedule_on": null,
    "temperaturemeasurement": null
  },
  "ep": 1,
  "etag": "918b64f252e1a2de0bb4d4bdba14cc9c",
  "lastseen": "2020-10-18T16:49Z",
  "manufacturername": "Heiman",
  "modelid": "Super TR",
  "name": "Super TR (2)",
  "state": {
    "floortemperature": 0,
    "heating": false,
    "lastupdated": "2020-10-18T16:49:47.500",
    "on": false,
    "temperature": 2000
  },
  "type": "ZHAThermostat",
  "uniqueid": "00:0d:6f:00:15:61:31:2d-01-0201"
}

@arnerek

arnerek commented Oct 18, 2020

Let me double-check that Heiman thing. I might have changed some vendor ID while debugging.

@arnerek

arnerek commented Oct 18, 2020

Just checked some more. The "Heiman" manufacturer name is still present with the SwoopX:fixes branch + #3427 (comment) + disabling polling for ZHAThermostats.

Should I try to remove the node and add it again in 2.05.85?

@SwoopX
Collaborator

SwoopX commented Oct 18, 2020

I just added a guard in the PR that prevents the manufacturer from being changed to Heiman, so that shouldn't happen again. I still don't see the root cause of the segfault.

@oywino
Author

oywino commented Oct 18, 2020

I follow this thread with great interest. Regretfully I'm not able to contribute much, but I depend on you guys to solve this.
For what it's worth, "Heiman" is the name of the German manufacturer of the ELKO Thermostat.
ELKO doesn't make any of this themselves. They just buy OEM from Heiman in Germany. But ELKO have defined part of the feature spec and they contribute somewhat to the firmware in the device. So the name Heiman should be there, and has always been there.

@arnerek

arnerek commented Oct 18, 2020

OK, but in deCONZ the Manufacturer Name attribute reads ELKO. I was just asking whether this indicates a logical error in the API.

I will continue to add more debug statements in order to pinpoint where the segmentation fault occurs.

manup added a commit to manup/deconz-rest-plugin that referenced this issue Oct 19, 2020
lightNode pointer wasn't guarded to check for nullptr in RStateOn handler.
Issue: dresden-elektronik#3427
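
The commit message above describes a missing nullptr check. As a minimal sketch of that general pattern, assuming hypothetical stand-in types (Node, Light, Thermostat and manufacturerOf are illustrative names, not the plugin's real classes or the actual fix), the guard looks like this:

```cpp
#include <string>

// Hypothetical stand-ins for the plugin's node classes; not the real deCONZ types.
struct Node {
    virtual ~Node() = default;
};

struct Light : Node {
    std::string manufacturer = "ExampleVendor";
};

struct Thermostat : Node {
};

// Returns the manufacturer if the node is a light, or an empty string otherwise.
std::string manufacturerOf(Node *node)
{
    // dynamic_cast yields nullptr when node is null or is not actually a Light.
    Light *light = dynamic_cast<Light *>(node);

    if (!light)
    {
        // The guard: bail out instead of dereferencing an invalid pointer.
        return std::string();
    }
    return light->manufacturer;
}
```

Without the check, passing a Thermostat (or a null pointer) would dereference a null Light pointer, which is undefined behavior and typically shows up as a segmentation fault like the one in the backtrace earlier in this thread.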
@SwoopX
Collaborator

SwoopX commented Oct 19, 2020

Hi. I've done some debugging with @manup and it looks like we've found a possible candidate for this one. It seems that you have a certain, not too common combination of values that's causing it.

@arnerek Would be cool if you could include the upcoming changes by manup locally and test it. All changes/fixes are now included in the official master branch.

@arnerek

arnerek commented Oct 19, 2020

Thanks for the update. I can now confirm with the .86 release that this solved the segmentation fault!

Great work :)

@oywino
Author

oywino commented Oct 19, 2020

Congratulations!! Any chance for us to see .86 on Docker Hub?

@SwoopX
Collaborator

SwoopX commented Oct 19, 2020

Great!

That's up to marthoc, but it usually doesn't take long.

@Arquiteto

Thanks. The Heiman thing is interesting, as it shouldn't be. Is a light resource created for it? That should also not happen...

Heiman as manufacturer is a thing, as it also happened to me with a Tuya/Moes TRV. It was correct in deCONZ, but REST API calls said Heiman. After @Smanar made changes to his Tuya branch it's correct in the API as well. It's issue #3440.
No light resource was created for me on 2.05.85, but on earlier versions that didn't support that valve, one was.

@oywino
Author

oywino commented Oct 20, 2020

I can confirm now that .86 runs in Docker without any segmentation fault.
I'm not sure exactly what to expect from .86, but even though the thermostat is "successfully" paired with Phoscon, it still doesn't show up in Phoscon. The REST API shows ELKO (and the name "Heiman" is gone).
There are no visible differences in the deCONZ attribute list (as far as I can see).
So, what now?

@arnerek

arnerek commented Oct 20, 2020

I have the new attributes in the list, but they are not updating. It is also interesting that one Super TR has manufacturer ELKO while the other Super TR has Heiman:

{
  "config": {
    "heatsetpoint": 500,
    "locked": true,
    "offset": 0,
    "on": true,
    "reachable": true,
    "schedule": {},
    "schedule_on": null,
    "temperaturemeasurement": null
  },
  "ep": 1,
  "etag": "356ccc27c349d7ed14547df0097cbac4",
  "lastseen": "2020-10-20T18:39Z",
  "manufacturername": "Heiman",
  "modelid": "Super TR",
  "name": "Super TR (2)",
  "state": {
    "floortemperature": 0,
    "heating": false,
    "lastupdated": "2020-10-20T18:39:19.666",
    "on": false,
    "temperature": 1900
  },
  "type": "ZHAThermostat",
  "uniqueid": "00:0d:6f:00:15:61:31:2d-01-0201"
}

{
  "config": {
    "heatsetpoint": 500,
    "locked": null,
    "offset": 0,
    "on": true,
    "reachable": true,
    "schedule": {},
    "schedule_on": null,
    "temperaturemeasurement": null
  },
  "ep": 1,
  "etag": "36b219c30bae5783bd90dbdac72690e2",
  "lastseen": "2020-10-20T20:36Z",
  "manufacturername": "ELKO",
  "modelid": "Super TR",
  "name": "Super TR",
  "state": {
    "floortemperature": 0,
    "heating": false,
    "lastupdated": "2020-10-20T17:01:01.534",
    "on": false,
    "temperature": 2020
  },
  "type": "ZHAThermostat",
  "uniqueid": "00:0d:6f:00:15:55:29:4f-01-0201"
}

@Smanar
Collaborator

Smanar commented Oct 20, 2020

Lol, I'm really bored by this Heiman ^^.

If you want, SwoopX, I can remove the line in the function (the one we talked about last time)?

@SwoopX
Collaborator

SwoopX commented Oct 20, 2020

First of all, it is no surprise that the device is not visible in Phoscon, as it is just another REST API client. Also note that this is the REST API repository, so please raise that in the Phoscon beta repository.

Secondly, you should reset and re-pair the devices. This deletes the current sensors (or marks them as deleted) and creates new ones with all current capabilities. Missing value updates could have two reasons: 1. Bindings have been missed, which should also be resolved by a reset/re-pair. 2. The device does not support attribute reporting.

Lastly, please note that we're driving on the highway at night without any lights here. No dev has the device at hand to sniff and see what's going over the air, so everything is best effort.

Anyway, please raise a new issue if anything's not working and we can see what to do. As this is no longer on topic, I'm closing this issue.

@Smanar, Heiman is no longer a topic for this device, as I prevented it from happening. No issue from my end.

@SwoopX SwoopX closed this as completed Oct 20, 2020