Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node descriptor reported as being null, macCapabilities and receiverOnWhenIdle() #2154

Closed
SwoopX opened this issue Nov 30, 2019 · 6 comments
Closed
Labels

Comments

@SwoopX
Copy link
Collaborator

SwoopX commented Nov 30, 2019

Gents,

I'd like to discuss a couple of technical points I stumbled upon during my journey to properly integrate Develco end devices (especially the smoke sensor). It is in particular regarding:

  • usage of the device's mac capabilities (variable macCapabilities in bindings.cpp)
  • in conjunction with the aforementioned point the impact of using receiverOnWhenIdle()
  • node descriptor being reported as null in loops

macCapabilities

Let me start with this point as it may aid in further device integration but is also part of the overall story. To keep that rather short, I managed to identify the root cause of the Develco smoke sensor pairing issues (sensors were not created and the simple descriptors not read) in here:

// filter supported devices
// Busch-Jaeger
if (checkMacVendor(ext, VENDOR_BUSCH_JAEGER))
{
}
else if (checkMacVendor(ext, VENDOR_UBISYS))
{
}
else if (checkMacVendor(ext, VENDOR_BOSCH))
{ // macCapabilities == 0
}
else if (macCapabilities & deCONZ::MacDeviceIsFFD)
{
return;
}
else if (macCapabilities == 0)
{
return;
}

Based on the analysis of the sniffed node descriptor response, the mac capabilities of the device are reported as 0x00, meaning the following code got never processed. By whitelisting, delayedFastEnddeviceProbe is now entered and follows the intended execution flow, sensors are created and simple descriptors are fetched automatically and show up in deconz GUI.
get node descriptors -> get active endpoints -> get simple descriptors -> read basic cluster attributes...

This is related to #669 and probably also to #1873 .

However, as to my understanding, mac capabilities of 0x00 should be technically correct for battery powered sensor end devices nowadays. Some sensors I checked report 0x80, which is also ok, but the point I want to make here is that the bit for AC power is 0 as well as for Rx on when idle. Especially the second value is in line with official zigbee documentation if I don't understand it wrongly.
Now, this brings me to the 2nd point...

receiverOnWhenIdle()

While the smoke sensors were finally able to pair correctly, binding and attribute reporting was not working. Based on the technical documentation, that is supported (and works with some hacks). After another whitelist to read the binding table, sending a bind request always failed due to the following code:

if (!s.node() || s.node()->nodeDescriptor().isNull())
{
return false; // needs to be known
}
if (s.node()->nodeDescriptor().receiverOnWhenIdle())
{
break; // ok
}

With the given mac capabilities of 0x00, rx on when idle is 0 and therefore the check fails. Again, based on my understanding, rx on when idle being 0 would be correct so is that check reasonable? This is certainly not the only place where this check is made and might have some further impact if generally changed.

However, the check above is the one really giving me a headache and the main reason binding and reporting does not work out, leading me to the 3rd point...

Node descriptor is null

For binding and reporting, those 2 checks must be passed:

if (!s.node() || s.node()->nodeDescriptor().isNull())
{
return false; // needs to be known
}

if (sensor->node()->nodeDescriptor().isNull())
{
return false;
}

I really do not understand how and why this can be. While sniffing the traffic, I do see a valid node descriptor request and response, so all information should be there. Based on that I could imagine that other parts of code also do not give expected results. @manup @ebaauw any thoughts on this? Could it be an issue in deconz core? Happy to send one device around for further analysis if that would somehow help in investigating.

The only way I currently see to dance around this is to implement a check for Develco mac address. Interestingly, only the smoke sensor seems to have that issue. Other Develco devices seem to be ok, based on provided screenshots of the node info pane from deconz GUI. Btw, it might also be the reason why the Bosch sensor from #669 doesn't send temperature values...

Highly appreciate all input on this!
Thanks!

@ebaauw
Copy link
Collaborator

ebaauw commented Dec 1, 2019

macCapabilities

I'm afraid these are new to me; I haven't come across these before.

They are reported in the Device Announcement message. They're defined in zdp_descriptors.h (from the deconz-dev package):

/*! IEEE 802.15.4 MAC capabilities. */
enum MacCapability
{
    MacAlternatePanCoordinator = (1 << 0), //!< Can be a alternate pan coordinator
    MacDeviceIsFFD             = (1 << 1), //!< Is a full function device
    MacIsMainsPowered          = (1 << 2), //!< Is mains powered
    MacReceiverOnWhenIdle      = (1 << 3), //!< Has receiver on when idle
    MacSecuritySupport         = (1 << 6), //!< Has (high) security support
    MacAllocateAddress         = (1 << 7)  //!< Allocates address
};

Did a quick check: all my routers (even the XBees) report 0x8e, all my end devices report 0x80, except for the (mains powered) lumi.ctrl_neutral, which reports 0x8c. Oddly the ConBee II reports 0x0f, whereas the RaspBee reports 0x8e.

EDIT The mac capabilities are also reported in the Node Descriptor. The lumi.ctrl_neutral reports 0x84 there, instead of 0x8c. This confirms my suspicion that it incorrectly reports Receiver On When Idle as false. /EDIT

However, as to my understanding, mac capabilities of 0x00 should be technically correct for battery powered sensor end devices nowadays.

I've no clue what "allocate address" means, but as the ConBee II doesn't report this bit, it seems reasonable to expect 0x00 is a valid value for end devices. I suspect the check is for (legacy?) devices that don't report mac capabilities correctly. As far as I can tell, the check applies only to the creation of /sensors resources.

receiverOnWhenIdle

I have come across this one several times. We created a whitelist to create /lights resources for "light sleepers", battery powered end devices where Receiver On When Idle is not set. Typically, these end devices poll their parent every 5 seconds or so, so effectively they can always be reached by the gateway. As parent routers only keep the message for "their" end devices for like 7 seconds, "deep sleepers" can only be reached when they poll their parent.

However, the check above is the one really giving me a headache and the main reason binding and reporting does not work out

There's logic all over the REST API plugin to check when to send a request to these deep sleepers: for bindings, attribute reporting, and polling the device state. Typically, this code assumes the end device will be polling its parent, after it sent an attribute report or other command, and during the extended polling period while pairing/after the device announcement.

Indeed, headaches. As I've remarked earlier, "there's always one more place to whitelist a device". I think the only structural solution is a refactor of the REST API plugin, for API v2.

Node descriptor is null

I don't think it's valid for a ZigBee device not to report a Node Descriptor. The check is not so much whether the device has a node descriptor, as whether the descriptor has already been read (I think, indeed, by the deCONZ core). Especially for deep sleepers with no or a very short extended poll period on boot/pairing, it can be challenging to read these in time.

Usually, forcing the device to remain awake during pairing does the trick, but unfortunately not always. The IKEA Symfonisk controller is extremely challenging, since it wakes only user actions after the bindings have been created, see #1898 (comment).

@SwoopX
Copy link
Collaborator Author

SwoopX commented Dec 1, 2019

@ebaauw thanks for following up. Appreciate your detailed input here, as always.

They are reported in the Device Announcement message. They're defined in zdp_descriptors.h (from the deconz-dev package)

Nice catch. Actually, it seems that there are 3 different usable sources of the capabilities:

  1. Node descriptor response
  2. Device announcement
  3. IEEE 802.15.4 (Association request, RFD)

The file you mentioned suggests mac capabilities are picked up from option 3. Interestingly, the traffic shows me that for the Develco smoke sensor, mac is 0x80 for option 3 and for the others 0x00. Pretty weired. Double checked with an older Heiman smoke sensor and there, it's consistent.

I've no clue what "allocate address" means, but as the ConBee II doesn't report this bit, it seems reasonable to expect 0x00 is a valid value for end devices.

Got also no real clue, what that means...

I suspect the check is for (legacy?) devices that don't report mac capabilities correctly. As far as I can tell, the check applies only to the creation of /sensors resources.

Agreed.

receiverOnWhenIdle
I have come across this one several times. We created a whitelist to create /lights resources for "light sleepers", battery powered end devices where Receiver On When Idle is not set. Typically, these end devices poll their parent every 5 seconds or so, so effectively they can always be reached by the gateway. As parent routers only keep the message for "their" end devices for like 7 seconds, "deep sleepers" can only be reached when they poll their parent.

Yeah. I know, based on documentation, that the device is a light sleeper. Basically, no big deal here, especially, since on the second look, the check doesn't do anything (except for exiting the loop). So all good here.

Indeed, headaches. As I've remarked earlier, "there's always one more place to whitelist a device". I think the only structural solution is a refactor of the REST API plugin, for API v2.

Would be a bless, indeed. But lots of work to do...

I don't think it's valid for a ZigBee device not to report a Node Descriptor. The check is not so much whether the device has a node descriptor, as whether the descriptor has already been read (I think, indeed, by the deCONZ core). Especially for deep sleepers with no or a very short extended poll period on boot/pairing, it can be challenging to read these in time.

You may be right here, but that still doesn't fit the picture. Pretty much the last action in function delayedFastEnddeviceProbe is checkSensorBindingsForAttributeReporting and pretty much at the very beginning, the node descriptor is read (and I do get a response). So I'd conclude it is available, but bindings code says it isn't -> headache.

However, I just coded a workaround for those 2 node descriptor checks on modelID, so my pleasure to introduce another (probably very rare) place for whitelisting.

@ebaauw
Copy link
Collaborator

ebaauw commented Dec 1, 2019

Interestingly, the traffic shows me that for the Develco smoke sensor, mac is 0x80 for option 3 and for the others 0x00. Pretty weired.

Not sure if you saw my EDIT, but I'm seeing something similar for the lumi.ctrl_neutral. The GUI seems to display the last value received, from whatever source.

Pretty much the last action in function delayedFastEnddeviceProbe is checkSensorBindingsForAttributeReporting and pretty much at the very beginning, the node descriptor is read (and I do get a response). So I'd conclude it is available, but bindings code says it isn't -> headache.

It's available somewhere, but I'm not too sure about the data structures here. Both LightNode and SensorNode inherit from Resource and RestNodeBase. RestNodeBase has a reference, node, to a deCONZ::Node, which is provided by the deCONZ core, and defined in node.h of the deconz-dev package. The deCONZ::Node has a reference, nodeDescriptor to a deCONZ::NodeDescriptor.

I would assume/hope, all LightNode and SensorNode instances for the same device would refer to the same deCONZ::Node instance and the same deCONZ::NodeDescriptor instance. However, I'm not sure who (core vs REST API plugin) maintains the references. The core does provide deCONZ::Node::setNodeDescriptor(), but it doesn't seem like the REST API plugin calls this method.

It could be that another device announcement or node descriptor response overrides the data from the association request. Or maybe there's an execution path where checkSensorBindingsForAttributeReporting() is called before the node reference has been set for this SensorNode instance.

so my pleasure to introduce another (probably very rare) place for whitelisting.

LOL

@SwoopX
Copy link
Collaborator Author

SwoopX commented Dec 3, 2019

Interestingly, the traffic shows me that for the Develco smoke sensor, mac is 0x80 for option 3 and for the others 0x00. Pretty weired.

Not sure if you saw my EDIT, but I'm seeing something similar for the lumi.ctrl_neutral. The GUI seems to display the last value received, from whatever source.

I did. However, I don't have any other option to check the mac value other than sniffing. In deconz GUI, the node info pane data is useless since it doen't show anything valid from the device. It looks like it is always the values from the last selected valid node. Probably part of the issue and for me, an indication it's more a deconz core topic.

It's available somewhere, but I'm not too sure about the data structures here. Both LightNode and SensorNode inherit from Resource and RestNodeBase. RestNodeBase has a reference, node, to a deCONZ::Node, which is provided by the deCONZ core, and defined in node.h of the deconz-dev package. The deCONZ::Node has a reference, nodeDescriptor to a deCONZ::NodeDescriptor.

I would assume/hope, all LightNode and SensorNode instances for the same device would refer to the same deCONZ::Node instance and the same deCONZ::NodeDescriptor instance. However, I'm not sure who (core vs REST API plugin) maintains the references. The core does provide deCONZ::Node::setNodeDescriptor(), but it doesn't seem like the REST API plugin calls this method.

Thanks for diggin into that. Might be an idea to place some strategic debug output throughout the code to see, if data was valid at least once or if it gets overwritten somehow.

Or maybe there's an execution path where checkSensorBindingsForAttributeReporting() is called before the node reference has been set for this SensorNode instance.

Also a good idea to check that. Couldn't go any more wrong...

@SwoopX
Copy link
Collaborator Author

SwoopX commented Dec 4, 2019

ok, placed debug output in the code to see if nodeDescriptor() was populated at least once and chose the manufacturer code as attribute. Has been empty all the time and occasionally, to showed the DE code. That doesn't make sense but explains, why any code based on node descriptor checks doesn't work as expected.

@manup could you imagine any condition leading to the observed effect? I'd say deconz core has a hickup there while processing the node descriptor. That also explains why deconz GUI is showing incorrect data in the node info pane.

@stale
Copy link

stale bot commented Apr 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants