- Author(s): @wavesoft, @dmamalis
- Start Date: 2021-07-28
- Category: Technical
- Original HIP PR: helium#249
- Tracking Issue: helium#250
Large-scale IoT network operators usually collect analytics from their gateways in order to monitor the status of their network and diagnose faults the moment they happen.
This is achieved by collecting RF meta-data from every packet received or transmitted and feeding it into an analysis stack for further processing.
Since such a task typically involves tapping into the packet stream, a Helium Hotspot owner might be reluctant, for security reasons, to allow a third-party tool to tamper with the stream.
Therefore, we are proposing a mechanism that would enable the extraction of packet meta-data without disclosing critical information that could be used for malicious purposes.
Helium wants to deliver a fair, robust and secure solution to all of its stakeholders. To achieve this, it requires full control over the components involved, which minimizes the chances of someone abusing the network.
This means that on an official Helium gateway it is normally impossible to collect RF meta-data, since doing so requires tampering with the most critical component: the packet forwarder. Therefore, we are proposing this HIP as the means of enabling RF meta-data collection without having to tamper with the internal components.
- Professional network operators with big Helium network deployments
- Hotspot owners that want to diagnose reception issues
- Local Helium communities that want to improve their coverage and overall network quality
- Hotspot manufacturers that want to provide diagnostic tools to their customers
Every time a packet is received, the packet forwarder includes useful information regarding its quality. The complete list of fields returned for every packet sent and received is available on Semtech's packet_forwarder website; a short summary follows.
Every time a packet is received (or scheduled for transmission), the following information is transferred:
- RF Quality (RSSI/SNR)
- RF Channel and Frequency
- RF Modulation and Encoding information
- GPS Timestamp (When the gateway is equipped with a GPS receiver)
- The packet data payload
In addition, the gateway periodically pushes summary statistics that include:
- Number of packets sent/received
- Number of packets rejected for transmission or received with a bad CRC
- Percentage of upstream datagrams that were acknowledged
All of this information is needed by the LoRaWAN core to function correctly, but it can also prove helpful when you are trying to diagnose an issue in your network. For example, if you measure the average RSSI of the packets received over time and see a degradation, you might have an issue with your antenna.
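The RSSI example above can be sketched as a comparison between two consecutive windows of samples. This is only an illustration: the function name `rssi_degradation`, the window size and the 6 dB threshold are assumptions chosen for the example, not values defined by this proposal.

```python
from statistics import mean


def rssi_degradation(samples, window=50, threshold_db=6.0):
    """Return True when the mean RSSI of the latest window dropped by more
    than threshold_db compared with the window just before it.

    samples is a chronological list of RSSI readings in dBm.
    window and threshold_db are illustrative defaults, not part of the HIP.
    """
    if len(samples) < 2 * window:
        # Not enough history to compare two full windows.
        return False
    previous = mean(samples[-2 * window:-window])
    recent = mean(samples[-window:])
    return (previous - recent) > threshold_db
```

An analytics stack would typically run a check like this per hotspot (and possibly per channel) so that a single noisy device does not hide an antenna problem.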
As you can see, most of the information above is just meta-data, and announcing it to a third-party component has negligible security side-effects. However, the contents of the `data` payload might contain sensitive information that must not be shared.
For example, consider a case where `data` holds a PoC payload: if this message is shared with a third party, it could be used maliciously to simulate more witnesses than actually exist.
Therefore, we should consider the `data` payload unsafe and replace it with another representation that is safe but still holds the valuable meta-data needed. For example:
- Payload length in bytes
- Payload checksum (eg. ADLER32)
- The LoRaWAN MAC header bytes
And the justification is the following:
- The Payload Length is enough for identifying cases where wrong spreading factors are used.
- The Payload Checksum is used to de-duplicate the same packet when it is received by multiple gateways within a short period of time. Note that it does not need to be cryptographically secure (eg. SHA sums); simpler checksums with a smaller impact on processing time can be used. ADLER32 is suggested as a good trade-off between speed, memory usage and randomness of the result.
- The LoRaWAN MAC header holds useful information to diagnose LoRaWAN issues and must be included intact into the meta-data. This header is found in the first 8 bytes of the payload and it does not hold any application-level information (eg. the contents of the PoC message).
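The three replacement fields listed above could be derived from a raw payload roughly as follows. This is a minimal sketch: the function name `sanitize_payload` is hypothetical, and hex-encoding the header bytes is an assumption made so the result can be serialized as JSON. `zlib.adler32` is the ADLER32 implementation from the Python standard library.

```python
import zlib

# Number of leading payload bytes kept, as stated in this proposal.
MAC_HEADER_LEN = 8


def sanitize_payload(payload: bytes) -> dict:
    """Replace a raw RF payload with its safe meta-data representation."""
    return {
        # Payload length in bytes.
        "size": len(payload),
        # ADLER32 over the entire payload, as an unsigned 32-bit value.
        "csum": zlib.adler32(payload) & 0xFFFFFFFF,
        # Only the leading header bytes are kept (hex encoding is an assumption).
        "data": payload[:MAC_HEADER_LEN].hex(),
    }
```

The rest of the payload never leaves this function, so the application-level contents are not available to whatever consumes the result.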
We are proposing the introduction of an Analytics Side-Channel that can be used by the stakeholders to consume analytics of the incoming messages in a secure and reliable manner.
    +------------------+  Semtech UDP  +----------------+
    | Packet Forwarder | ------------> | Hotspot Client | ----> Helium Network
    +------------------+               +----------------+
                                               |  Analytics Side-Channel
                                               v
                                     . . . . . . . . . . .
                                     . Analytics Client  .
                                     . . . . . . . . . . .
The proposed solution should:
- Induce minimum overhead to the client
- Be easy to integrate into the client codebase
- Allow existing solutions to be easily adapted to the new interface
Considering that:
- The Analytics Side-Channel should be implemented using a connection-less protocol such as UDP, since we don't care about reliability and back-pressure from the consumer must not affect the producer.
- The amount of processing power must be kept to a minimum; therefore an ADLER32 checksum is recommended for the payload instead of computationally intensive cryptographic hashes.
- The serialization overhead of the message should be kept to a minimum, in which case Google Protobuf can be used. However, since interoperability with existing solutions might be a concern, JSON is a valid trade-off.
- Since this is an opt-in feature, it should be enabled via an external flag (eg. using an environment variable).
- To further reduce the processing demand, the analytics data are NOT processed inside the Hotspot client. Instead they are relayed to a third-party Analytics Client. This could be a simple proxy, or a more elaborate statistics aggregator. The implementation details are not important as part of this HIP.
- Since the analytics meta-data is stripped of any risky information, an analytics datagram can be sent outside of the gateway even without encryption. This allows the Analytics Client to be either a local OR a remote process.
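The considerations above (connection-less transport, opt-in via an environment variable, no back-pressure on the producer) can be sketched as a small fire-and-forget emitter. The class and helper names are hypothetical, JSON is used only because it needs no extra dependencies, and the variable name `HELIUM_ANALYTICS_CLIENT` is the one that appears in the use-case section of this document.

```python
import json
import os
import socket


def parse_target(value):
    """Split a "host:port" string into a (host, port) tuple."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


class AnalyticsEmitter:
    """Fire-and-forget UDP emitter that stays disabled unless configured."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        target = env.get("HELIUM_ANALYTICS_CLIENT")
        self.addr = parse_target(target) if target else None
        self.sock = None
        if self.addr is not None:
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            # Non-blocking: a slow or absent consumer must not stall the producer.
            self.sock.setblocking(False)

    def emit(self, message):
        """Send one message as one datagram; return False if it was not sent."""
        if self.sock is None:
            return False  # Feature not enabled.
        payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
        try:
            self.sock.sendto(payload, self.addr)
            return True
        except OSError:
            # Delivery is best effort, so errors are swallowed.
            return False
```

A production implementation would probably resolve the host name once at start-up instead of on every send; that detail is left out to keep the sketch short.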
The analytics side-channel emits 3 different kinds of messages. Each message is sent as a UDP datagram, encoded with the agreed serialization format (JSON or Protobuf), and always has exactly one recipient.
The fields present in these messages are very similar to the fields defined in the Semtech UDP Forwarder PROTOCOL.TXT, but they are adapted for faster consumption and for the security concerns explained above.
Note that the compact field names can be used when encoding the analytics data as JSON, in order to keep the overall message size to a minimum so that it fits in a single datagram.
An uplink message is sent every time a packet is received from the packet forwarder. It contains the following fields:
# | Compact Name | Verbose Name | Type | Description |
---|---|---|---|---|
1 | `tmms` | `timeGps` | `int64` | The UNIX timestamp (in milliseconds) when the message arrived in the concentrator. |
2 | `gpsu` | `timeGpsUs` | `uint16` | The microseconds fraction of the UNIX timestamp above, as a number between 0 and 999. |
3 | `tmst` | `timeFinished` | `int64` | The UNIX timestamp (in milliseconds) of the local system when the message was received. |
4 | `freq` | `frequency` | `float` | RX central frequency in MHz (Hz precision). |
5 | `chan` | `ifChannel` | `uint8` | Concentrator "IF" channel used for RX. |
6 | `rfch` | `rfChain` | `uint8` | Concentrator "RF chain" used for RX. |
7 | `stat` | `crcStatus` | `enum` | CRC status: "OK", "Fail" or "NoCRC". |
8 | `modu` | `modulation` | `enum` | Modulation identifier ("LORA" or "FSK"). |
9 | `datr` | `fskDataRate` | `uint32` | FSK datarate in bits per second. Used only when modulation is "FSK". |
10 | `drls` | `loraSf` | `enum` | Spreading factor component of LoRa DataRate (eg. "SF12"). Used only when modulation is "LORA". |
11 | `drlb` | `loraBandwidth` | `enum` | Bandwidth component of LoRa DataRate (eg. "BW500"). Used only when modulation is "LORA". |
12 | `codr` | `loraCodingRate` | `enum` | LoRa ECC coding rate identifier. |
13 | `rssi` | `rssi` | `float` | RSSI in dBm. |
14 | `lsnr` | `lsnr` | `float` | LoRa SNR ratio in dB (signed float, 0.1 dB precision). |
15 | `size` | `size` | `int8` | RF packet payload size in bytes. |
16 | `data` | `data` | `int8` | The first 8 bytes of the data payload (holding the LoRaWAN MAC header). |
17 | `csum` | `dataChecksum` | `uint32` | The ADLER32 checksum of the entire RF packet payload. |
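To illustrate why the compact names matter for JSON, the sketch below maps a subset of the uplink fields from the table above to their verbose names and compares the encoded sizes. The helper names `to_verbose` and `encode` are hypothetical, and the sample values are made up for the example.

```python
import json

# Compact name -> verbose name, for a subset of the uplink fields above.
UPLINK_NAMES = {
    "tmms": "timeGps",
    "gpsu": "timeGpsUs",
    "tmst": "timeFinished",
    "freq": "frequency",
    "chan": "ifChannel",
    "rfch": "rfChain",
    "stat": "crcStatus",
    "modu": "modulation",
    "datr": "fskDataRate",
    "drls": "loraSf",
    "drlb": "loraBandwidth",
    "codr": "loraCodingRate",
    "size": "size",
    "data": "data",
    "csum": "dataChecksum",
}


def to_verbose(message):
    """Return a copy of the message keyed by verbose names."""
    return {UPLINK_NAMES.get(key, key): value for key, value in message.items()}


def encode(message):
    """Encode a message as JSON without insignificant whitespace."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


compact = {"tmms": 1627459200000, "freq": 868.1, "stat": "OK",
           "modu": "LORA", "drls": "SF12", "drlb": "BW500", "size": 23}
saved = len(encode(to_verbose(compact))) - len(encode(compact))
print(f"compact names save {saved} bytes on this message")
```

The saving grows with the number of fields present, which is what keeps a full uplink message comfortably inside one datagram.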
A downlink message is sent every time the system has just pushed a downlink message to the packet forwarder. It contains the following fields:
# | Compact Name | Verbose Name | Type | Description |
---|---|---|---|---|
1 | `tmms` | `timeGps` | `int64` | The UNIX timestamp (in milliseconds) when the message should be sent (when set to '0' means "immediately"). |
2 | `tmst` | `timeWall` | `int64` | The UNIX timestamp (in milliseconds) of the local system when the message should be sent (when set to '0' means "immediately"). |
3 | `freq` | `frequency` | `float` | TX central frequency in MHz (Hz precision). |
4 | `rfch` | `rfChain` | `uint8` | Concentrator "RF chain" used for TX. |
5 | `powe` | `txPower` | `float` | TX output power in dBm. |
6 | `ncrc` | `noCRC` | `bool` | If true, disable the CRC of the physical layer. |
7 | `modu` | `modulation` | `enum` | Modulation identifier ("LORA" or "FSK"). |
8 | `datr` | `fskDataRate` | `uint32` | FSK datarate in bits per second. Used only when modulation is "FSK". |
9 | `fdev` | `fskFreqDev` | `uint16` | FSK frequency deviation in Hz. |
10 | `drls` | `loraSf` | `enum` | Spreading factor component of LoRa DataRate (eg. "SF12"). Used only when modulation is "LORA". |
11 | `drlb` | `loraBandwidth` | `enum` | Bandwidth component of LoRa DataRate (eg. "BW500"). Used only when modulation is "LORA". |
12 | `codr` | `loraCodingRate` | `enum` | LoRa ECC coding rate identifier. |
13 | `ipol` | `inversePolarity` | `bool` | LoRa modulation polarization inversion. |
14 | `prea` | `preamble` | `number` | RF preamble size. |
15 | `size` | `size` | `int8` | RF packet payload size in bytes. |
16 | `data` | `data` | `int8` | The first 8 bytes of the data payload (holding the LoRaWAN MAC header). |
17 | `csum` | `dataChecksum` | `uint32` | The ADLER32 checksum of the entire RF packet payload. |
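In the Semtech UDP protocol the `datr` field is a string such as "SF12BW500" for LoRa and a number for FSK, whereas the tables above split it into separate fields. A conversion could look like the sketch below; the function name `split_datarate` and the exact validation pattern are assumptions made for the example.

```python
import re

# Semtech encodes the LoRa datarate as one string, for example "SF12BW500".
_LORA_DATR = re.compile(r"^(SF\d{1,2})(BW\d{1,3})$")


def split_datarate(modu, datr):
    """Convert Semtech modu/datr into the split fields used by this HIP."""
    if modu == "LORA":
        match = _LORA_DATR.match(str(datr))
        if match is None:
            raise ValueError(f"unexpected LoRa datarate: {datr!r}")
        return {"modu": "LORA", "drls": match.group(1), "drlb": match.group(2)}
    if modu == "FSK":
        # For FSK, Semtech sends the datarate as a number in bits per second.
        return {"modu": "FSK", "datr": int(datr)}
    raise ValueError(f"unknown modulation: {modu!r}")
```

For instance, `split_datarate("LORA", "SF12BW500")` yields the `drls` and `drlb` values "SF12" and "BW500", and the FSK branch leaves only `datr` populated.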
A statistics message is sent every time the respective `stat` message is received from the packet forwarder. This message is blindly forwarded without further processing and it has the following fields.
(Note that the statistics counters reset to zero every time a `stat` message is sent.)
# | Compact Name | Verbose Name | Type | Description |
---|---|---|---|---|
1 | `addr` | `hotspotAddress` | `byte[]` | The address of the hotspot. |
2 | `time` | `timeWall` | `int64` | The UNIX timestamp (in milliseconds) of the local system. |
3 | `lati` | `gpsLatitude` | `float` | GPS latitude of the gateway in degrees (float, N is +). |
4 | `long` | `gpsLongitude` | `float` | GPS longitude of the gateway in degrees (float, E is +). |
5 | `alti` | `gpsAltitude` | `float` | GPS altitude of the gateway in meters. |
6 | `rxnb` | `packetsRx` | `uint32` | Number of radio packets received. |
7 | `rxok` | `packetsRxOk` | `uint32` | Number of radio packets received with a valid PHY CRC. |
8 | `rxfw` | `packetsRxFw` | `uint32` | Number of radio packets forwarded. |
9 | `ackr` | `ackRatio` | `float` | Percentage of upstream datagrams that were acknowledged. |
10 | `dwnb` | `packetsTxReq` | `float` | Number of downlink datagrams received. |
11 | `txnb` | `packetsTx` | `float` | Number of packets emitted. |
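Because the counters reset after every message, each statistics message describes one reporting interval on its own. As a small example of what a consumer could derive from it, the sketch below computes the share of received packets that failed the PHY CRC; the function name `crc_failure_ratio` is hypothetical.

```python
def crc_failure_ratio(stat):
    """Fraction of received radio packets without a valid PHY CRC.

    stat is one decoded statistics message keyed by compact names.
    """
    received = stat.get("rxnb", 0)
    if received == 0:
        # Nothing was received in this interval.
        return 0.0
    return (received - stat.get("rxok", 0)) / received
```

A ratio that climbs over successive intervals is the kind of signal an operator could alert on.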
Alice has a DIY hotspot built using a packet forwarder and a light client. She is having reception issues with her devices and wants to debug them.
She has built her own log processing stack that runs in the cloud and she wants to feed the packet analytics into it.
- She then adjusts the environment variables for `gateway-rs` and sets `HELIUM_ANALYTICS_CLIENT="my.cloud.service:12345"`.
- Once the client restarts, she starts seeing data.
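On the receiving side, a service like Alice's could be as small as a UDP listener that decodes one JSON message per datagram. The sketch below assumes the JSON encoding is used; the function names are hypothetical and the default port simply mirrors the example address above.

```python
import json
import socket


def open_listener(host="0.0.0.0", port=12345):
    """Bind a UDP socket on which analytics datagrams are received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=2048):
    """Block until one datagram arrives; return the decoded message.

    Returns None when the datagram is not a JSON object, so that stray
    traffic on the port does not crash the consumer.
    """
    datagram, _sender = sock.recvfrom(bufsize)
    try:
        message = json.loads(datagram.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return message if isinstance(message, dict) else None
```

A long-running consumer would call `receive_one` in a loop and hand each message to its storage or aggregation layer.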
Bob, a professional LoRaWAN network operator, has deployed 1,000 Helium Hotspots in an area and he wants to make sure his services are reliable. He is using production miners bought from one of the official suppliers.
He already uses a centralized analytics aggregation system in his deployment that consumes data in the Semtech UDP format.
We are assuming that once this HIP has landed, the gateway manufacturer will enable a new option in their UI called "Helium Analytics Client".
- Bob simply goes to the UI and configures the Helium Analytics Client to point to the cloud infrastructure that he is already using.
- He will already receive 80% of the interesting data from day 0 and he will only have to do minor adjustments to the protocol in order to accommodate the new fields.
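With many hotspots reporting to one aggregator, as in Bob's deployment, the same packet will often arrive from several gateways. The payload checksum and size can be used to collapse those copies, roughly as sketched below. The class name `Deduplicator` and the 2-second window are assumptions for illustration only.

```python
class Deduplicator:
    """Flag uplinks already seen from another gateway within a time window."""

    def __init__(self, window_ms=2000):
        self.window_ms = window_ms
        # (csum, size) -> time in ms at which the packet was first seen.
        self._seen = {}

    def is_duplicate(self, message, now_ms):
        # Forget entries that fell out of the window.
        self._seen = {
            key: first_seen
            for key, first_seen in self._seen.items()
            if now_ms - first_seen <= self.window_ms
        }
        key = (message["csum"], message["size"])
        if key in self._seen:
            return True
        self._seen[key] = now_ms
        return False
```

Since ADLER32 is not collision resistant, two different packets can occasionally share a key; pairing the checksum with the size and a short window keeps that risk low for diagnostic purposes.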
- We are effectively duplicating the stream of incoming data, but we cannot forward it without processing, because doing so risks exposing critical information.
The most obvious and straightforward way of solving this issue is by introducing a man-in-the-middle UDP forwarder between the packet forwarder and the Hotspot Client (Light or Full) as seen below:
    +------------------+  Semtech UDP   +-----------------+  Semtech UDP   +----------------+
    | Packet Forwarder | -------------> | Analytics Proxy | -------------> | Hotspot Client | --> Helium Network
    +------------------+                +-----------------+                +----------------+
                                                 |
                                                 v
                                        RF Meta-Data Stream
This solution is trivial to integrate and requires no further modification to the Helium core components. However, it requires that the Analytics Proxy is a trusted component that does not disclose sensitive information to third parties.
- We need to decide whether we go with JSON (and therefore create a backwards-compatible interface, similar to the Semtech UDP packet itself), or with Protobuf, and therefore break any existing solution.
- Some discussion might be needed to further clean up the fields in the analytics protocol. More specifically, whether there is a smarter way to encode the fields that differ between LORA and FSK modulation.
We are not expecting any considerable impact on the deployment once this solution is applied. Both the processing power and the overall size footprint should be left relatively intact.
Plus, this is an opt-in feature, so it won't affect the user experience by default.
Any stakeholder reporting a successful usage of this system to diagnose a problem they are having.