Orderly data delivery - Queue messages for later sending #194

Open
neilh10 opened this issue Nov 9, 2018 · 23 comments

@neilh10
Contributor

neilh10 commented Nov 9, 2018

It would be nice if the sensor readings could have an orderly delivery algorithm to the cloud.

The current method, after writing the sensor readings to the SD card, is a single try to deliver the readings to the cloud, and there is no later retry.

I wonder if anybody has any thoughts on it?
Would it be valuable to have the uSD card as a staged database, so that on any (wireless) internet connection the algorithm attempts to deliver the next undelivered sensor reading, at a staged rate, until all readings are delivered?

I would add a Label: feature enhancement - but I don't seem to be able to do that.

@SRGDamia1 changed the title from "Orderly data delivery" to "Orderly data delivery - Queue messages for later sending" on Dec 19, 2018
@SRGDamia1
Contributor

This would be a nice feature to have. I've re-arranged the way data is sent out, so it might be easier to do this. I've been thinking of adding a "queueData(stream)" fxn to the dataSender class which the Logger can then use to write data to an SD card file.

@neilh10
Contributor Author

neilh10 commented Dec 21, 2018

Great.
I was reading up on ThingSpeak: for free or student access they only allow one message every 20 seconds.
https://thingspeak.com/prices/thingspeak_home
For a paid minimal "Home" or "Commercial" account the update rate is 1 second.

That is, delivering queued data would need a configurable (provisioned) update rate.

I noticed that the Autonomo has a local serial flash, which is more reliable than uSD. It will be interesting to see how it's dealt with there.
So I would hope there could be a compile-time CONFIG option for storage routing based on the machine.

As a reference https://bitbucket.org/nuttx/nuttx has a lot of developers with reliable embedded file access systems. They use Kconfig menuconfig to CONFIG_ device platforms.

https://en.wikipedia.org/wiki/Menuconfig
https://opensource.com/article/18/10/kbuild-and-kconfig

@SRGDamia1
Contributor

You can upload many data points at once to ThingSpeak via the REST interface. I think that would be better than trying to slowly meter out single points over MQTT.

@neilh10
Contributor Author

neilh10 commented Dec 24, 2018

Well season greetings. Just catching up.
I was just suggesting that when doing batch uploads - almost every protocol needs a pacing delay - and I'm not seeing that ThingSpeak differentiates between MQTT publish and HTTP put

In fact - ThingSpeak actually facilitates a bulk upload with MQTT (if I read the code correctly).

https://www.mathworks.com/help/thingspeak/continuously-collect-data-and-bulk-update-a-thingspeak-channel-using-an-arduino-mkr1000-board-or-an-esp8266-board.html

Also, from everything I've seen MQTT is a better protocol for embedded power constrained systems, as it is a light weight protocol. An HTTP put/get has a lot more overheads -

https://1sheeld.com/mqtt-protocol/ - has a list of MQTT brokers

including

https://io.adafruit.com/blog/

https://forums.adafruit.com/viewforum.php?f=56&sid=08c0dec23c4141b75e4604280eb8f0f8

https://en.wikipedia.org/wiki/MQTT

@neilh10
Contributor Author

neilh10 commented Jul 8, 2020

Well I'm planning on starting to look at this, I wonder if anybody else has. The architecture of the dataPublisher has changed quite a bit I think. So sharing my thinking

Feature1: queueData() method for reliable delivery.
Feature2: enable multiple sensor readings between delivery attempts.
Objective1: Maintain lowest power usage. Keep writing/reading to the uSD as minimal as possible. Long strings to the uSD effectively use power.
Assumption1: queueData text file structure doesn’t need to be maintained between upgrades.
Assumption2: Simple readable text in the queueDataFile is valuable for debugging.

Looking over release 0.25.0 and dataPublisher with up to 4 actual senders, I think it needs a queueData() per dataPublisher.
The only practical way I see, is to store the queueData() as a file/uSD per dataPublisher ~ queueDataFileX. Where X is the index of the Publisher.
Then when a specific Publisher is invoked, it reads from the file/uSD and attempts to push all the readings.
Practically speaking there may be a successful delivery of readings, but may not be all the readings.
For a delivery attempt, with no successful delivery, then queueDataFileX file doesn’t change
For a delivery attempt with at least one successful reading delivery, but not all, the undelivered readings are written to a new queueDataFileXy, and queueDataFileX[y-1] is deleted.
Risk1: As I understand it, MMW has never had fast updates pushed to it.
Recently I’ve been experiencing MMW as ‘soggy’; it often doesn’t respond to a POST, resulting in a bad response code. If MMW is too slow, the testing can be compared against ThingSpeak.
Risk2: For a large queueData, 100/day, the available battery power may run low. Without power management this can cause repeated resets. For solar systems the power can become available again with new solar coulombs, but the available coulombs need to be calibrated against communication attempts.
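
A minimal sketch of one delivery pass over a per-publisher queue, as described in the list above. It models queueDataFileX as an in-memory list rather than a file on the uSD, and the `send` callback, `deliveryPass` and `queueChanged` are hypothetical names used only for illustration, not part of ModularSensors.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One delivery pass over a publisher's queue (queueDataFileX modelled in memory).
// `send` is a hypothetical callback that returns true when the endpoint
// acknowledged the reading.
std::vector<std::string> deliveryPass(
    const std::vector<std::string>& queued,
    const std::function<bool(const std::string&)>& send) {
    std::vector<std::string> undelivered;
    for (const std::string& reading : queued) {
        // Keep every reading that was not acknowledged, preserving order
        if (!send(reading)) undelivered.push_back(reading);
    }
    return undelivered;
}

// The queue file only has to be rewritten as a new generation when at least
// one reading was delivered; otherwise the existing file is left untouched.
bool queueChanged(std::size_t before, std::size_t after) {
    return after != before;
}
```

When `queueChanged()` is true, the returned list is what would be written to the next queueDataFileXy before the previous generation is deleted.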

Implementation Research:
LoggerBase.cpp
Logger::logDataAndPublish()
publishDataToRemotes(void)
dataPublisherBase.cpp: int16_t dataPublisher::publishData()

New value: loggerBase.cpp::Logger(..,loggingIntervalMultiplier, MaxReadingsToPublish, PublishDelayMs)
~ uint8 loggingIntervalMultiplier: the number of LoggingIntervals before a communications event is attempted. Range 0-255, default: 1. For 1: attempt to publish immediately. For 0: never publish.
~ uint8 MaxReadingsToPublishMult: range 0-255, default 5. The actual number of readings published is this value multiplied by loggingIntervalMultiplier.
~ uint16 PublishDelayMs: the number of ms between POST attempts, to pace publishing data. Range 0-30000 (30 seconds), default 0.
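
As a rough illustration of how these three values could interact, here is a sketch under the assumption that the logger keeps a 1-based count of logging intervals. The struct and function names are hypothetical and do not exist in ModularSensors.

```cpp
#include <cstdint>

// Hypothetical holder for the three proposed Logger parameters.
struct PublishPacing {
    std::uint8_t  loggingIntervalMultiplier = 1;  // 0 = never publish, 1 = every interval
    std::uint8_t  maxReadingsToPublishMult  = 5;  // scaled by loggingIntervalMultiplier
    std::uint16_t publishDelayMs            = 0;  // pause between POSTs, 0..30000 ms
};

// True when the given logging interval (1-based count) should trigger a
// communications event.
bool isPublishInterval(const PublishPacing& p, std::uint32_t intervalCount) {
    if (p.loggingIntervalMultiplier == 0) return false;  // publishing disabled
    return intervalCount % p.loggingIntervalMultiplier == 0;
}

// Upper bound on readings sent in one communications event.
// 255 * 255 still fits in 16 bits.
std::uint16_t maxReadingsPerEvent(const PublishPacing& p) {
    return static_cast<std::uint16_t>(p.maxReadingsToPublishMult *
                                      p.loggingIntervalMultiplier);
}
```

With the defaults, every interval publishes and at most 5 readings go out per event; with a multiplier of 4, every fourth interval publishes and the cap becomes 20.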

New: dataPublisherBase.cpp: queueData()
This writes a simple CSV line to a queueDataFileX, with the current sampling tuple.

New section: int16_t dataPublisher::publishData()
Periodically read the data back, and attempt transmission.
{This is where the current classes get sticky. Objective: prototype this very early on to figure out what works}

Background: Decoding the wireless network footprint is painful. A remote system that delivers only occasional readings creates more user pain than it solves. For reasonable analysis a coherent time-series is required, especially when there is rain.
Practically speaking, the wireless footprint can be very large, and most of the range has a shifting, intermittent physical connection. This is weather dependent, including wind direction and fog conditions.
Practically speaking, IMHO for my context, reliable delivery is more important than real-time delivery. That is, in foggy conditions, if the data gets delayed by 1 week, it's OK provided it finally gets reliably posted.
For power reasons and data plans, it's better to attempt a connection every couple of hours, and when there is a good connection push it all. The convention that I’m working with is that samples need to be taken every 15 minutes.
So I’m looking at a practical sampling every 15 minutes, and attempting to push every 1 hour.
History: I implemented an algorithm similar to this in 2010 on a mega2560 using a serial flash device with a defined structure, no file system. Watching it over the years, the above algorithm has worked well.

@SRGDamia1
Contributor

I'm really glad you're thinking about this!

Some disordered thoughts until I can give this some work:

The publishers already hold (unimplemented) fields for sendEveryX and sendOffset which I put in place with the thought of eventually doing the queuing. I'm not sure if they could be kept in the publisher instead of migrating up to loggerBase. On the surface, I'd prefer those fields kept at the publisher level, so each can do their own, but it makes the logic for the logger more complex because it first has to ask the publishers if they need the internet before opening the internet connection.

MonitorMW actually has had rapid-fire posts sent to it; I've used various scripts on my desktop to push in data from other systems. Most of the time, there is no problem with it - at least not more than the problems you already see at 5 min spacing.

If we're queuing/storing data for later, we're not necessarily beholden to the "post a single value" posting structure. MonitorMW doesn't have a way of multi-posting, but ThingSpeak allows a bulk post of data in a single request on their rest API. So I think for the publisher we want to add:
queueData(String filename, Stream* outStream) AND
publishQeuedData(Client* outClient) / publishQeuedData()
with a default of publishing the queue being the same as multiple single publishes in succession.
On the surface, I don't think it will be a problem to mix using MQTT for single values and HTTP/REST for multiple values. The client is just the TCP level anyway.

All SD handling lives with the logger right now. Since the publisher gets and keeps a pointer to the logger anyway, it can just use the createLogFile(filename) and logToSD(filename, rec) functions. Eh, well, we might want to add an overload to logToSD to better use print instead of strings. But anyway, the publisher shouldn't have to worry about creating the file and setting its timestamps.

@neilh10
Contributor Author

neilh10 commented Jul 14, 2020

Hey thanks for the comments. For some reason I missed the email notification and am just seeing them now.
Good to know MMW has been tried with rapid fire pushes/posts.
Thanks for pointing out sendEveryX, but if there are multiple publishers I'm assuming for simplicity they will be synced, so Logger::logDataAndPublish manages the "connecting to the Internet..." ~ which would also be good later on if there are other internet activities to co-ordinate (ftp downloads).
So I was thinking the publishers would attempt to publish if the internet was present, and not if it wasn't.

The architectural challenge is where the queue fits and how to implement it. @SRGDamia1 do you have any thoughts?
Whatever the method of sending multiple rows/results, it can be looked after by the specific publisher.
Guaranteed delivery has to be done by every publisher, so at a minimum a "pointer" per publisher has to be maintained.
As far as I can see, the way to build a queue is via a standard FIFO file on the uSD flash drive.
For a resource-constrained embedded system it could be built as a fixed CSV array on the uSD, but SDFAT only supports sequential files (not an array-type file), so that would mean work on SDFAT.
Of course the uSD has so much space on it that files containing duplicated data per publisher don't really matter.

@neilh10
Contributor Author

neilh10 commented Jul 15, 2020

@SRGDamia1 a quick question ~ in xxxPublisher::publishData() there are publisher-specific ways of crafting the readings into dataPublisher::txBuffer[750]. Obviously on transmission a header is created first and then a tail is added at the end.
E.g. in EnviroDIYPublisher::publishData(Client* _outClient) there is a lot of careful management of txBuffer to ensure there is enough room for the next set of readings.

WHAT IF the "queue strategy" builds the sensor readings into a buffer ("serializes" them) and saves it to the queFile, terminated by LF?
That way, when the internet is present, each line of the queFile is read up to the LF to recover the pre-built core readings, which are then processed through xxxPublisher::publishData to send. This assumes LF is never used as part of the data.

The core serializing of the data readings in each xxxPublisher::publishData(Client* _outClient) could be moved into a specific xxxPublisher::SerializeSensorReadings()

Actually since I like incremental development, and verifying the code flows at each step, I would first refactor xxxPublisher::publishData into three sections ~ header, serializeSensorsReadings() and a tail ~ so that it still worked.
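
A minimal sketch of the LF-terminated serialization idea, written in standard C++ so it can be tried on a desktop. The `Reading` struct and both function names are hypothetical, and the sketch assumes (as the comment above does) that neither LF nor a comma ever appears inside a value.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One set of readings: epoch seconds plus already-formatted sensor values.
struct Reading {
    long long epoch;
    std::vector<std::string> values;
};

// Serialize to a single comma-separated line terminated by LF.
std::string serializeReading(const Reading& r) {
    std::string line = std::to_string(r.epoch);
    for (const std::string& v : r.values) line += "," + v;
    line += "\n";
    return line;
}

// Parse a line produced by serializeReading(); returns false when the
// timestamp field is missing or not a number.
bool deserializeReading(const std::string& line, Reading& out) {
    std::string body = line;
    if (!body.empty() && body.back() == '\n') body.pop_back();
    std::istringstream in(body);
    std::string field;
    if (!std::getline(in, field, ',') || field.empty()) return false;
    try {
        out.epoch = std::stoll(field);
    } catch (...) {
        return false;  // not a valid integer timestamp
    }
    out.values.clear();
    while (std::getline(in, field, ',')) out.values.push_back(field);
    return true;
}
```

A publisher-specific header and tail would then be wrapped around the deserialized values at send time, which keeps the file format independent of the endpoint.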

Practically speaking txBuffer[750] is a significant resource. What hard limitations are there on buffer space? Any issues I should watch for ~ it just seems there must be a story behind all the bufferFree() checks. :)
Just checking for any thoughts.

@neilh10
Contributor Author

neilh10 commented Jul 16, 2020

Well I have prototyped a serialization/deserialization to uSD of the basic time-variant data, with minimal RAM overhead.
MMW: Tried a run of POSTing every 2 minutes. I'm not getting a consistent response code from an overnight series of POSTs; close to half are 504s, not 201s, and the time of the attempt doesn't seem to make much difference.
However, downloading from MMW, the SampleNumber indicates they did arrive, so somehow the ACK is not being detected.
Next objective is to expand the prototype to periodic multiple posts, that is sendEveryX.
For real-world testing I think I need to implement sendOffset so posts aren't happening close to the quarter hours 00, 15, 30, 45.

@SRGDamia1
Contributor

Awesome! I just got back from a vacation, but I'll try to take a look at it soon.

I like the idea of using the offset so as not to pummel the server on the even intervals, but it's tricky because most people would rather have the data logged on the even intervals so that's when the board is awake already to do publishing. Locally within our radio network (which doesn't run on ModularSensors code) @s-hicks2 has programmed random short delays into each board to prevent the data from crashing into the message from another logger at the receiver.
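
A small sketch of the random short delay idea, assuming the board stays awake after logging on the even interval and simply waits a random time before publishing. `pickSendDelayMs` is a hypothetical name; `std::mt19937` is used only so the sketch runs on a desktop, and on an AVR board Arduino's `random()`, seeded differently per device, would take its place.

```cpp
#include <cstdint>
#include <random>

// Pick a delay in [0, maxDelayMs] to wait between taking the reading on the
// even interval and starting to publish, so that loggers sharing a receiver
// or server do not all transmit at the same instant.
std::uint32_t pickSendDelayMs(std::mt19937& rng, std::uint32_t maxDelayMs) {
    std::uniform_int_distribution<std::uint32_t> dist(0, maxDelayMs);
    return dist(rng);
}
```

The reading timestamp stays on the even interval; only the transmission moves.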

@neilh10
Contributor Author

neilh10 commented Jul 20, 2020

Hey hope you had a good break. I did some camping three weeks ago at Lassen Volcanic Park - some gorgeous shallow lakes. :)

Yes I think the "Post Time" offset is a scalability issue for MMW (and any server). We had one telephone system I worked on, where a supplier lifted all the handsets off the line at midnight (to make fax calls) and it brought the telephone system down. The supplier had to space the faxes. (Gawd remember faxes!!!)

For MS this becomes a difference between the time when the reading is taken, and the "Post Time", which is what serialization through a .TXT file allows.

On Fri I refactored EnviroDIYPublisher::publishData into constituent parts - a header and data section. Then, to check I had got it right and it worked (it didn't to begin with!!!) and to characterize the MMW response under test (MMW is also in development), I let it run over the weekend, posting at 2 minute intervals over my pretty reliable (but not totally reliable) WiFi.
Most of the POSTs made it into the MMW database.
On the Mayfly, I extended the POST response timer to 30 seconds (from 10 seconds). Initially all the posts that got a response were under 2 seconds: 1.2 1.6 0.85 0.9 1.1 0.8 1.1 0.8
But then MMW stopped acking the posts even with 30 seconds, though the POSTs appear to have made it into the database.
https://monitormywatershed.org/sites/tu_rc_test05/ - it's just BatV, SequenceNo, RSSI
Dbg data from Mayfly for 19th - https://drive.google.com/file/d/14Q1aS0tJ3uv1C4Q4pyYWa939_haV09KU/view?usp=sharing

So working with the reality of soggy MMW;

I'm thinking of an "Orderly data delivery" algorithm, primarily with the ability to sendEveryX, and with SendOffset, as follows:
On every reading, serialize the data into a single READINGS.TXT file.
This is CSV with one <reading time, data> row per reading. With seq#, Vbat and RSSI as the data it looks like this:
1594827720,1,4.412,-31
1594827840,2,4.548,-32
1594827960,3,4.548,-35

After every SendEveryX (and, in future, at SendOffset), connect to the internet and, in EnviroDIYPublisher::publishData:
POST the READINGS.TXT data to MMW in the order taken (2 second timeout).
If a POST is not acknowledged, store it in a QUEx.TXT file (QUEx.TXT is unique to each xxxPublisher).
Ignore the POST result and attempt all of READINGS.TXT.
At completion of READINGS.TXT, if an ACK was received on the last POST, attempt to re-send the readings in QUEx.TXT from the beginning. After the first POST with no ACK, stop posting and finish this xxxPublisher.
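
The two phases above can be sketched with in-memory queues standing in for READINGS.TXT and QUEx.TXT. This is an illustration only: `publishCycle` and `PostFn` are hypothetical names, the `post` callback stands in for a POST that returns true when an ACK arrives within the timeout, and the sketch assumes that readings queued during this cycle may also be retried in the replay phase.

```cpp
#include <deque>
#include <functional>
#include <string>

// Hypothetical POST callback: true means an ACK arrived within the timeout.
using PostFn = std::function<bool(const std::string&)>;

// Phase 1: POST every fresh reading in order, whatever the result;
//          unacknowledged readings are appended to the queue.
// Phase 2: only if the last fresh POST was acknowledged, replay the queue
//          from the front and stop at the first POST without an ACK.
void publishCycle(std::deque<std::string>& readings,
                  std::deque<std::string>& queue,
                  const PostFn& post) {
    bool lastAcked = false;
    while (!readings.empty()) {
        lastAcked = post(readings.front());
        if (!lastAcked) queue.push_back(readings.front());
        readings.pop_front();
    }
    if (!lastAcked) return;  // link looks bad (or nothing fresh), skip replay
    while (!queue.empty() && post(queue.front())) {
        queue.pop_front();   // delivered, drop it from the queue
    }
}
```

Sending the fresh readings first means the newest state reaches the server even when the backlog is long, at the cost of the backlog arriving out of time order.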

Hopefully for the future, MMW acks on the 1st post, and if the line goes down, it will be able to handle a surge of repeats (100 per day). That might help test that MMW subsystem.
For the current reality, MMW gets the POST as it currently does, but no large overhead with repeated ACKs.

Unfortunately, for a poor line condition (as in poor quality wireless line), I don't think there is a way of differentiating between no response from MMW, and rejection of the post from MMW due to some issue. TCP/IP should handle it, but TCP/IP often isn't perfect. All that can happen in this case, is when the MMW starts ACKing, it will run through the posts.

So I'm also thinking I need a debug file POSTLOG.TXT, which will record all the POST attempts and responses as they happen, in whatever order they happen. Fortunately the SD cards are 16G :)

Any thoughts?

@neilh10
Contributor Author

neilh10 commented Aug 17, 2020

I have this successfully working in my fork. I'm happy to offer it back to the main repo. Who would decide if it would be useful for EnviroDIY?

I've only implemented it for EnviroDIYpublisher, at this point, as I have a unit to deploy, and want to get it stable for that.

The accelerated testing has been over WiFi and the more realistic beta testing on a unit with Verizon. Both results are excellent and I can post the TTY debug/log if interested.

The formerly poor response on MMW was great for testing!!. Now MMW is responding well (hurrah), I have to simulate network failure by using a WiFi router that I can turn off.

In my data for POSTing over WiFi to MMW, responses usually arrive in under 1 second, typically 0.5 seconds. For Verizon, I have a timeout set to 7 seconds. Practically a response is usually received in 5 seconds, but sometimes in 1.5 seconds, and occasionally

The implementation is as described above; restated here. Multiple messages are collected as "sendEveryX" (option 1..8).
These are written/serialized to a file on the SD as READINGS.TXT.
Then on connecting to the internet, an attempt is made to POST every item in READINGS.TXT. If there isn't a successful ACK '201' within the timeout, the item is written/serialized to QUEx.TXT (where x is the publisher number, if the specific publisher supports the QUE).

After READINGS.TXT has been emptied (POSTed) AND the last POST was a 201, THEN it checks for any lines in QUEx.TXT and, if there are any, sends them until it doesn't receive a 201.

To summarize the user view through MMW: if there is an adequate internet connection, they will see the latest readings first (giving confidence that the unit is active). Then it will attempt to push the historical readings.
I have tested by turning the WiFi router off, letting it collect 50 messages, and then turning the WiFi router on and watching it POST them.

What's missing, or potential problems: if there has been a loss of internet for some time (weeks.. months) at a 15 minute sampling interval, that is 96 readings a day, or 672 in a week. So it may be a lot of power draw on the Mayfly to send all these at once.
My plan is to put an algorithm in place to limit the number sent. However, if the Mayfly resets due to lack of power, PROVIDING the file system or file isn't corrupted, it will gracefully recover and attempt again on the next connection.
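
The backlog arithmetic above can be written out as a short sketch. The function names are hypothetical, and the drain estimate deliberately ignores new readings that arrive while the backlog is being cleared.

```cpp
#include <cstdint>

// Readings produced per day at a given logging interval in minutes
// (intervalMinutes must be greater than 0).
constexpr std::uint32_t readingsPerDay(std::uint32_t intervalMinutes) {
    return (24u * 60u) / intervalMinutes;
}

// Connections needed to clear a backlog when each connection sends at most
// maxPerConnection readings (rounded up; maxPerConnection must be > 0).
constexpr std::uint32_t connectionsToDrain(std::uint32_t backlog,
                                           std::uint32_t maxPerConnection) {
    return (backlog + maxPerConnection - 1u) / maxPerConnection;
}
```

At a 15 minute interval `readingsPerDay(15)` gives 96, so a week is 672 readings; with an assumed cap of 50 readings per connection, `connectionsToDrain(672, 50)` gives 14 connections to clear one week of backlog.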

@SRGDamia1
Contributor

I need to get this pulled in. I'll try to start looking at it soon.

@neilh10
Contributor Author

neilh10 commented Nov 23, 2020

I could put a PR for this, as I've been merging to 'master' whenever there is a new release.

@SRGDamia1
Contributor

Please, do!

@neilh10
Contributor Author

neilh10 commented Jul 10, 2023

This is a restatement of the intent and architecture of the solution to this issue.
Current architecture as of 0.34.0: once a reading is taken and stored locally in the .csv file, there is an attempt to create a communications channel (LTE, WiFi) to an endpoint (MonitorMyWatershed) and then deliver that reading.
There is no check for successful delivery. Any checks for success need to be manual, e.g. using BootNet (walk up, collect the uSD), taking the .csv and uploading it to the endpoint.

This Change Description:
This PR supports single and batch mode reliable delivery of readings to destination end point MonitorMyWatershed (MMW).
The batch mode means that several readings can be taken, and then a connection to MMW established and all readings uploaded on the same connection.

Once a sensor reading has been taken the values are stored in a text file on uSD RDELAY.txt. Subsequent readings are appended to the file.
When a communications channel is established, RDELAY.txt is read, the JSON string constructed and the POST is performed to the end point MMW. If the result is an HTTP response 201, the reading is treated as delivered. For non 201 response, the reading is written to a file QUE0.txt.
When all lines have been read from RDELAY.txt, POST attempted, and the last HTTP response is 201, then if a QUE0.txt is present, it is renamed to a temporary name and the lines from it are read. The lines are read sequentially and POST attempted until a non 201 is received. Then the current reading and all subsequent readings are written to file QUE0.txt for the next pass.

There are typically two areas of soft failure, that is, failure that requires a software algorithm to recover from: communications channel failure and apparent endpoint delivery failure.

A more detailed description of the above algorithm:
a) Take a reading from the sensor and write it to RDELAY.txt.
b) A communications channel failure (can’t detect WiFi etc.) is treated the same as batch mode: the reading is appended to the local file RDELAY.txt.
c) When it is time to connect and a connection to the endpoint is successfully established, the readings in RDELAY.txt are read in and a POST to the endpoint is attempted for each. If the endpoint responds with success, HTTP response 201, the reading is discarded. If the endpoint for any reason doesn’t send a 201, the reading is written to the local file QUE0.txt.
d) When all RDELAY.txt rows have been read and a POST attempted, the file is deleted. For a responsive endpoint, this ensures the latest readings are received first, and these can include hardware state (e.g. battery reading).
e) If the last POST is a success, QUE0.txt is renamed to a temporary name and opened.
f) While there are readings in the file and the maximum number of POSTs has not been exceeded, read a line and attempt a POST to the endpoint.
g) If the maximum number of POSTs has been reached or a POST fails, append this reading and all subsequent readings to QUE0.txt.
For debugging purposes, the status of every communications attempt and POST is appended to a file DBGyymm.log.
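
Steps c) to g) can be sketched with ordinary files. This is a desktop illustration using the C++ standard library rather than SdFat, and it is not the code from the PR: `publishFromFiles`, `appendLine` and `PostFn` are hypothetical names, the file paths are passed in by the caller, and the sketch assumes the POST limit counts the fresh readings as well as the replayed ones.

```cpp
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>

// Hypothetical POST callback: true means HTTP 201 was received.
using PostFn = std::function<bool(const std::string&)>;

static void appendLine(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::app);
    out << line << '\n';
}

// Drain the fresh-readings file, then replay the queue file.
// Returns the number of POSTs attempted on this connection.
int publishFromFiles(const std::string& delayPath, const std::string& quePath,
                     const std::string& tmpPath, int maxPosts,
                     const PostFn& post) {
    int posts = 0;
    bool lastOk = false;
    std::string line;
    {   // c) POST every fresh reading; failures go to the queue file
        std::ifstream in(delayPath);
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            lastOk = post(line);
            ++posts;
            if (!lastOk) appendLine(quePath, line);
        }
    }
    std::remove(delayPath.c_str());  // d) all fresh readings were attempted
    if (!lastOk) return posts;

    // e) rename the queue so re-queued lines can reuse the original name;
    //    a failed rename is taken to mean there is no queue file
    if (std::rename(quePath.c_str(), tmpPath.c_str()) != 0) return posts;

    std::ifstream que(tmpPath);
    bool stop = false;
    while (std::getline(que, line)) {
        if (line.empty()) continue;
        if (!stop && posts < maxPosts) {  // f) replay in order
            ++posts;
            if (post(line)) continue;     // delivered, drop the line
        }
        stop = true;                      // g) limit reached or POST failed:
        appendLine(quePath, line);        //    keep this and all later lines
    }
    que.close();
    std::remove(tmpPath.c_str());
    return posts;
}
```

Renaming before replaying means a reset in the middle of the replay leaves either the temporary file or the rebuilt queue on the card; recovering from a leftover temporary file at startup is not handled in this sketch.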

In Modular Sensors design, there are
a) configuration options, associated with logger – collecting data,
b) and configuration options associated with a network, and the specific data endpoint delivery.

For system integration, options are chosen to perform aggressive testing of the implementation; they are not values expected to be used in normal logging.
The test systems use a Mayfly 1.x with WiFi (Digi or ESP32) and internal sensors.

Configuration test options
Logging options:
LOGGING_INTERVAL_MINUTES=2 ; Logger reading interval (minutes)
COLLECT_READINGS=4 ; Number of readings to collect before sending, 0 to 30

SEND_OFFSET_MIN=0 ; Delay from collecting readings to sending, 0 to LOGGING_INTERVAL_MINUTES-1
POST_MAX_NUM=10 ; Max items to send on each connection, 10 to 200
TIMER_POST_TOUT_MS=5000 ; Gateway timeout (ms)
TIMER_POST_PACE_MS=1000 ; Pause between each POST on a link (ms)
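
The ranges noted beside the options could be checked at startup along these lines. The struct and `isValid` are hypothetical, not part of ModularSensors, and only the ranges stated in the comments above are enforced.

```cpp
#include <cstdint>

// Hypothetical container for the options listed above.
struct DeliveryConfig {
    std::uint16_t loggingIntervalMinutes;
    std::uint8_t  collectReadings;   // 0..30
    std::uint16_t sendOffsetMin;     // 0..loggingIntervalMinutes-1
    std::uint16_t postMaxNum;        // 10..200
    std::uint32_t timerPostToutMs;   // gateway timeout
    std::uint32_t timerPostPaceMs;   // pause between POSTs
};

// Check the documented ranges; returns false for any out-of-range value.
bool isValid(const DeliveryConfig& c) {
    if (c.loggingIntervalMinutes == 0) return false;
    if (c.collectReadings > 30) return false;
    if (c.sendOffsetMin >= c.loggingIntervalMinutes) return false;
    if (c.postMaxNum < 10 || c.postMaxNum > 200) return false;
    return true;
}
```

For example, the test values `{2, 4, 0, 10, 5000, 1000}` pass, while a send offset of 2 minutes with a 2 minute logging interval is rejected.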

@neilh10
Contributor Author

neilh10 commented Jul 11, 2023

Oh a couple of issues I forgot to mention.
a) Obviously with multiple POSTs there is a need to check the battery power when logDataAndPubReliably() is called. The above submission has a hook for a call to a "Battery Management System", _bat_handler_atl(LB_PWR_USEABLE_REQ), but I haven't provided any demo code. It's all in my branch under examples\Tu_xx01.
I suspect this is more than anybody wants to consider ~ the BatteryManagementSystem V*A demand needs to be considered when dispatching events.

Known issues
b) After any change in the project to the number of sensors, any left-over files on the uSD, QUE0.txt and RDELAY.txt, need to be deleted. On any reset or startup, data in the temporary files is assumed to need to be sent to MMW.
It might be possible to automatically analyze the number of sensors on startup and delete these files if they don't match, but I haven't done it.

c) If the time isn't valid on boot, either in the RTC or from a connection to NTP, then subsequent POSTs using an older time may have an undefined effect on MMW (timedb compression). Technically, if ModularSensors hasn't got a valid time, it shouldn't start up until the time is valid, much as it shouldn't start if there isn't enough power.
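
One possible guard for issue c) is a plausibility check on the clock before any reading is queued. This is a sketch only: the floor value and both names are assumptions, and a real check might instead compare against the firmware build date.

```cpp
#include <cstdint>

// Hypothetical floor: 2020-01-01T00:00:00Z as Unix epoch seconds. Anything
// earlier is treated as an uninitialized or reset clock.
constexpr std::uint32_t kEarliestValidEpoch = 1577836800u;

// True when the timestamp is late enough to be a believable reading time.
bool isPlausibleEpoch(std::uint32_t epoch) {
    return epoch >= kEarliestValidEpoch;
}
```

A logger could refuse to queue or POST readings until `isPlausibleEpoch()` holds for the current RTC value.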

@neilh10
Contributor Author

neilh10 commented Aug 31, 2023

A follow up: if/when it is included in (develop), it should be treated as a new beta feature that is not automatically invoked.

The feature can be announced, and it needs to be specifically turned on to be tried.

The feature can then be tested as part of (develop) - a pretty normal process for a large change.
There may be some further system protection features that are needed, e.g. a) no startup if the RTC hasn't been initialized, b) delete old files if the configuration changed.

When turned on in setup(), there are a number of configurable communication retry parameters.

The defaults may be changed, however at compile they will not cause any warnings.
My suggestions for the hard-coded defaults are:
LOGGING_INTERVAL_MINUTES=15 ; Logger reading interval (minutes)
COLLECT_READINGS=4 ; Number of readings to collect before sending, 0 to 30

SEND_OFFSET_MIN=5 ; Delay from collecting readings to sending, 0 to LOGGING_INTERVAL_MINUTES-1
POST_MAX_NUM=50 ; Max items to send on each connection, 10 to 200
TIMER_POST_TOUT_MS=15000 ; Gateway timeout (ms)
TIMER_POST_PACE_MS=1000 ; Pause between each POST on a link (ms)

@neilh10
Contributor Author

neilh10 commented Jan 23, 2024

ping - just wondering if any visibility of where this might be going.

@aufdenkampe
Member

aufdenkampe commented Jan 24, 2024

@neilh10, thanks for the ping. With some recent, modest funding, we've queued the following into our v0.18 release: the "EnviroDIY performance improvements" milestone for Monitor My Watershed.

Our thinking is those performance enhancements will lay the foundation for your proposed batch upload feature, because they should:

  • mitigate any potential issue with the database server getting overloaded
  • offer an efficient option to send multiple data values in a single payload
  • provide near immediate "202 Accepted" responses from the AWS Simple Queue Service that would then let the device quickly turn off the radio and know that the value doesn't need to be resent.

Our current plan is to get all that onto staging by the end of March.

So, once that is done, we'll be able to fully test these modifications to Modular Sensors:

That will in turn be a foundation for your batch upload PR, which we could either refactor or reimplement to use the new batch transmission capability.

@neilh10
Contributor Author

neilh10 commented Jan 24, 2024

Thanks for the update. I guess the PR I created last July will be throw away.

I guess it's got to be scalable for the server - in my experience always a big issue - I tried framing this back in 2020: WikiWatershed/monitor-my-watershed#485

An observation: as identified, a "202 Accepted" is not a [201 Created](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/201).

At an OSI protocol level of serializing readings and transferring them across a medium, the "202 Accepted" is a step down from a 201 and is not a guaranteed delivery. It will put more onus on the server to be reliable.

@aufdenkampe
Member

aufdenkampe commented Jan 24, 2024

@neilh10, I can recognize how it might be disappointing that it has taken so long for us to be ready to consider adopting your work from your PR last July. I wouldn't frame it as throwing away your work, however, as we should be able to adapt it, once the server can handle it.

I agree that a 202 is a step down from a 201, but we have the collective ability to decide what to do with that. We can deliver a 202 instantly (at the speed of a ping), or we can wait for the post request to be fully processed and deliver a 201. With PR #453 and its promise of a 90x speedup, we might still get a 201 in <1 second, so that could work just fine. I would be interested in your feedback on how to best handle that.
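
One way to keep that decision explicit on the device side is to separate classifying the response from deciding whether the reading may be dropped. This is a sketch under the assumption that the publisher sees the numeric HTTP status; the enum and function names are hypothetical.

```cpp
// Possible outcomes of one POST, from the device's point of view.
enum class PostOutcome { Delivered, AcceptedUnconfirmed, Retry };

// 201 means the reading was stored; 202 only means it was accepted for
// later processing; anything else (including a timeout reported as 0)
// means the reading must be kept and retried.
PostOutcome classifyResponse(int httpStatus) {
    if (httpStatus == 201) return PostOutcome::Delivered;
    if (httpStatus == 202) return PostOutcome::AcceptedUnconfirmed;
    return PostOutcome::Retry;
}

// Policy switch: trustAccepted decides whether a 202 is good enough to
// remove the reading from the on-card queue.
bool mayDiscard(PostOutcome outcome, bool trustAccepted) {
    if (outcome == PostOutcome::Delivered) return true;
    return outcome == PostOutcome::AcceptedUnconfirmed && trustAccepted;
}
```

With `trustAccepted` set to false the device behaves as it does today and keeps retrying until it sees a 201; setting it to true trades guaranteed delivery for a shorter radio-on time.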

@neilh10
Contributor Author

neilh10 commented Jan 24, 2024

@aufdenkampe I understand project management requires tradeoffs, and you're kind to suggest there will be some use for it. Realistically, though, there is underlying code that will need reworking. I have looked at the code for a ring buffer in RAM as temporary storage for readings, and I would expect this PR to be throwaway, but I'm happy if it can be used.
If I merge to it I will be maintaining my queue on SD as it has known reliability.

IMHO reliability is poorly understood, and I just don't see a bar being defined that requires reliability to be implemented as a core feature. WikiWatershed/monitor-my-watershed#485
This simple concept has taken 3.5 years to percolate and could have been implemented from the beginning.
For naive end-users it's a shock to learn that readings are lost, and that they have to figure out which ones are lost. I've watched a number of people I work with. They walk up to the system and get the readings off it, so why doesn't that happen when it's downloaded from the internet!

Reliability is driven by a hard number for how often a reading can be lost and not delivered - OSI network layer CS101.
Wireless transmission degrades with larger packet sizes - Radio Engineering 101.
The value of the LoRa protocol is in engineering the packet for better delivery, and it requires strict packet packing.

Reliability in code is often making an improvement in one area and characterizing it to ensure the gains are real and understood. I think that is being said here as well WikiWatershed/monitor-my-watershed#688 (comment)

Speeding up the server database: it appears there is some easy low-hanging fruit according to WikiWatershed/monitor-my-watershed#674

This could be tested with a suite of POSTs from a local internet test machine (non-Mayfly). That makes it much simpler to characterize, and issues are not pushed on to the naive end-user.

The improvements on the server could then be made available together with this thoroughly tested "Orderly data delivery", which has been working for the last 3 years, so that the gains on the server are real and accessible to everyone (who upgrades).

Then #453 could be implemented. Actually, it can be implemented without a ring buffer in memory, because most of the code base is present in this PR, which implements a queue on SD, is reliable through other Mayfly conditions, and is not limited by RAM size (as a ring buffer in memory is). I would have no problem creating a packed JSON string per specification. I even offered to do it, and to do some testing if the server location was identified. The offer wasn't accepted.

For power usage, the real numbers I'm seeing are that making an LTE connection takes 25+ seconds, depending on wireless reliability. Then there is the POSTing to the server, which typically takes 4-20 seconds over LTE, so one second would be admirable. On a normal system there is still the LTE connect time of 25+ seconds at power-on, so a faster server makes delivery more reliable, allows the server to scale, and saves a little bit of power. For the end-user (who upgrades) it now just works. When packed JSON is added, I would think there is a little more improvement, and more importantly the server can scale to handle more users.
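
The connect-time argument can be made concrete with a small calculation. This sketch simply spreads the fixed connection time over the readings sent on one connection; the function name is hypothetical and the figures used afterwards are the ones quoted above, not new measurements.

```cpp
// Radio-on seconds per delivered reading when n readings share a single
// connection (n must be greater than 0).
constexpr double radioSecondsPerReading(double connectSeconds,
                                        double postSeconds,
                                        unsigned n) {
    return (connectSeconds + postSeconds * n) / n;
}
```

With a 25 second connect and 4 second POSTs, one reading per connection costs 29 seconds of radio time, while four readings per connection cost 10.25 seconds each. Cutting the POST to 1 second brings the batch of four down to 7.25 seconds each, so the connect time still dominates.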

I have one system that is on the edge of the wireless range, and according to some of the notifications it is probably only getting one in 10 connections through to the server, so all the readings are queued at each "no connection" event. When the wind aligns (or maybe it's just the server becoming reliable, hard to figure out) it gets a bunch of POSTs through. It's fantastic to see the node appearing to just work, and in winter it only does so when there is power available.
However, with packed JSON a larger packet is less likely to get delivered, so site-specific optimizations might be needed.
Site-specific optimizations are needed to extend the reach of wireless systems, but wouldn't be the default.

Overall, POSTs with a UUID structure are costly when the wireless signal is on the edge.
MQTT is a lightweight packet (no UUIDs) and has bidirectional capability. The value of the UUIDs is not explained anywhere that I can see, and they are proving to be a costly part of the server's processing.
