Orderly data delivery - Queue messages for later sending #194
Comments
This would be a nice feature to have. I've re-arranged the way data is sent out, so it might be easier to do this. I've been thinking of adding a "queueData(stream)" fxn to the dataSender class which the Logger can then use to write data to an SD card file. |
Great. With queued data, it would require a provisioned update rate. I noticed that the Autonomo has a local serial flash, which is more reliable than uSD. It will be interesting to see how it's dealt with there. As a reference, https://bitbucket.org/nuttx/nuttx has a lot of developers with reliable embedded file access systems. They use Kconfig menuconfig to CONFIG_ device platforms. https://en.wikipedia.org/wiki/Menuconfig |
You can upload many data points at once to ThingSpeak via the REST interface. I think that would be better than trying to slowly meter out single points over MQTT. |
Well, season's greetings. Just catching up. In fact, ThingSpeak actually facilitates a bulk upload with MQTT (if I read the code correctly). Also, from everything I've seen, MQTT is a better protocol for embedded power-constrained systems, as it is a lightweight protocol. An HTTP put/get has a lot more overhead. https://1sheeld.com/mqtt-protocol/ has a list of MQTT brokers, including https://forums.adafruit.com/viewforum.php?f=56&sid=08c0dec23c4141b75e4604280eb8f0f8 |
Well, I'm planning on starting to look at this; I wonder if anybody else has. The architecture of the dataPublisher has changed quite a bit, I think. So, sharing my thinking:
Feature1: queueData() method for reliable delivery. Looking over release 0.25.0 and dataPublisher with up to 4 actual senders, I think it needs a queueData() per dataPublisher.
Implementation Research:
New value: loggerBase.cpp::Logger(..,logingIntervalMultiplier, MaxReadingsToPublish, PublishDelayMs)
New: dataPublishBase.cpp: queueData()
New section: int16_t dataPublisher::publishData()
Background: Decoding wireless network footprint is painful. Collecting data on a remote system that delivers only occasional readings creates more user pain than it solves. For reasonable analysis a coherent time-series is required, especially when there is rain. |
I'm really glad you're thinking about this! Some disordered thoughts until I can give this some work:
The publishers already hold (unimplemented) fields for
MonitorMW actually has had rapid-fire posts sent to it; I've used various scripts on my desktop to push in data from other systems. Most of the time, there is no problem with it - at least not more than the problems you already see at 5 min spacing.
If we're queuing/storing data for later, we're not necessarily beholden to the "post a single value" posting structure. MonitorMW doesn't have a way of multi-posting, but ThingSpeak allows a bulk post of data in a single request on their REST API.
So I think for the publisher we want to add:
All SD handling lives with the logger right now. Since the publisher gets and keeps a pointer to the logger anyway, it can just use the createLogFile(filename) and logToSD(filename, rec) functions. Eh, well, we might want to add an overload to logToSD to better use print instead of strings. But anyway, the publisher shouldn't have to worry about creating the file and setting its timestamps. |
Hey, thanks for the comments. For some reason I missed the email notification and am just seeing them now. The architectural challenge is where the queue fits and how to implement the queue. @SRGDamia1 do you have any thoughts? |
@SRGDamia1 a quick question ~ in xxxPublisher::publishData() there are publisher-specific ways of crafting the readings into dataPublisher::txBuffer[750]. Obviously on transmission there is a header created first and then a tail added at the end.
WHAT IF the "queue strategy" builds the sensor readings into a buffer, "serializes" them, and saves them to the queFile, terminated by LF?
The core serializing of the data readings in each xxxPublisher::publishData(Client* _outClient) could be moved into a specific xxxPublisher::SerializeSensorReadings().
Actually, since I like incremental development and verifying the code flows at each step, I would first refactor xxxPublisher::publishData into three sections ~ header, serializeSensorsReadings() and a tail ~ so that it still worked.
Practically speaking, txBuffer[750] is a significant resource; what hard limitations are there on buffer space? Any issues I should watch for ~ it just seems there must be a story with all the bufferFree() checks. :) |
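A minimal C++ sketch of the "one reading per LF-terminated line" idea described above. It is only an illustration under stated assumptions: the `Reading` struct, the comma-separated layout, and the names `serializeReading()` and `deserializeReading()` are hypothetical and are not taken from ModularSensors.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one logging event with its timestamp and values.
struct Reading {
    uint32_t epochSeconds;            // time the reading was taken
    std::vector<std::string> values;  // sensor values, already formatted as text
};

// Serialize one reading as a single comma-separated line ending in LF.
std::string serializeReading(const Reading& r) {
    std::string line = std::to_string(r.epochSeconds);
    for (const std::string& v : r.values) {
        // A value must not contain the field or record separator.
        assert(v.find_first_of(",\n") == std::string::npos);
        line += ',';
        line += v;
    }
    line += '\n';
    return line;
}

// Parse a line produced by serializeReading(). Returns false on bad input.
bool deserializeReading(const std::string& line, Reading& out) {
    if (line.empty() || line.back() != '\n') return false;
    std::istringstream in(line.substr(0, line.size() - 1));
    std::string field;
    if (!std::getline(in, field, ',')) return false;
    if (field.empty() || field.size() > 10) return false;
    for (char c : field) {
        if (c < '0' || c > '9') return false;
    }
    unsigned long long t = std::stoull(field);
    if (t > 0xFFFFFFFFULL) return false;
    out.epochSeconds = static_cast<uint32_t>(t);
    out.values.clear();
    while (std::getline(in, field, ',')) out.values.push_back(field);
    return true;
}
```

Because each record is a self-contained line, a queue file can be read back one line at a time, so the RAM cost is bounded by the longest line rather than by the size of the file.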
Well I have proto-typed a serialization/deserialization to uSD of the basic time variant data, with minimal RAM overhead. |
Awesome! I just got back from a vacation, but I'll try to take a look at it soon. I like the idea of using the offset so as not to pummel the server on the even intervals, but it's tricky because most people would rather have the data logged on the even intervals so that's when the board is awake already to do publishing. Locally within our radio network (which doesn't run on ModularSensors code) @s-hicks2 has programmed random short delays into each board to prevent the data from crashing into the message from another logger at the receiver. |
Hey, hope you had a good break. I did some camping three weeks ago at Lassen Volcanic Park - some gorgeous shallow lakes. :)
Yes, I think the "Post Time" offset is a scalability issue for MMW (and any server). We had one telephone system I worked on where a supplier lifted all the handsets off the line at midnight (to make fax calls) and it brought the telephone system down. The supplier had to space the faxes. (Gawd, remember faxes!!!) For MS this becomes a difference between the time when the reading is taken and the "Post Time", which is what serialization through a .TXT file allows.
On Fri I refactored EnviroDIYPublisher::publishData into constituent parts - a header and data section. Then, to check I had got it right and it worked (it didn't to begin with!!!) and to characterize the MMW testing response (MMW is also in development), I let it run over the weekend, posting at 2-minute intervals over my pretty reliable (but not totally reliable) WiFi.
So, working with the reality of soggy MMW, I'm thinking of an "Orderly data delivery" algorithm, primarily with the ability to sendEveryX, and with SendOffset, as follows:
After every SendEveryX (future and on SendOffset), connect to internet and on EnviroDIYPublisher::publishData,
Hopefully for the future, MMW acks on the 1st post, and if the line goes down, it will be able to handle a surge of repeats (100 per day). That might help test that MMW subsystem.
Unfortunately, for a poor line condition (as in a poor quality wireless line), I don't think there is a way of differentiating between no response from MMW and rejection of the post by MMW due to some issue. TCP/IP should handle it, but TCP/IP often isn't perfect. All that can happen in this case is that when MMW starts ACKing, it will run through the posts.
So I'm also thinking I need a debug file, POSTLOG.TXT, which will record all the POST attempts and responses as they happen, in whatever order they happen. Fortunately the SD cards are 16G :) Any thoughts? |
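One way to read the sendEveryX / SendOffset idea above is as a simple minute-of-day check, sketched here in C++. This is an assumption about the intended behaviour, not code from the fork: `isPublishMinute()` and its parameter names are hypothetical, and readings are assumed to be taken on even multiples of the logging interval.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the board should connect and publish.
//   minuteOfDay        minutes since midnight at this wake-up
//   loggingIntervalMin minutes between readings (e.g. 15)
//   sendEveryX         publish once per X readings
//   sendOffsetMin      delay after the reading, 0..loggingIntervalMin-1
bool isPublishMinute(uint32_t minuteOfDay, uint32_t loggingIntervalMin,
                     uint32_t sendEveryX, uint32_t sendOffsetMin) {
    if (loggingIntervalMin == 0 || sendEveryX == 0) return false;
    if (sendOffsetMin >= loggingIntervalMin) return false;  // out of range
    uint32_t period = loggingIntervalMin * sendEveryX;
    assert(period != 0);  // both factors were checked above
    return minuteOfDay % period == sendOffsetMin;
}
```

With a 15-minute interval, sendEveryX of 4, and an offset of 5, this publishes at 00:05, 01:05, and so on, which keeps the readings on the even intervals while moving the POST away from them.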
I have this successfully working in my fork. I'm happy to offer it back to the main repo. Who would decide if it would be useful for EnviroDIY? I've only implemented it for EnviroDIYPublisher at this point, as I have a unit to deploy and want to get it stable for that.
The accelerated testing has been over WiFi, and the more realistic beta testing on a unit with Verizon. Both results are excellent, and I can post the TTY debug/log if interested. The formerly poor response on MMW was great for testing!! Now that MMW is responding well (hurrah), I have to simulate network failure by using a WiFi router that I can turn off.
In my data for POSTing over WiFi to MMW, responses usually arrive in under 1 sec, typically 0.5 seconds. For Verizon, I have a timeout set to 7 sec. Practically, a response is usually received in 5 seconds, but sometimes in 1.5 seconds and occasionally
The implementation is as described above; restated here. Multiple messages are collected as "sendEveryX" (option 1..8). After READINGS.TXT has been emptied (POSTed) AND the last POST was a 201, THEN it checks for any lines in QUEx.TXT, and if there are any, sends them until it doesn't receive a 201.
To summarize the user view through MMW: if there is an adequate internet connection, they will see the latest readings (and get confidence that the unit is active). Then it will attempt to push the historical view.
What's missing, or potential problems: if there has been a loss of internet for some time (weeks.. months) at a 15-minute sampling interval, that is 96 readings a day, or 672 in a week. So it may be a lot of power draw on the Mayfly to send all these at once. |
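The delivery order restated above can be sketched in C++ roughly as follows. This is an illustration, not the code in the fork: in-memory deques stand in for READINGS.TXT and QUEx.TXT, `deliver()` and `DeliveryResult` are hypothetical names, and moving a failed fresh reading to the back of the queue is an assumption about how failures are carried over.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

constexpr int kHttpCreated = 201;

struct DeliveryResult {
    std::size_t sentFresh;   // fresh readings acknowledged with 201
    std::size_t sentQueued;  // older queued readings acknowledged with 201
};

// `post` sends one serialized reading and returns the HTTP status code.
DeliveryResult deliver(std::deque<std::string>& fresh,
                       std::deque<std::string>& queued,
                       const std::function<int(const std::string&)>& post) {
    assert(post && "a POST callback is required");
    DeliveryResult result{0, 0};
    bool lastWasCreated = false;

    // 1. Fresh readings go first, so the server shows the latest values.
    while (!fresh.empty()) {
        std::string line = fresh.front();
        fresh.pop_front();
        if (post(line) == kHttpCreated) {
            ++result.sentFresh;
            lastWasCreated = true;
        } else {
            queued.push_back(line);  // assumption: keep it for a later retry
            lastWasCreated = false;
        }
    }

    // 2. Only if the last POST got a 201, drain the queue until a failure.
    if (lastWasCreated) {
        while (!queued.empty()) {
            if (post(queued.front()) != kHttpCreated) break;
            queued.pop_front();
            ++result.sentQueued;
        }
    }
    return result;
}
```

Stopping at the first non-201 keeps a bad link from burning power on the whole backlog; a cap on the number of queued lines sent per connection would be one way to address the power-draw concern raised above.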
I need to get this pulled in. I'll try to start looking at it soon. |
I could put a PR for this, as I've been merging to 'master' whenever there is a new release. |
Please, do! |
This is a restatement of the intent and architecture solution to this issue.
This Change Description: Once a sensor reading has been taken, the values are stored in a text file on uSD, RDELAY.txt. Subsequent readings are appended to the file. There are typically two areas of soft failure (that is, failures that require a software algorithm to recover from): communications channel failure and apparent end point delivery failure.
A more detailed description of the above algorithm: In Modular Sensors design, there are
For system integration, options are chosen to perform aggressive testing of the implementation - they are not values expected to be used in normal logging.
Configuration test opts
SEND_OFFSET_MIN=0 ; Delay from collecting readings to sending, range 0 to LOGGING_INTERVAL_MINUTES-1 |
Oh, a couple of issues I forgot to mention. Known issues: c) If the time isn't valid on boot, either in the RTC or through a connection with NTP, then subsequent POSTs using an older time may have an undefined effect on MMW (timedb compression). Technically, if ModularSensors hasn't got a valid time, it shouldn't be able to start up until the time is valid, in the same way that it shouldn't start if there isn't enough power. |
A follow-up: if/when it is included in (develop), it's treated as a new beta feature that is not automatically invoked. The feature can be announced, and it needs to be specifically turned on to be tried. The feature can then be tested as part of (develop) - a pretty normal process for a large change. When turned on in setup(), there are a number of configurable communication retry parameters. The defaults may be changed; however, at compile time they will not cause any warnings. SEND_OFFSET_MIN=5 ; Delay from collecting readings to sending, range 0 to LOGGING_INTERVAL_MINUTES-1 |
ping - just wondering if there is any visibility of where this might be going. |
@neilh10, thanks for the ping. With some recent, modest funding, we've queued the following into our v0.18 release. EnviroDIY performance improvements Milestone for Monitor My Watershed:
Our thinking is those performance enhancements will lay the foundation for your proposed batch upload feature, because they should:
Our current plan is to get all that onto staging by the end of March. So, once that is done, we'll be able to fully test these modifications to Modular Sensors: That will in turn be a foundation for your batch upload PR, which we could either refactor or reimplement to the new batch transmission capability. |
Thanks for the update. I guess the PR I created last July will be a throwaway. I guess it's got to be scalable for the server - in my experience always a big issue - I tried framing this back in 2020 in WikiWatershed/monitor-my-watershed#485. An observation: a "202 Accepted" is not a "201 Created" (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/201). At an OSI protocol level of serializing readings and transferring them across a medium, the "202 Accepted" is a step down from a 201 and not a guaranteed delivery. It will put more onus on the server to be reliable. |
@neilh10, I can recognize how it might be disappointing that it has taken so long for us to be ready to consider adopting your work from your PR last July. I wouldn't frame it as throwing away your work, however, as we should be able to adapt it once the server can handle it. I agree that a 202 is a step down from a 201, but we have the collective ability to decide what to do with that. We can deliver a 202 instantly (at the speed of a ping), or we can wait for the post request to be fully processed and deliver a 201. With PR #453 and its promise of a 90x speedup, we might still get a 201 in <1 second, so that could work just fine. I would be interested in your feedback on how to best handle that. |
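To make the 201 versus 202 distinction concrete on the logger side, here is a small C++ sketch. It is one possible policy rather than an agreed design: `parseStatusCode()`, `classify()`, and the `PostOutcome` names are hypothetical, and treating every other status as "retry later" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class PostOutcome {
    Delivered,            // 201 Created: the server has stored the reading
    AcceptedUnconfirmed,  // 202 Accepted: received, processing not confirmed
    RetryLater            // anything else, including no usable response
};

// Extract the 3-digit code from a status line such as
// "HTTP/1.1 201 Created". Returns -1 if the line is malformed.
int parseStatusCode(const std::string& statusLine) {
    std::size_t sp = statusLine.find(' ');
    if (sp == std::string::npos || sp + 4 > statusLine.size()) return -1;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        char c = statusLine[i];
        if (c < '0' || c > '9') return -1;
        code = code * 10 + (c - '0');
    }
    assert(code >= 0 && code <= 999);
    return code;
}

PostOutcome classify(int statusCode) {
    if (statusCode == 201) return PostOutcome::Delivered;
    if (statusCode == 202) return PostOutcome::AcceptedUnconfirmed;
    return PostOutcome::RetryLater;
}
```

Whether `AcceptedUnconfirmed` is enough to delete a reading from the queue is exactly the decision discussed above; keeping it as a separate outcome leaves that choice to the caller.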
@aufdenkampe I understand project management requires tradeoffs, and you're kind to suggest there will be some use for it - though realistically, there is underlying code that will need reworking. I have looked at the code for a ring buffer in RAM as temporary storage for readings. I would expect this PR to be a throwaway, but I'm happy if it can be used.
IMHO reliability is poorly understood - and I just don't see the bar being defined that needs to implement reliability as a core feature. WikiWatershed/monitor-my-watershed#485
Reliability is driven by a hard number of how often a reading can be lost and not delivered - OSI network layer CS101.
Reliability in code is often making an improvement in one area and characterizing it to ensure the gains are real and understood. I think that is being said here as well: WikiWatershed/monitor-my-watershed#688 (comment)
Speeding up the server database - it appears there is some easy low-hanging fruit according to WikiWatershed/monitor-my-watershed#674. This could be tested with a suite of POSTs from a local internet test machine (non-Mayfly), which makes it much simpler to characterize, and issues are not pushed on to the naive end-user.
The improvements on the server could then be made available with this thoroughly tested "Orderly data delivery", which has been working for the last 3 years, so the gains on the server are real and accessible to everyone (who upgrades).
Then #453 could be implemented - but actually it can be implemented without a ring buffer in memory, because most of the code base is present in this PR, which implements a queue on SD, is reliable through other Mayfly conditions, and is not limited by RAM size (as a ring buffer in memory is). I would have no problem creating a packed JSON string per specification. I even offered to do it, and to do some testing if the server location was identified. The offer wasn't accepted.
For power usage, the real numbers I'm seeing are that making an LTE connection takes 25+ seconds - depending on wireless reliability.
Then there is the POSTing to the server, and typically that is 4-20 seconds over LTE, so one second would be admirable. For powering on a normal system, though, there is still the LTE connect time of 25+ seconds - so a faster server becomes more reliable, allows the server to scale, and saves a little bit of power. For the end-user (who upgrades), it now just works. When packed JSON is added, I would think there is a little more improvement, and more importantly the server can scale to handle more users.
I have one system that is on the edge of the wireless range - and according to some of the notifications it is probably only getting one in 10 connections through, or connections to the server - so all the readings are queued at each "no connection event". When the wind aligns (or maybe it's just that the server becomes reliable, hard to figure out), it gets a bunch of POSTs through. It's fantastic to see the node appearing to just work - and with winter it is only doing it when there is power available. Overall, POSTs with a UUID structure are costly when the wireless signal is on the edge. |
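The timing figures above can be turned into a rough radio-on estimate. The C++ sketch below is only arithmetic under assumptions: `radioOnSeconds()` is a hypothetical helper, it assumes one LTE connection can carry every POST in a batch, and it ignores retries and failed connections.

```cpp
#include <cassert>
#include <cstdint>

// Total radio-on seconds to deliver `readings` POSTs when the modem
// connects once per `batchSize` readings.
uint32_t radioOnSeconds(uint32_t readings, uint32_t batchSize,
                        uint32_t connectSec, uint32_t postSec) {
    assert(batchSize > 0);
    // Number of connections, rounded up to cover a final partial batch.
    uint32_t connections = (readings + batchSize - 1) / batchSize;
    return connections * connectSec + readings * postSec;
}
```

Using the 25-second connect time and the low end of the 4-20 second POST range, a day of 96 readings costs 96 x (25 + 4) = 2784 seconds when every reading gets its own connection, and 24 x 25 + 96 x 4 = 984 seconds when four readings share a connection. If a POST took 1 second, the batched figure would drop to 24 x 25 + 96 = 696 seconds, at which point the connect time dominates.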
It would be nice if the sensor readings could have an orderly delivery algorithm to the cloud.
The current method, after writing the sensor readings to the SD card, is a single try to deliver the readings to the cloud, and there is no later retry.
I wonder if anybody has any thoughts on it?
Would it be valuable to have the uSD card as a staged database, so that for any (wireless) internet connection, the algorithm attempts to deliver the next undelivered sensor reading, at a staged rate, until all readings are updated?
I would add a Label: feature enhancement - but I don't seem to be able to do that.