Issues with List API reliability #156
Thanks for reporting! This is great. It might be that the /list endpoint has also gotten less reliable over time, because we've done all sorts of caching and optimization to the […]. Since other users have also asked for better APIs to work with Kiwi News, I was wondering: should we make the /list API better, or do we need much better APIs in general? @storema3, it seems that for you the list endpoint serves its purpose. Also: do you actually need the server to send you the signatures and the identities of who signed the messages? I imagine yes, right? Because that is among the computationally effortful tasks. In general, here's why the /list endpoint is bad […]
Thanks for the prompt response. There would be two areas of improvement, for everybody spinning up a new node or re-synchronizing after a failure: […]
As for the extraction of node data: the push model of sending simplified data into a separate DB would be a nice alternative, if that DB could be remote and could be updated continually and reliably at a shortish interval (5–15 min). This approach would make sense if the […]. Opening up the internal data structures might make sense for clients like Erigon, which deal with well-known data structures that change relatively slowly (the chains); and even they offer Otterscan as an interface (and of course the RPCs). For a protocol like KN, constantly looking for new opportunities, it might be a hindrance. The main requirements for ETL from a reconciliation node would be: […]
About the identity in a message: we export the identity per message, but not the signature etc. The identity is used for statistics (activity per user, ...) and maybe also as a future search criterion. We export only the identity because we trust that we could verify the message sender via the node, if necessary.
The API POST /api/v1/list is the main method to get message data out of a Kiwi News node. Having worked with it over a longer time span, we have seen more and more issues that make it hard to work with.
The API allows retrieving the messages pagewise, by specifying the start index and the number of messages. To retrieve all messages of a node, the method must be called repeatedly.
Known limitations of the API
The API walks the trie and is expensive in terms of memory and computation. With smaller nodes (4GB RAM, see the Hetzner CAX11 for a reference) it is necessary to pause (15–30 seconds) between each invocation of the API; otherwise the node might hang or crash.
The API seems to be highly dependent on the state/load of the system. Sometimes clients get responses without any data, although the next invocation returns data. On other occasions, one client repeatedly gets HTTP 500 responses about missing leaf nodes, while another client happily retrieves the same data the first one asked for.
The sequence of data items retrieved is not always guaranteed. Therefore, after node crashes or re-initialization of a node, a complete re-download of all messages is often necessary.
While the re-download was no problem when there were only a few thousand messages, with increasing message counts this is becoming a problem for systems relying on the data.
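The full re-download described above can be sketched as a paginated loop with pauses between calls (to spare small nodes) and retries on transient 500 errors. This is a minimal sketch, not the project's actual client: the HTTP call against POST /api/v1/list is abstracted behind a caller-supplied `fetch_page(start, amount)` function, and the `amount` parameter name is an assumption.

```python
import time

def fetch_all(fetch_page, page_size=50, pause=20.0, max_retries=3, sleep=time.sleep):
    """Download all messages page by page until an empty page is returned.

    fetch_page(start, amount) is a caller-supplied function wrapping
    POST /api/v1/list; it may raise IOError on transient HTTP 500 errors
    (e.g. the "missing leaf node" responses described above).
    """
    messages = []
    start = 0
    while True:
        for _attempt in range(max_retries):
            try:
                page = fetch_page(start, page_size)
                break
            except IOError:
                sleep(pause)  # back off before retrying the same page
        else:
            raise RuntimeError("giving up at index %d" % start)
        if not page:
            return messages
        messages.extend(page)
        start += len(page)
        sleep(pause)  # let small nodes (4 GB RAM) recover between calls
```

Injecting `sleep` makes the pause policy testable; in production the default `time.sleep` applies the 15–30 second pause the report recommends.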
ETL process to retrieve message data and missing data
The ETL process to retrieve data consists of two steps:

1. A full download of all messages, page by page, via the `list` API, until no data is returned anymore.
2. An incremental download of new messages via the `from` (start index) parameter.

Step (1) normally works; occasional server errors can be corrected by repeating the API calls. On a CAX11 VM this process can take an hour.
However, it has occurred that the number of messages retrieved after a complete re-initialization of a node is lower than its previous message count. A system that had previously exported 25112 messages from a node got only 25048 after the node data had been deleted and re-synched.
Step (2) is problematic. It gets all new messages from the node, but over time it also gets duplicate messages that should not be there:
The image shows the number of new messages the ETL process exports per call (green line). The yellow line is the number of duplicates the process received. Here, at 10:00, the ETL process received a message with an already existing message index, although it should have gotten only new items! Has new data appeared somewhere that changed the sequence?
These duplicates never go away; their number increases over time (1–4 messages per occasion). After 37 days of operation, there were 10 duplicates.
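Detecting these duplicates amounts to tracking the message indices already exported and counting any index that comes back again on an incremental call. A minimal sketch, assuming each returned item carries its message index (the real /list response shape may differ) and again abstracting the HTTP call behind `fetch_page(start, amount)`:

```python
def incremental_pull(fetch_page, seen, start, page_size=50):
    """One incremental ETL run: request messages from `start` onward and
    separate genuinely new messages from duplicates already exported.

    `seen` is the set of message indices exported so far; each page item
    is assumed to be an (index, message) pair. Returns
    (new_messages, duplicate_count, next_start).
    """
    new, dupes = [], 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        for idx, msg in page:
            if idx in seen:
                dupes += 1  # should never happen with a stable sequence
            else:
                seen.add(idx)
                new.append(msg)
        start += len(page)
    return new, dupes, start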
This behavior affects the download of
amplify
andcomment
messages.The text was updated successfully, but these errors were encountered: