Provider nodes aren't removed from local DB #40
from meeting 18/8/22: Possibly add nim-datastore caching for DHT data
Summary: We need to decide the purpose of the persistent datastore (see PR #41) placed behind the SPR cache. Is it solely to populate the cache during process restart? Or is it to maintain a set of available SPRs that's larger than the amount of memory committed to caching them? The behavior of the cache and persistent datastore combo will vary depending on the purpose. We may be able to simplify the caching layer by relying on SQLite's page cache. The page size and the size of the page cache can be made configurable in the parameters of

In PR #41 I've modified

Please see the start of that PR's description re: how persisting SPRs to disk is relevant to this issue. I mentioned in the PR description that some difficulties were encountered when I considered how to combine a cache with

SQLite Page Cache

[apologies, but anchor links in the following paragraph don't seem to work; refer to the ToC at the top of sqlite.org/fileio.html]

SQLite has a Page Cache. The size of that cache and the page size are configurable. Note that the sections Page Cache Configuration and Page Cache Algorithms in the linked docs are marked as TODO. Elsewhere on the site it's explained how to adjust the cache size and how to adjust the page size. The TODO for Page Cache Algorithms hints at "LRU"; more details can be found in comments in the SQLite amalgamation. That file is too big for GitHub to serve except in raw format, but in an editor start by looking at line 49101. From the docs:
So the page cache deals with both reads and writes, but depending on network activity/behavior and the configured page and cache sizes, it could provide a performance boost for SPR lookups. Rather than relying on assumptions or intuitions about it, I propose that we experiment with SQLite's page cache in relation to codex-storage/nim-codex#131, codex-storage/nim-codex#213, etc.

This issue is focused on caching for the DHT, but my proposal to leverage SQLite's page cache could also apply to nim-codex's use of

If the page cache serves our needs well, it could be a simplifying solution, cutting out a layer of complexity in the software we maintain.

Difficulties

(1) nim-datastore uses Nim's

For reads and writes to derivatives of

The
EDIT: I had in mind this code in nim-codex when I wrote the paragraph above. Actually, the only information needed to initialize

With an approximate value size, the size of a full cache would bounce around: less so if the approximation is accurate, more so if it's not. Alternatively, the cache implementation could be overhauled to be more dynamic re: calculating its current size and in its resize logic, but attempting that didn't seem like the thing to do right now. I decided to forego implementing

(2) A more ad hoc solution pairing

For one,

This issue was opened with a suggestion to use a two-level cache

Similar to what I described in (1), we could rely on approximate sizes at both levels to determine

BUT, before making a decision about that, there's another difficulty to consider.

(3) Since the relationship between CID and SPR is one-to-many, it's not clear how cache eviction is supposed to work in relation to a persistent datastore behind the cache.

If the purpose of the persistent datastore is solely to populate the cache during process restart, then the problem is simplified. Adding an SPR would involve a write to the cache and datastore, and a cache eviction would be paired with deletion/s in the datastore. Except for process startup, retrieving SPRs would only involve the cache. The amount of disk space used by the datastore would naturally be correlated with the size of the cache (keeping in mind flux owing to values not having a fixed size), though the datastore would be slightly larger than the cache owing to e.g. the overhead of using SQLite and how that's done exactly (see notes in #41).

If the purpose of the persistent datastore is to maintain a set of available SPRs that's larger than the amount of memory committed to caching SPRs (the purpose I had in mind when writing the description for #41), things are more complicated.¹ What's the source of truth when asking the cache + datastore what SPRs it has for a CID?
Retrieving from the cache and the datastore every time would defeat the purpose of the cache. Evictions from the level-two caches would result in inconsistency between the cache and datastore, which would defeat or at least hamper the purpose of the datastore. Limiting evictions to level one of the cache would avoid the inconsistency problem, but with unbounded in-memory stores in level two we'd be back to where we started.

Coming back around to my SQLite Page Cache proposal: if we're able to configure SQLite's page cache and page size, and if the page cache provides an acceptable performance boost for SPR lookups, we can avoid the difficulties described above, i.e. punting to SQLite to know best how to manage a cache for the data it's storing.

I'm currently working on a PR for nim-datastore that allows configuring SQLite's page and cache sizes via

Footnotes
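[Editorial illustration] The page-size and cache-size knobs discussed in this thread are plain SQLite PRAGMAs. The project code is Nim, but here is a minimal Python sketch of the same two settings against an in-memory database; the nim-datastore API will differ, and the specific values are illustrative only. Note that `page_size` only takes effect before the database is populated (or after a `VACUUM`), while `cache_size` can be changed at any time:

```python
import sqlite3

def open_with_cache(path: str, page_size: int = 4096,
                    cache_pages: int = 2000) -> sqlite3.Connection:
    """Open a SQLite database with an explicit page size and page-cache size."""
    conn = sqlite3.connect(path)
    # page_size must be set before the first table is written
    # (or it only applies after VACUUM).
    conn.execute(f"PRAGMA page_size = {page_size}")
    # A positive cache_size is a count of pages; a negative value means KiB.
    conn.execute(f"PRAGMA cache_size = {cache_pages}")
    return conn

conn = open_with_cache(":memory:")
page = conn.execute("PRAGMA page_size").fetchone()[0]
cache = conn.execute("PRAGMA cache_size").fetchone()[0]
```

With a 4096-byte page and a 2000-page cache, this connection would keep roughly 8 MiB of database pages in memory, which is the layer the proposal above suggests leaning on instead of a hand-rolled cache.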
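[Editorial illustration] The "restart-only" datastore purpose described above (every put writes through, every eviction is paired with a datastore deletion, and reads never touch the store) can be sketched as follows. This is a hypothetical Python stand-in, not nim-datastore code: the backing store is a bare dict, and a real implementation would persist to SQLite and deal with the one-CID-to-many-SPRs relationship:

```python
from collections import OrderedDict

class WriteThroughLRU:
    """LRU cache whose evictions are mirrored as deletions in a backing
    store, so the store only ever tracks what the cache holds and exists
    solely to repopulate the cache after a restart."""

    def __init__(self, capacity: int, store: dict):
        self.capacity = capacity
        self.store = store          # stand-in for the persistent datastore
        self.cache = OrderedDict()  # key -> value, most-recently-used last
        # On "restart", repopulate the cache from the surviving store.
        for key, value in list(store.items())[-capacity:]:
            self.cache[key] = value

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.store[key] = value      # write-through on every put
        if len(self.cache) > self.capacity:
            evicted, _ = self.cache.popitem(last=False)
            del self.store[evicted]  # eviction paired with deletion

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return None                  # reads never consult the store

store = {}
lru = WriteThroughLRU(capacity=2, store=store)
lru.put("cid1", "spr1")
lru.put("cid2", "spr2")
lru.put("cid3", "spr3")  # evicts cid1 from both the cache and the store
```

As the thread notes, under this purpose the disk footprint stays correlated with the cache size; the harder questions only appear when the datastore is meant to hold more than the cache does.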
Great discussion! Although caching and persistent storage need to be considered together, perhaps we could create a new issue for the above and handle it separately from this issue? That way we can tackle persistent storage separately from caching optimisation. The way I'm looking at this: if we have an update operation, caching doesn't matter and performance of the update is paramount. On the other hand, if we're retrieving unchanged SPRs, cache performance is king.
Good idea re: creating a separate issue dedicated to the persistent storage aspect. Before creating that, let's have a quick meeting later today to discuss it. There may be several aspects that need to be broken out. I'm thinking in particular of what you said about update operations here and in your comment on #41. We should take a close look at the behavior of the code on the
@michaelsbradleyjr I don't think we should overcomplicate it at this point; let's stick with the basic LRU of LRUs as described above. The reason is that adding persistence to the providers cache is problematic: these records are ephemeral and don't have much value after they have expired, and the expiration time is usually relatively small, on the order of a few seconds to minutes. Persisting these records becomes problematic because a) we need to constantly clean up expired records and b) if the node was offline, the persisted records are most likely out of date and would need to be either dropped or refreshed by some procedure that pings the provider records to see if they are still alive. Both of these are things we could potentially implement, but for now they don't seem to add any value. I would say we stick with the simpler approach and, if we see the need, we can always take the next step and add persistence on top.
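[Editorial illustration] The expiry argument above suggests a purely in-memory table that prunes records lazily, with no persistence at all. A hypothetical Python sketch of that idea (names and the TTL value are illustrative, not nim-libp2p-dht APIs): each provider record carries an expiration timestamp, and lookups drop anything stale on the way out:

```python
import time

class ProviderTable:
    """In-memory provider records with per-record expiry; expired
    records are pruned lazily on lookup instead of being persisted."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.records = {}  # cid -> {provider_id: expiry timestamp}

    def add(self, cid, provider_id, now=None):
        now = time.monotonic() if now is None else now
        self.records.setdefault(cid, {})[provider_id] = now + self.ttl

    def get(self, cid, now=None):
        now = time.monotonic() if now is None else now
        providers = self.records.get(cid, {})
        # Drop anything past its expiry; stale records have no value.
        live = {p: exp for p, exp in providers.items() if exp > now}
        if live:
            self.records[cid] = live
        else:
            self.records.pop(cid, None)
        return list(live)

table = ProviderTable(ttl_seconds=30.0)
table.add("cid1", "peerA", now=0.0)
table.add("cid1", "peerB", now=10.0)
```

This is exactly the scenario where persisted records would mostly be dead on arrival after a restart: by the time the node comes back, the TTLs have usually lapsed.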
Currently, the local providers are stored in memory in a table that keeps growing indefinitely. We should store them in an LRU cache instead and limit it to something reasonable: at the individual entry, in the number of provider records to hold per entry (hash), as well as in the total number of entries.
In other words, we should end up with a two level of LRU cache -
LRUCache[UInt256, LRUCache[UInt256, SignedPeerRecord]]
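[Editorial illustration] A minimal Python sketch of that two-level shape (a stand-in for the Nim `LRUCache` types; the capacities and names are illustrative): the outer LRU bounds the total number of content hashes tracked, and each inner LRU bounds the provider records kept per hash.

```python
from collections import OrderedDict

class LRU:
    """Tiny LRU: most-recently-used entries kept, oldest evicted."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> value, most-recently-used last

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

class ProvidersCache:
    """Two-level LRU: the outer level bounds the number of content
    hashes, the inner level bounds provider records per hash."""
    def __init__(self, max_hashes: int, max_providers_per_hash: int):
        self.max_providers = max_providers_per_hash
        self.outer = LRU(max_hashes)

    def add(self, cid, provider_id, record):
        inner = self.outer.get(cid)
        if inner is None:
            inner = LRU(self.max_providers)
            self.outer.put(cid, inner)
        inner.put(provider_id, record)

    def providers(self, cid):
        inner = self.outer.get(cid)
        return list(inner.data) if inner else []

cache = ProvidersCache(max_hashes=2, max_providers_per_hash=2)
cache.add("cid1", "peerA", "sprA")
cache.add("cid1", "peerB", "sprB")
cache.add("cid1", "peerC", "sprC")  # evicts peerA within cid1's inner LRU
cache.add("cid2", "peerD", "sprD")
cache.add("cid3", "peerE", "sprE")  # evicts cid1 from the outer LRU
```

Evicting an outer entry drops that hash's entire inner cache at once, which is why, as discussed above, pairing this structure with a larger persistent datastore raises source-of-truth questions.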