Choose an embedded database that fits the "load" strategy use case of neume #246

Open
TimDaub opened this issue Aug 30, 2022 · 7 comments

@TimDaub
Collaborator

TimDaub commented Aug 30, 2022

  • fast writes
  • must it allow distributing data (single machine vs many machines)?
  • needs to be FOSS compatible and cannot be proprietary
  • needs to have a well-documented and well-maintained Node.js package
  • cannot be an extra process and must be embeddable (e.g. like better-sqlite)
  • must have as little feature overhead and complexity as possible
  • must allow us to create two-dimensional indexes, e.g. "The json data for block number 123 is at offset 456 in the database file". It seems a very basic key-value based storage is fine.
  • For ACID, we care most about
    • Atomicity: A transaction must either be completely written to disk or fail completely
    • Correctness: A transaction must never corrupt the database file
  • We don't care strictly for Isolation, and in some cases it can be fine if transactions race each other. Most of our crawler's results are additive and hence cumulative. Transactions that aren't topologically dependent can be written to disk in whatever order, and that's fine
  • Durability: We don't care much what happens to in-flight transactions during a crash of neume, as long as we have a non-corrupt database that we can use to recover from the crash.
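
To make the index requirement above ("the json data for block number 123 is at offset 456 in the database file") more concrete, here is a minimal sketch and not a proposal for the final design. It assumes an append-only, newline-delimited JSON file and keeps the index in memory only. The class name `OffsetIndexStore` and the file name `blocks.ndjson` are hypothetical, and a real candidate would also have to persist the index and cover the atomicity criteria listed above.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: an append-only data file plus an in-memory map
// from block number to the byte range where its JSON was written.
class OffsetIndexStore {
  constructor(filePath) {
    // "a+" opens for reading and appending, creating the file if needed.
    this.fd = fs.openSync(filePath, "a+");
    this.size = fs.fstatSync(this.fd).size;
    this.index = new Map(); // blockNumber -> { offset, length }
  }

  // Appends one JSON line and records where it landed in the file.
  put(blockNumber, value) {
    const buf = Buffer.from(JSON.stringify(value) + "\n", "utf8");
    const offset = this.size;
    fs.writeSync(this.fd, buf);
    this.size += buf.length;
    this.index.set(blockNumber, { offset, length: buf.length });
    return offset;
  }

  // Random access: reads only the bytes belonging to the requested block.
  get(blockNumber) {
    const entry = this.index.get(blockNumber);
    if (!entry) return undefined;
    const buf = Buffer.alloc(entry.length);
    fs.readSync(this.fd, buf, 0, entry.length, entry.offset);
    return JSON.parse(buf.toString("utf8"));
  }

  close() {
    fs.closeSync(this.fd);
  }
}

// Usage with made-up sample data in a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "neume-sketch-"));
const store = new OffsetIndexStore(path.join(dir, "blocks.ndjson"));
const firstOffset = store.put(123, { block: 123, tracks: ["a"] });
store.put(124, { block: 124, tracks: ["b", "c"] });
const block123 = store.get(123);
store.close();
```

Because the index lives only in memory in this sketch, it would need to be rebuilt by scanning the file after a restart, which is the kind of work an embedded key-value store would take over.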

useful website

long-list

excluded:

@TimDaub TimDaub pinned this issue Aug 30, 2022
@il3ven
Contributor

il3ven commented Aug 30, 2022

@TimDaub In issue #207 you mentioned that we are not interested in storing structured data. Does opening this issue mean we are ready to convert our JSON into SQL tables?

@TimDaub
Collaborator Author

TimDaub commented Aug 30, 2022

For now, the most important thing is reducing the complexity of random access via indexes and complying neatly with the criteria outlined above. But since we're gonna build an API eventually, we might need a database that allows us to join tables. For now, though, I personally don't see that need.

Unless, with music-os-accumulator, we're doing just that... joins...

@TimDaub TimDaub self-assigned this Aug 31, 2022
@TimDaub
Collaborator Author

TimDaub commented Sep 8, 2022

Here's another use case for the load component.

@TimDaub
Collaborator Author

TimDaub commented Sep 16, 2022

note to myself: It'd be awesome if every strategy could define its identifier within neume itself and then other identifiers could link to those buckets and identifiers with URIs, similar to how JSON-LD does it.

@il3ven
Contributor

il3ven commented Sep 22, 2022

I like https://www.sqlite.org/json1.html. Instead of having fixed tables we can store json in columns and also query it if needed. We can even create indexes on the json data for faster retrievals. Plus, sqlite is also battle tested.

note to myself: It'd be awesome if every strategy could define their identifier within neume itself and then other identifiers could link to those buckets and identifiers with uris, similar to JSON-LD does it.

In sqlite we should be able to do this with foreign keys.

@TimDaub
Collaborator Author

TimDaub commented Sep 28, 2022

I'd be all in for using the single thing that e.g. makes sqlite solve our use cases, but considering that we may want to distribute the crawl results later via a network like IPFS as in this specification (neume-network/neuIPs#2), I think it'd be premature to use sqlite now. How about, for now, we add a load component and allow the strategy implementer to define an "identity" function for each line in the transformation flat file?
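
A rough sketch of what that could look like, purely as an illustration: it assumes the transformation flat file is newline-delimited JSON, and the names `load` and `identity` as well as the `address`/`tokenId` fields are hypothetical and not taken from neume's code. Lines that map to the same identity are merged, in line with results being additive.

```javascript
// Hypothetical load step: keys every line of a flat file by the
// identity the strategy implementer defines for it.
function load(flatFileContents, identity) {
  const db = new Map();
  for (const line of flatFileContents.split("\n")) {
    if (line.trim() === "") continue; // skip empty lines
    const record = JSON.parse(line);
    const id = identity(record);
    // Additive merge: fields from later lines are laid over earlier ones.
    db.set(id, { ...(db.get(id) || {}), ...record });
  }
  return db;
}

// Hypothetical identity function a strategy might supply.
const identity = (record) => `${record.address}/${record.tokenId}`;

// Made-up sample data standing in for a transformation flat file.
const flatFile = [
  '{"address":"0xabc","tokenId":"1","title":"Song A"}',
  '{"address":"0xabc","tokenId":"2","title":"Song B"}',
  '{"address":"0xabc","tokenId":"1","artist":"Alice"}',
].join("\n");

const db = load(flatFile, identity);
```

The `Map` only stands in for whatever storage ends up behind the load component. The point of the sketch is that the strategy, not the database, decides what counts as the same record.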

@TimDaub
Collaborator Author

TimDaub commented Sep 28, 2022


2 participants