neume 2.1 ? #7

Open
il3ven opened this issue Mar 9, 2023 · 7 comments

Comments

@il3ven
Collaborator

il3ven commented Mar 9, 2023

Our current roadmap for neume is to support Decent and Lens and to make the crawler more generic. Below are a few technical changes I propose for this roadmap.

Save Tracks instead of NFTs

Our schema currently represents an NFT. However, multiple NFTs can represent the same song (track), which leads to duplicated data: the consumer of neume has to merge NFTs into tracks.

We stuck with NFTs because it was simpler and levelDB isn't suitable for tracks.

Pros of moving to Tracks

  • It will make the crawler more generic, because not every protocol will publish audio as NFTs, e.g. Lens.
  • We will save space since multiple NFTs can point to the same track.
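To make the duplication concrete, here is a minimal TypeScript sketch of the merge a consumer currently has to do. It assumes a hypothetical `NftRecord` shape and assumes that NFTs of the same song share an `audioUri`; neume's real schema and its rule for track identity may differ.

```typescript
// Hypothetical NFT record; field names are assumptions, not neume's schema.
interface NftRecord {
  chainId: number;
  contract: string;
  tokenId: string;
  owner: string;
  audioUri: string; // assumed to identify the underlying track
}

interface Track {
  audioUri: string;
  nfts: { chainId: number; contract: string; tokenId: string }[];
  owners: string[];
}

// Groups NFTs by audio URI so each track (and its metadata) is stored once.
function mergeIntoTracks(nfts: NftRecord[]): Track[] {
  const tracks = new Map<string, Track>();
  for (const nft of nfts) {
    let track = tracks.get(nft.audioUri);
    if (track === undefined) {
      track = { audioUri: nft.audioUri, nfts: [], owners: [] };
      tracks.set(nft.audioUri, track);
    }
    track.nfts.push({ chainId: nft.chainId, contract: nft.contract, tokenId: nft.tokenId });
    // The same wallet may hold several editions of one track; record it once.
    if (!track.owners.includes(nft.owner)) {
      track.owners.push(nft.owner);
    }
  }
  // Map preserves insertion order, so tracks come out in first-seen order.
  return Array.from(tracks.values());
}
```

With three NFTs pointing at two audio URIs, the result is two tracks, and the shared track lists both of its NFTs and owners once.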

Problem with saving tracks in levelDB

LevelDB is a key-value database. Imagine we have the following track in our database. owners is the list of owners for this track.

{
  ...
  "owners": [],
  ...
}

If two threads update the owners field at the same time, each one reads the whole track and writes the whole track back.

// Thread 1
const track = await getTrack(id) // reads the whole record
track.owners.push('0x123') // push() mutates in place and returns the new length, not the track
await updateTrack(track) // writes the whole record back

// Thread 2
const track = await getTrack(id)
track.owners.push('0xabc')
await updateTrack(track)

Suppose both threads read the track before either one writes, and thread 2 writes last. The database then holds the following value, and thread 1's update is lost.

{
  ...
  "owners": ["0xabc"],
  ...
}
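The lost update can be reproduced in a single Node.js process, because async reads and writes interleave much like the two threads above. This is a minimal TypeScript sketch: a `Map` holding JSON strings stands in for LevelDB, and `getTrack`/`updateTrack` are hypothetical async helpers, not neume functions.

```typescript
interface Track {
  id: string;
  owners: string[];
}

// A Map of JSON strings stands in for LevelDB (values are opaque blobs).
const store = new Map<string, string>();

// Yield to the event loop, like a real async database call would.
const tick = (): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Hypothetical helpers mirroring the pseudocode above.
async function getTrack(id: string): Promise<Track> {
  await tick();
  return JSON.parse(store.get(id) as string) as Track;
}

async function updateTrack(track: Track): Promise<void> {
  await tick();
  store.set(track.id, JSON.stringify(track));
}

// Read-modify-write of the whole record: this is what loses updates.
async function addOwner(id: string, owner: string): Promise<void> {
  const track = await getTrack(id);
  track.owners.push(owner);
  await updateTrack(track);
}

async function demoLostUpdate(): Promise<string[]> {
  store.set("track-1", JSON.stringify({ id: "track-1", owners: [] }));
  // Both "threads" read the empty owners list before either one writes.
  await Promise.all([addOwner("track-1", "0x123"), addOwner("track-1", "0xabc")]);
  return (await getTrack("track-1")).owners;
}
```

Calling `demoLostUpdate()` resolves to `["0xabc"]`: the second write replaced the whole record, so `0x123` is gone.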

Databases like MongoDB let you append to a nested array in place (e.g. with the $push update operator), but unfortunately LevelDB doesn't. We could write code to add this functionality on top of LevelDB, but it wouldn't be flexible: if we add another field like owners in the future, we will have to write more code. Not ideal.
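For comparison, here is roughly what writing that code could look like: a TypeScript sketch that serializes read-modify-write operations per key with a promise chain. `KeyedQueue`, `getTrack`, `updateTrack` and `addOwner` are hypothetical names, a `Map` stands in for LevelDB, and the queue only protects writers inside one process that share it; it is not a LevelDB feature.

```typescript
interface Track {
  id: string;
  owners: string[];
}

// Runs tasks for the same key one after another; different keys stay concurrent.
class KeyedQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start only after the previous task for this key settles, even if it failed.
    const result = prev.then(task, task);
    // Keep the chain alive without propagating errors to later tasks.
    // (A real version would also delete idle keys to avoid unbounded growth.)
    this.tails.set(key, result.catch(() => undefined));
    return result;
  }
}

// In-memory stand-in for LevelDB plus hypothetical async helpers.
const store = new Map<string, string>();
const tick = (): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, 0));

async function getTrack(id: string): Promise<Track> {
  await tick();
  return JSON.parse(store.get(id) as string) as Track;
}

async function updateTrack(track: Track): Promise<void> {
  await tick();
  store.set(track.id, JSON.stringify(track));
}

const queue = new KeyedQueue();

// Same read-modify-write as before, but serialized per track id.
function addOwner(id: string, owner: string): Promise<void> {
  return queue.run(id, async () => {
    const track = await getTrack(id);
    if (!track.owners.includes(owner)) {
      track.owners.push(owner);
    }
    await updateTrack(track);
  });
}

async function demoSerialized(): Promise<string[]> {
  store.set("track-1", JSON.stringify({ id: "track-1", owners: [] }));
  await Promise.all([addOwner("track-1", "0x123"), addOwner("track-1", "0xabc")]);
  return (await getTrack("track-1")).owners;
}
```

`demoSerialized()` resolves to `["0x123", "0xabc"]`. The drawback raised above still applies: each new field or access pattern needs bespoke code like this, and it does nothing for writers in other processes.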

Using sqlite to solve the above LevelDB problem

I propose to give sqlite a try. To save effort, we can use an ORM such as Sequelize.

We dismissed sqlite before because it was pointed out that its write speed is slow. I'd argue that speed isn't our top priority, and I doubt sqlite is slow enough to matter.

Make strategies more generic

To be written...

@neatonk
Member

neatonk commented Mar 9, 2023

It sounds like the fundamental issue you're describing is a race condition between threads. Both threads update the same value and the last write wins, which is not what you want in this case. Instead you would like the result to include both values.

You have proposed sqlite as a solution. Would the idea be to model the owners array as a many-to-one relationship in which each track has many owners? If so, then this is solving the issue by changing the data model.

I'd like to share some alternatives, but have run out of time. Looking forward to more discussion on this later.

@il3ven
Collaborator Author

il3ven commented Mar 9, 2023

> It sounds like the fundamental issue you're describing is a race condition between threads. Both threads update the same value and the last write wins, which is not what you want in this case. Instead you would like the result to include both values.

Exactly.

> You have proposed sqlite as a solution. Would the idea be to model the owners array as a many-to-one relationship in which each track has many owners? If so, then this is solving the issue by changing the data model.

Yes, I do plan to implement a many-to-one relationship.
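A minimal TypeScript sketch of the data model change this implies: owners move out of the track record into their own table, one row per (track, owner), so adding an owner is an insert rather than a rewrite of the track. Plain in-memory collections stand in for sqlite here; the table and function names are hypothetical, and the SQL in the comments is only a rough equivalent.

```typescript
// Rough SQL equivalent (hypothetical schema):
//   CREATE TABLE tracks (id TEXT PRIMARY KEY, title TEXT);
//   CREATE TABLE track_owners (
//     track_id TEXT REFERENCES tracks(id),
//     owner TEXT,
//     PRIMARY KEY (track_id, owner)
//   );
interface TrackRow {
  id: string;
  title: string;
}

interface TrackOwnerRow {
  trackId: string;
  owner: string;
}

const tracks = new Map<string, TrackRow>();
// Keyed by "trackId|owner" to mimic the composite primary key.
const trackOwners = new Map<string, TrackOwnerRow>();

function insertTrack(row: TrackRow): void {
  tracks.set(row.id, row);
}

// ~ INSERT OR IGNORE INTO track_owners (track_id, owner) VALUES (?, ?)
// Each call writes its own row, so two writers adding different owners
// never overwrite each other, and the track row itself is untouched.
function insertOwner(trackId: string, owner: string): void {
  const key = `${trackId}|${owner}`;
  if (!trackOwners.has(key)) {
    trackOwners.set(key, { trackId, owner });
  }
}

// ~ SELECT owner FROM track_owners WHERE track_id = ?
function ownersOf(trackId: string): string[] {
  return Array.from(trackOwners.values())
    .filter((row) => row.trackId === trackId)
    .map((row) => row.owner);
}
```

The merge that a key-value layout would need on read becomes a query, which is the flexibility argument for sqlite made above.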

> I'd like to share some alternatives, but have run out of time. Looking forward to more discussion on this later.

Alternatives are most welcome.

@neatonk
Member

neatonk commented Mar 10, 2023

> Alternatives are most welcome.

Nice. Thanks!

The most apparent alternative would be to stick with levelDB, but change the data model to avoid the race condition. In this case that would mean creating a new key for tracking owners. Something like some-key-referring-to-an-nft/owners/0xabc, where the first part can be the key you are currently using and the last part is the owner address. The value at that key could be blank or include details like chain id and block number, which are likely represented in the key already. There may be trade-offs affecting usability on read that would need to be considered in more detail.
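A minimal TypeScript sketch of that key layout, assuming a hypothetical NFT key format (neume's real one comes from `datumToKey`). A `Map` stands in for LevelDB; in LevelDB itself, whose keys are kept sorted, the read side would be a bounded range scan over the `<nftKey>/owners/` prefix (e.g. iterator `gte`/`lt` options) rather than the filter used here.

```typescript
interface OwnerMeta {
  chainId: number;
  blockNumber: number;
}

// Stand-in for LevelDB: string keys to JSON string values.
type Store = Map<string, string>;

// Hypothetical layout: "<nftKey>/owners/<address>" -> OwnerMeta as JSON.
function ownerKey(nftKey: string, owner: string): string {
  return `${nftKey}/owners/${owner}`;
}

// Every owner gets its own key, so concurrent writers never touch the same value.
function putOwner(db: Store, nftKey: string, owner: string, meta: OwnerMeta): void {
  db.set(ownerKey(nftKey, owner), JSON.stringify(meta));
}

// Read side: merge owners back together by scanning the prefix.
// The trailing "/" keeps NFT ".../1" from matching NFT ".../10".
function getOwners(db: Store, nftKey: string): string[] {
  const prefix = `${nftKey}/owners/`;
  return Array.from(db.keys())
    .filter((key) => key.startsWith(prefix))
    .sort()
    .map((key) => key.slice(prefix.length));
}
```

The `getOwners` step is the extra read-time merging whose cost is discussed below.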

Also, is the key structure documented somewhere? Just now realizing I am not entirely up to speed on that.

@neatonk
Member

neatonk commented Mar 10, 2023

Another option would be to consider the use of a CRDT with the desired semantics. I am not familiar enough with the use of CRDTs to suggest how it would apply in this case. https://crdt.tech/implementations
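To illustrate the CRDT idea, here is a TypeScript sketch of the simplest state-based CRDT, a grow-only set (G-Set). Its merge is set union, which is commutative, associative and idempotent, so replicas that each added a different owner converge to the same set whatever order they sync in. This is only an illustration: a G-Set cannot remove elements, so owners who sell an NFT would need something richer, such as an observed-remove set (OR-Set).

```typescript
// Grow-only set (G-Set): the simplest state-based CRDT.
class GSet<T> {
  private readonly items = new Set<T>();

  add(item: T): void {
    this.items.add(item);
  }

  // Union: the order of merges and repeated merges do not change the result.
  merge(other: GSet<T>): void {
    other.items.forEach((item) => this.items.add(item));
  }

  // Sorted so replicas can be compared regardless of insertion order.
  values(): T[] {
    return Array.from(this.items).sort();
  }
}
```

If one replica adds `0x123` and another adds `0xabc`, merging in either direction (or more than once) leaves both with `["0x123", "0xabc"]`.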

@il3ven
Collaborator Author

il3ven commented Mar 11, 2023

> Also, is the key structure documented somewhere? Just now realizing I am not entirely up to speed on that.

It isn't documented, but you can find it here:

datumToKey(datum: Partial<Datum>) {

> Something like some-key-referring-to-an-nft/owners/0xabc, where the first part can be the key you are currently using and the last part is the owner address. The value at that key could be blank or include details like chain id and block number, which are likely represented in the key already.

Yes, I have also thought about this, and it is a valid solution. However, we will have to write code to merge the owners on read. We can do that for now, but if the schema changes in the future we will have to rewrite it. Also, if we introduce new many-to-many or one-to-many relationships, we will have to write more custom code.

I will have a look at CRDTs too, but if sqlite doesn't hurt our performance, we should use it instead of implementing everything ourselves. I believe network calls, not our DB, will be the bottleneck while crawling.

@reimertz
Member

reimertz commented Mar 14, 2023

To add some spice to this conversation - I have been thinking about the possibility of going down the route of piggy-backing on the progress of Strapi and have Neume being a fork of their project (that we keep up to date by merging new releases / bug fixes).

The reason this idea intrigues me is that we'd get a lot of functionality for free:

  • Database adapters (SQLite, Postgres, MongoDB, MySQL, MariaDB)
  • CMS
  • Schema creation / validation
  • Import / Export / Syncing
  • GraphQL / REST APIs
  • Cron Job support (current crawl command could prob be re-written to a cron-task)
  • Deployment
  • Hosted Deployment
  • CLI (could rename / add neume crawl / neume daemon etc)

But there are some concerns / blockers with this approach:

  • Strapi is not ACID-compliant (relates to the issues pointed out above)
  • Updates are very costly; for us, that means the owners and transactions. But I do think if we went down the route of utilizing their one-to-many / many-to-many relationships, we could limit writes and therefore get better write performance.

Would love your input / ideas regarding this @il3ven @neatonk

(This is probably more of a neume 3.0 discussion, but I'm intrigued to hear what you think.)

@neatonk
Member

neatonk commented Mar 14, 2023

> To add some spice to this conversation - I have been thinking about the possibility of going down the route of piggy-backing on the progress of Strapi and have Neume being a fork of their project (that we keep up to date by merging new releases / bug fixes).

Interesting idea and spicy, as advertised.

I'd argue against this for neume mostly because I think it would be detrimental to other use cases of neume that wouldn't need any of that. That said, I think it would be reasonable to structure neume as a library that can be embedded into other apps with minimal friction. Could be a good thought exercise to ask what would need to change about neume for that to be feasible.
