geyser: stream all accounts state outside of startup stage #2958

Open
fanatid opened this issue Sep 20, 2024 · 3 comments
Comments

@fanatid

fanatid commented Sep 20, 2024

Problem

Right now, anyone who wants to build an indexer on top of the Geyser plugin interface has to write a custom plugin and can't use the gRPC plugin, because Geyser streams the state of all accounts only during the startup stage and there is no way to get it later. Streaming all accounts would make it possible to build read layers outside the instance where the node runs.

Proposed Solution

agave:

outside:

  • stream all accounts at startup into some storage and keep it updated with new account writes in a loop (full data duplication)
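A minimal Rust sketch of the "outside" option, assuming a hypothetical `AccountUpdate` record (not an agave or Geyser type) and an in-memory map standing in for the external storage. Startup notifications and later live updates both go through the same `apply`, which keeps only the newest version of each account; ordering versions by `(slot, write_version)` is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-byte account address.
pub type Pubkey = [u8; 32];

/// Hypothetical update record, loosely modeled on what a Geyser plugin receives.
#[derive(Clone, Debug, PartialEq)]
pub struct AccountUpdate {
    pub pubkey: Pubkey,
    pub slot: u64,
    pub write_version: u64,
    pub lamports: u64,
    pub data: Vec<u8>,
}

/// Full copy of account state kept outside the node ("full data duplication").
#[derive(Default)]
pub struct AccountStore {
    accounts: HashMap<Pubkey, AccountUpdate>,
}

impl AccountStore {
    /// Stores `update` only if it is newer than the version already held,
    /// comparing `(slot, write_version)`. Returns true when the state changed.
    pub fn apply(&mut self, update: AccountUpdate) -> bool {
        let is_newer = self.accounts.get(&update.pubkey).map_or(true, |cur| {
            (update.slot, update.write_version) > (cur.slot, cur.write_version)
        });
        if is_newer {
            self.accounts.insert(update.pubkey, update);
        }
        is_newer
    }

    /// Returns the latest known version of an account, if any.
    pub fn get(&self, pubkey: &Pubkey) -> Option<&AccountUpdate> {
        self.accounts.get(pubkey)
    }
}
```

Because stale updates are ignored, the startup stream and the live stream can overlap without corrupting the copy.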

cc @lijunwangs

@lijunwangs

This can be achieved by replaying the state from a recent full snapshot, followed by incremental snapshots, until the difference is small enough to cut over to the normal Geyser plugin stream. This could have a serious performance impact, so it should be opt-in through explicit validator configuration.
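The snapshot catch-up described above could be driven by a loop like the following. It is a sketch under stated assumptions: `catch_up`, its closures and `max_lag` are hypothetical, not agave APIs; `next_incremental` is assumed to return the newest available incremental snapshot slot newer than the one already replayed, and the caller is assumed to fill the remaining small gap from the live Geyser stream after cutover.

```rust
/// Hypothetical catch-up driver. Replays the full snapshot, then newer incremental
/// snapshots, until the replayed slot is within `max_lag` slots of the node's tip.
/// Returns `Ok(slot)` when it is safe to cut over to the live stream after `slot`,
/// or `Err(slot)` if no newer snapshot exists while the lag is still too large.
pub fn catch_up(
    full_snapshot_slot: u64,
    mut next_incremental: impl FnMut(u64) -> Option<u64>,
    mut tip: impl FnMut() -> u64,
    max_lag: u64,
    mut replay: impl FnMut(u64),
) -> Result<u64, u64> {
    replay(full_snapshot_slot);
    let mut replayed = full_snapshot_slot;
    // The tip keeps advancing while we replay, so re-read it on every iteration.
    while tip().saturating_sub(replayed) > max_lag {
        match next_incremental(replayed) {
            Some(slot) if slot > replayed => {
                replay(slot);
                replayed = slot;
            }
            // Nothing newer yet: let the caller decide whether to wait and retry.
            _ => return Err(replayed),
        }
    }
    Ok(replayed)
}
```

The opt-in nature mentioned above would sit in front of this loop: it only runs when the validator is explicitly configured for it.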

@ceciEstErmat

This would be a really useful feature at my company, and I'm more than willing to implement it.

Looking at the code, I see that:
The notify process in reconstruct_accountsdb_from_fields starts once the restore is complete.
https://github.com/anza-xyz/agave/blob/master/runtime/src/serde_snapshot.rs#L1213
However, the notification itself does not really depend on that: it just gets all the slots from storage, gets the entries for each one, and notifies.
https://github.com/anza-xyz/agave/blob/master/accounts-db/src/accounts_db/geyser_plugin_utils.rs#L41

So I'm wondering why we can't do the same here. Maybe going over "all slots" isn't necessary in our case, and starting from the "latest slot" and going back to the last full snapshot slot would be enough?
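The notify walk described in this comment, optionally limited to slots from the last full snapshot up to the latest slot, might look roughly like this. `Pubkey`, `SlotStorage` and `notify_accounts_from` are hypothetical stand-ins, not the actual code in `geyser_plugin_utils.rs`; walking newest-first with a set of already-notified keys is this sketch's assumption for sending each account only once, with its latest version.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for a 32-byte account address.
pub type Pubkey = [u8; 32];

/// Hypothetical view of account storage: slot -> (pubkey, lamports) in write order.
pub type SlotStorage = BTreeMap<u64, Vec<(Pubkey, u64)>>;

/// Notifies every account stored in slots >= `min_slot` exactly once, using its
/// newest version. Returns the number of accounts notified.
pub fn notify_accounts_from(
    storage: &SlotStorage,
    min_slot: u64,
    mut notify: impl FnMut(u64, &Pubkey, u64),
) -> usize {
    let mut notified: HashSet<Pubkey> = HashSet::new();
    // Newest slot first, so the first version seen for a pubkey is the latest one.
    for (&slot, accounts) in storage.range(min_slot..).rev() {
        // Within a slot, later entries are assumed newer, so walk backwards as well.
        for (pubkey, lamports) in accounts.iter().rev() {
            // `insert` returns false for keys already sent from a newer entry.
            if notified.insert(*pubkey) {
                notify(slot, pubkey, *lamports);
            }
        }
    }
    notified.len()
}
```

Passing `min_slot = 0` walks every stored slot, matching the "all slots" behaviour; a larger `min_slot` models the narrower range proposed in the question.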

A Geyser request would then be added to the rpc_handler so that a user can signal that they want to start the streaming process.
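One hypothetical shape for such a request: the RPC layer holds a sender, and a dedicated worker thread serves requests one at a time, so only one full stream runs at once. `GeyserRequest`, `GeyserRequestSender` and `spawn_geyser_request_worker` are illustrative names only, not existing agave or `rpc_handler` items.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request type; not part of agave's RPC or Geyser interfaces.
#[derive(Clone, Debug, PartialEq)]
pub enum GeyserRequest {
    /// Ask the node to stream the state of all accounts to the loaded plugins.
    StreamAllAccounts { from_slot: Option<u64> },
}

/// Hypothetical handle the RPC layer would hold to forward requests.
pub struct GeyserRequestSender(Sender<GeyserRequest>);

impl GeyserRequestSender {
    /// Queues a full-stream request; returns false if the worker has stopped.
    pub fn request_full_stream(&self, from_slot: Option<u64>) -> bool {
        self.0.send(GeyserRequest::StreamAllAccounts { from_slot }).is_ok()
    }
}

/// Spawns a worker that handles requests sequentially.
pub fn spawn_geyser_request_worker(
    mut on_request: impl FnMut(GeyserRequest) + Send + 'static,
) -> (GeyserRequestSender, thread::JoinHandle<()>) {
    let (tx, rx) = channel::<GeyserRequest>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            on_request(req);
        }
    });
    (GeyserRequestSender(tx), handle)
}
```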

@diman-io

diman-io commented Oct 9, 2024

You could use ledger-tool for this.
