-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce efficient slot pagination #169
Conversation
export type AfterSlotPagination = { | ||
/** | ||
* Minimal slot from which the events should be returned (not inclusive) | ||
* | ||
* @example 46154769 | ||
*/ | ||
after: number | undefined, | ||
} | ||
export type UntilSlotPagination = { | ||
/** | ||
* Maximal slot from which the events should be returned (inclusive) | ||
* | ||
* @example 46154860 | ||
*/ | ||
untilSlot: number | undefined, | ||
} | ||
export type SlotPagination = AfterSlotPagination & UntilSlotPagination; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we don't use slot number in the existing types like TransactionPaginationType
and UntilPaginationType
is because there is no guarantee a slot number still contains the same data between requests (it's possible a rollback happens that replaces a block that used to be at a slot with a totally different block)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our current use case that shouldn't really change anything though, since we don't put a slot that we don't consider confirmed in the untilSlot
. Since we are not handling/expecting rollbacks anyway.
I think if we used tx ids here as cursors, it's still much simpler to keep the slot range, and paginate inside that range only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our current use case that shouldn't really change anything though, since we don't put a slot that we don't consider confirmed in the untilSlot. Since we are not handling/expecting rollbacks anyway.
Sure, but Carp is a general-purpose indexer so we shouldn't ship something that can cause subtle bugs for the average user if they want this task
I think if we used tx ids here as cursors, it's still much simpler to keep the slot range, and paginate inside that range only.
I disagree. This is exactly what we want to move away from because it leads to a slow presync. It means we would have to do a bunch of useless Carp queries for slot ranges that have nothing inside it instead of just querying for the next batch of data and then consuming it as required on the Paima side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be good to have that, and it would certainly be much faster to catch up the entire chain.
But then we would need to keep track of the tx id per cde in a table, since all of them will sync at different points and with different speeds. Although we may need that anyway. And then there is the issue that you never know if you need to fetch an extra page or not.
All of this doesn't really matter for the presync stage in paima, which is where the pagination is more useful, but during the sync stage we will always end up fetching extra data that will need to be kept around (possibly in memory) until the corresponding evm block is fetched. And at that point we may even need to request another page, so while doable, I think it does bring some extra complexity and I'm not sure it has much benefit.
So I'm thinking, could we have both things instead? We can use tx based pagination to catch up during presync, but just using slot ranges during the sync stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just querying by slot range will also break if a single slot contains more entries than the pagination limit (which is not so unrealistic because Cardano allows transactions with like 100 outputs which often happens during NFT drops) so you would have to at the very least make sure the page limit is larger potentially a few txs like this in a block (plus a margin to take into account the fact block sizes may increase in the future)
Avoiding this issue is why we also keep track of the tx id for the pagination for the other endpoints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a single slot contains more entries than the limit they all are returned, since this amount of data has upper limit in any case and if we return more data it doesn't break anything and it is not infinite amount of data
genErrorMessage(Errors.SlotRangeLimitExceeded, { | ||
limit: PROJECTED_NFT_LIMIT.MAX_LIMIT, | ||
found: limit, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message needs to be updated as well
closing since we decided not to go this way |
Now there's a concept of
after
: if not everything was returned then the slot of the last event is returned asafter
in the response.Events are returned per block to simplify the integrations. So either everything from the same block is returned or nothing.