Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce efficient slot pagination #169

Closed
wants to merge 1 commit into from

Conversation

gostkin
Copy link
Contributor

@gostkin gostkin commented Dec 27, 2023

Now there's a concept of after: if not everything was returned then the slot of the last event is returned as after in the response.

Events are returned per block to simplify the integrations. So either everything from the same block is returned or nothing.

Comment on lines +77 to +93
export type AfterSlotPagination = {
/**
* Minimal slot from which the events should be returned (not inclusive)
*
* @example 46154769
*/
after: number | undefined,
}
export type UntilSlotPagination = {
/**
* Maximal slot from which the events should be returned (inclusive)
*
* @example 46154860
*/
untilSlot: number | undefined,
}
export type SlotPagination = AfterSlotPagination & UntilSlotPagination;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we don't use slot number in the existing types like TransactionPaginationType and UntilPaginationType is because there is no guarantee a slot number still contains the same data between requests (it's possible a rollback happens that replaces a block that used to be at a slot with a totally different block)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our current use case that shouldn't really change anything though, since we don't put a slot that we don't consider confirmed in the untilSlot. Since we are not handling/expecting rollbacks anyway.

I think if we used tx ids here as cursors, it's still much simpler to keep the slot range, and paginate inside that range only.

Copy link
Contributor

@SebastienGllmt SebastienGllmt Dec 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our current use case that shouldn't really change anything though, since we don't put a slot that we don't consider confirmed in the untilSlot. Since we are not handling/expecting rollbacks anyway.

Sure, but Carp is a general-purpose indexer so we shouldn't ship something that can cause subtle bugs for the average user if they want this task

I think if we used tx ids here as cursors, it's still much simpler to keep the slot range, and paginate inside that range only.

I disagree. This is exactly what we want to move away from because it leads to a slow presync. It means we would have to do a bunch of useless Carp queries for slot ranges that have nothing inside it instead of just querying for the next batch of data and then consuming it as required on the Paima side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be good to have that, and it would certainly be much faster to catch up the entire chain.

But then we would need to keep track of the tx id per cde in a table, since all of them will sync at different points and with different speeds. Although we may need that anyway. And then there is the issue that you never know if you need to fetch an extra page or not.

All of this doesn't really matter for the presync stage in paima, which is where the pagination is more useful, but during the sync stage we will always end up fetching extra data that will need to be kept around (possibly in memory) until the corresponding evm block is fetched. And at that point we may even need to request another page, so while doable, I think it does bring some extra complexity and I'm not sure it has much benefit.

So I'm thinking, could we have both things instead? We can use tx based pagination to catch up during presync, but just using slot ranges during the sync stage?

Copy link
Contributor

@SebastienGllmt SebastienGllmt Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just querying by slot range will also break if a single slot contains more entries than the pagination limit (which is not so unrealistic because Cardano allows transactions with like 100 outputs which often happens during NFT drops) so you would have to at the very least make sure the page limit is larger potentially a few txs like this in a block (plus a margin to take into account the fact block sizes may increase in the future)

Avoiding this issue is why we also keep track of the tx id for the pagination for the other endpoints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a single slot contains more entries than the limit they all are returned, since this amount of data has upper limit in any case and if we return more data it doesn't break anything and it is not infinite amount of data

Comment on lines +40 to +43
genErrorMessage(Errors.SlotRangeLimitExceeded, {
limit: PROJECTED_NFT_LIMIT.MAX_LIMIT,
found: limit,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message needs to be updated as well

@gostkin
Copy link
Contributor Author

gostkin commented Jan 29, 2024

closing since we decided not to go this way

@gostkin gostkin closed this Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants