Support paginated querying of storage records #3001

ff137 · 2024-05-31T13:49:52Z

In a multi-tenant environment with potentially millions of wallets, pagination becomes a critical feature.

Currently, the list wallets endpoint is configured to fetch all. It takes a good few minutes to execute when there are 100s of 1'000s of tenants.

There are probably a lot of endpoints that can benefit from pagination, but off the top of my head, these ones stand out:

list wallets
- ✨ Adds support for paginated storage queries, and implements pagination for the wallets_list endpoint #3000
list connection records
list credential exchange records
list presentation exchange records
- ✨ Add pagination support for listing Connection, Cred Ex, and Pres Ex records #3033

Additional features:

fix pagination with post-filter query params
- 🐛 fix storage record pagination with post-filter query params #3082
support for ordering in pagination
- ✨ Add ordering options to askar scan and fetch_all methods #3173

I've started working on adding pagination support here: #3000, and so far I've managed to get it working for listing wallets 🎉.

For our use case we need support for the above listed bullet points, so I'll try get those all working for an initial implementation.

Any comments, recommendations or further discussion is welcome!

Related issues: #1919, #2373

ff137 · 2024-06-13T20:38:31Z

@jamshale it's still in progress (I can't reopen it) -- still need to make a PR for connection / cred ex / presentation records. And adding support for ordering in askar

ff137 · 2024-06-19T13:21:33Z

Potential endpoints that may need pagination as well:

listing created rev regs, in the revocation API
listing transaction records, in the endorse_transaction API
listing mediation requests, in coordinate_mediation

Those are the only other ones I can see, with potentially large responses

ff137 · 2024-07-03T13:42:17Z

I've discovered a bug with pagination + additional post-filter query parameters.

e.g. for fetching connection records, specifying limit=1 and alias="some alias", then it will return 0 records, despite that alias appearing in some records (i.e. it'll return relevant records when limit is set higher). Post-filtering is being done after the limited fetch, which messes up expected output

ff137 · 2024-07-03T14:12:24Z

This impacts:

fetching v1 or v2 credential or presentation exchange records, with query params: connection_id, role, state
fetching connection records with: alias, state, their_role, connection_protocol

These are all the post-filter query params for the routes that now have paginated query support. So, limit + offset will not work as expected when using these query params

A workaround is to ignore limit/offset when these query params exist, and to only apply limit/offset after the post filtering ... this would resolve the expected behavior, but there will still be a performance/scalability cost when there are potentially 1000s of records.

@jamshale Do you think this is a reasonable workaround to implement? Behavior <= 0.12.1 was anyway fetching all records. So, while it's not ideal to apply limit/offset after post-filtering, it's still better than having no pagination support ...

jamshale · 2024-07-03T15:41:03Z

@ff137 Yes, I think that's the correct way forward for now. I've never understood why these post filters have to be applied the way they do but I think that's good enough as is and we can try and address it in the future.

ff137 · 2024-07-03T15:51:18Z

Agreed. It's of course highly inefficient -- loading each record result as json and post filtering on fields (edit: I see the json deserialization is necessary either way, nvm). I'll do the workaround for now, but will also take a look at askar library for how this may be improved

ff137 · 2024-07-25T10:24:42Z

I'm working on the following issue in Askar to add support for custom ordering: openwallet-foundation/askar#290

@swcurran @jamshale I think this would be an essential feature for the 1.0.0 release, so that paginating through storage records guarantees that missing or duplicate records aren't returned across pages.

jamshale · 2024-07-25T15:04:41Z

Thanks for taking this on. This is an important feature.

My only concern about having it in the 1.0.0 release is that it keeps getting delayed. We really want to get it out and have a more steady release cadence.

This pagination still works correctly, correct? It's just extremely inefficient with the post filtering?

ff137 · 2024-07-25T19:12:28Z

That's all good. Writing tests for the new feature might take some time too. Is the idea to release 1.0.0 ASAP? If there's a rough date set, then I can try get this ready by then. If not, I'll just see how far I get!

Also yes, the pagination works -- I've just run into trouble when trying to assert that each page returns distinct results, with no duplicates or missing records. I managed to create 100'000 wallets, and fetch 1000 at a time, asserting each page has distinct results. So that was successful. But then when fetching connection records with post-filter query params, I couldn't get the same guarantees. Creating new connections, or new wallets, might cause some shuffling. Difficult to pin down exactly what the cause or frequency of the problem is. But all I can say is there's currently no "guarantee" that scanning pages, by increasing the offset, will always return distinct results. Having an ordering option will provide guarantee / more control for the end user.

jamshale · 2024-07-25T20:07:20Z

We don't have a date. We want to release ASAP.

I think this problem is acceptable to have in the release. It would make a good release milestone once it's completed.

This was referenced May 31, 2024

Pagination support openwallet-foundation/askar#232

Closed

Paginate long responses didx-xyz/aries-cloudapi-python#621

Closed

jamshale closed this as completed Jun 13, 2024

jamshale reopened this Jun 13, 2024

jamshale added the 1.0.0 To be addressed for the ACA-Py 1.0.0 release label Jul 3, 2024

ff137 mentioned this issue Jul 3, 2024

🐛 fix storage record pagination with post-filter query params #3082

Merged

jamshale removed the 1.0.0 To be addressed for the ACA-Py 1.0.0 release label Jul 8, 2024

ff137 mentioned this issue Jul 25, 2024

✨ Support ordering in paginated queries didx-xyz/aries-cloudapi-python#908

Closed

This was referenced Aug 14, 2024

Pagination on created schemas and cred defs? #2373

Closed

Storage Find_all/Query with limit and offset #1919

Closed

ff137 linked a pull request Aug 16, 2024 that will close this issue

✨ Add ordering options to askar scan and fetch_all methods #3173

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support paginated querying of storage records #3001

Support paginated querying of storage records #3001

ff137 commented May 31, 2024 •

edited

Loading

ff137 commented Jun 13, 2024 •

edited

Loading

ff137 commented Jun 19, 2024

ff137 commented Jul 3, 2024 •

edited

Loading

ff137 commented Jul 3, 2024 •

edited

Loading

jamshale commented Jul 3, 2024

ff137 commented Jul 3, 2024 •

edited

Loading

ff137 commented Jul 25, 2024

jamshale commented Jul 25, 2024

ff137 commented Jul 25, 2024

jamshale commented Jul 25, 2024

Support paginated querying of storage records #3001

Support paginated querying of storage records #3001

Comments

ff137 commented May 31, 2024 • edited Loading

ff137 commented Jun 13, 2024 • edited Loading

ff137 commented Jun 19, 2024

ff137 commented Jul 3, 2024 • edited Loading

ff137 commented Jul 3, 2024 • edited Loading

jamshale commented Jul 3, 2024

ff137 commented Jul 3, 2024 • edited Loading

ff137 commented Jul 25, 2024

jamshale commented Jul 25, 2024

ff137 commented Jul 25, 2024

jamshale commented Jul 25, 2024

ff137 commented May 31, 2024 •

edited

Loading

ff137 commented Jun 13, 2024 •

edited

Loading

ff137 commented Jul 3, 2024 •

edited

Loading

ff137 commented Jul 3, 2024 •

edited

Loading

ff137 commented Jul 3, 2024 •

edited

Loading