chore: add index to improve DB cleanup job performance #3434

David-Wobrock · 2023-08-15T15:36:14Z

The Kratos cleanup job does a query looking like

DELETE FROM selfservice_login_flows
WHERE id in (
  SELECT id FROM (
    SELECT id
    FROM selfservice_login_flows c
    WHERE expires_at <= $1
     AND nid = $2
    ORDER BY expires_at ASC
    LIMIT 200000 )
  AS s )

The query plan of the SELECT-subquery looks like this (on PostgreSQL):

$ EXPLAIN SELECT id FROM (SELECT id FROM selfservice_login_flows c WHERE expires_at <= '2023-08-01' and nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e' ORDER BY expires_at ASC LIMIT 200000 ) as s;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on s  (cost=1059.36..1059.38 rows=1 width=16)
   ->  Limit  (cost=1059.36..1059.37 rows=1 width=24)
         ->  Sort  (cost=1059.36..1059.37 rows=1 width=24)
               Sort Key: c.expires_at
               ->  Seq Scan on selfservice_login_flows c  (cost=0.00..1059.35 rows=1 width=24)
                     Filter: ((expires_at <= '2023-08-01 00:00:00'::timestamp without time zone) AND (nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e'::uuid))
(6 rows)

which does a sequential scan on the table.

This PR adds the index

CREATE INDEX david_test ON selfservice_login_flows (expires_at);

which results in this query plan:

$ EXPLAIN SELECT id FROM (SELECT id FROM selfservice_login_flows c WHERE expires_at <= '2023-08-01' and nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e' ORDER BY expires_at ASC LIMIT 200000 ) as s;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Subquery Scan on s  (cost=0.28..7.87 rows=1 width=16)
   ->  Limit  (cost=0.28..7.86 rows=1 width=24)
         ->  Index Scan using david_test on selfservice_login_flows c  (cost=0.28..7.86 rows=1 width=24)
               Index Cond: (expires_at <= '2023-08-01 00:00:00'::timestamp without time zone)
               Filter: (nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e'::uuid)
(5 rows)

Doing an Index Scan instead of the sequential one.

➡️ We created this index manually on our Kratos DB in order to optimize the Kratos cleanup process, which works very well for us.

💡 An alternative can be to create this index containing all columns:

CREATE INDEX david_test_all_cols ON selfservice_login_flows (expires_at, nid, id);

then the query plan looks like:

$ EXPLAIN SELECT id FROM (SELECT id FROM selfservice_login_flows c WHERE expires_at <= '2023-08-01' and nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e' ORDER BY expires_at ASC LIMIT 200000 ) as s;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on s  (cost=0.28..8.06 rows=1 width=16)
   ->  Limit  (cost=0.28..8.05 rows=1 width=24)
         ->  Index Only Scan using david_test_all_cols on selfservice_login_flows c  (cost=0.28..8.05 rows=1 width=24)
               Index Cond: ((expires_at <= '2023-08-01 00:00:00'::timestamp without time zone) AND (nid = '750e49c8-a7ce-495e-8f79-ff8848d9e78e'::uuid))
(4 rows)

Which does an Index Only Scan, no data loaded from the table itself.
For us, it was not worth it, but maybe that's more in the vein of something you'd like to skip to all Kratos customers :)

Let me know what you think of this PR!

Related issue(s)

Checklist

I have read the contributing guidelines.
I have referenced an issue containing the design document if my change
introduces a new feature.
I am following the
contributing code guidelines.
I have read the security policy.
I confirm that this pull request does not address a security
vulnerability. If this pull request addresses a security vulnerability, I
confirm that I got the approval (please contact
[email protected]) from the maintainers to push
the changes.
I have added tests that prove my fix is effective or that my feature
works.
I have added or changed the documentation.

Further Comments

codecov · 2023-08-15T16:03:44Z

Codecov Report

Merging #3434 (4d46832) into master (f323b3d) will not change coverage.
Report is 3 commits behind head on master.
The diff coverage is 77.98%.

❗ Current head 4d46832 differs from pull request most recent head 5b6a446. Consider uploading reports for the commit 5b6a446 to get more accurate results

@@           Coverage Diff           @@
##           master    #3434   +/-   ##
=======================================
  Coverage   78.02%   78.02%           
=======================================
  Files         327      327           
  Lines       21402    21402           
=======================================
  Hits        16698    16698           
  Misses       3466     3466           
  Partials     1238     1238

Files Changed	Coverage Δ
selfservice/flow/registration/hook.go	`71.52% <ø> (ø)`
identity/manager.go	`76.92% <72.63%> (ø)`
schema/errors.go	`88.77% <81.25%> (ø)`
cmd/clidoc/main.go	`68.34% <100.00%> (ø)`
driver/config/config.go	`82.60% <100.00%> (ø)`
selfservice/flow/registration/error.go	`68.57% <100.00%> (ø)`
selfservice/flow/settings/hook.go	`60.00% <100.00%> (ø)`
selfservice/strategy/oidc/strategy.go	`60.35% <100.00%> (ø)`
text/message_validation.go	`100.00% <100.00%> (ø)`

aeneasr · 2023-08-16T06:21:33Z

Thank you david! This makes a lot of sense. I think it would be good to add this column to one of the existing indices (I think there are some already that cover nid) and have it CREATE INDEX ... OVER nid, expires_at. Otherwise this LGTM :)

David-Wobrock · 2023-08-18T09:04:16Z

Thank you david! This makes a lot of sense. I think it would be good to add this column to one of the existing indices (I think there are some already that cover nid) and have it CREATE INDEX ... OVER nid, expires_at. Otherwise this LGTM :)

Hey @aeneasr 👋

Thanks for the answer.
From a quick description of the table, in our env we have:

kratos=> \d selfservice_login_flows
                                     Table "public.selfservice_login_flows"
           Column            |            Type             | Collation | Nullable |           Default
-----------------------------+-----------------------------+-----------+----------+------------------------------
 id                          | uuid                        |           | not null |
 request_url                 | text                        |           | not null |
 issued_at                   | timestamp without time zone |           | not null | CURRENT_TIMESTAMP
 expires_at                  | timestamp without time zone |           | not null |
 active_method               | character varying(32)       |           | not null |
 csrf_token                  | character varying(255)      |           | not null |
 created_at                  | timestamp without time zone |           | not null |
 updated_at                  | timestamp without time zone |           | not null |
 forced                      | boolean                     |           | not null | false
 type                        | character varying(16)       |           | not null | 'browser'::character varying
 ui                          | jsonb                       |           |          |
 nid                         | uuid                        |           |          |
 requested_aal               | character varying(4)        |           | not null | 'aal1'::character varying
 internal_context            | jsonb                       |           | not null |
 oauth2_login_challenge      | uuid                        |           |          |
 oauth2_login_challenge_data | text                        |           |          |
Indexes:
    "selfservice_login_requests_pkey" PRIMARY KEY, btree (id)
    "idx_created_at" btree (expires_at)
    "selfservice_login_flows_id_nid_idx" btree (id, nid)
    "selfservice_login_flows_nid_id_idx" btree (nid, id)
Foreign-key constraints:
    "selfservice_login_flows_nid_fk_idx" FOREIGN KEY (nid) REFERENCES networks(id) ON UPDATE RESTRICT ON DELETE CASCADE

meaning that the nid is already indexed.
We could still create a new one that covers both (nid, expires_at), does that make sense to you?
I'm not certain of the use cases of nid, it's perhaps more relevant to also index it on the Ory Network/Cloud context.
Let me know :)

chore: add index to improve DB cleanup job performance

5b6a446

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add index to improve DB cleanup job performance #3434

chore: add index to improve DB cleanup job performance #3434

David-Wobrock commented Aug 15, 2023 •

edited

Loading

codecov bot commented Aug 15, 2023

aeneasr commented Aug 16, 2023

David-Wobrock commented Aug 18, 2023

chore: add index to improve DB cleanup job performance #3434

Are you sure you want to change the base?

chore: add index to improve DB cleanup job performance #3434

Conversation

David-Wobrock commented Aug 15, 2023 • edited Loading

Related issue(s)

Checklist

Further Comments

codecov bot commented Aug 15, 2023

Codecov Report

aeneasr commented Aug 16, 2023

David-Wobrock commented Aug 18, 2023

David-Wobrock commented Aug 15, 2023 •

edited

Loading