Combine action tables #4459

dullbananas · 2024-02-16T21:32:15Z

This should cause a huge improvement in query plans, especially for queries that previously reached the from/join collapse limits. For example, getting saved posts might now start with an index scan of the post_actions table, which avoids scanning posts that the user didn't do anything with (or all non-saved posts if I add partial indexes, but I don't know if I should do that).

This will also make the code much cleaner and reduce the size of the database. (Edit: it may or may not reduce size)

Indexes for the new action tables will use ~~INCLUDE~~ WHERE with IS NULL for each action column to keep index-only scans possible.

In the new joins, person_id will not use a bind parameter if it's None, so there can still be separate generic query plans for users that are not logged in.
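A minimal Rust sketch of the bind-parameter idea, assuming a hand-built join condition. The helper name `action_join_condition` and the constant `FALSE` join for anonymous users are hypothetical, not the PR's actual diesel code. When `person_id` is `None`, no bind parameter is emitted, so the SQL text differs from the logged-in variant and Postgres can keep a separate generic plan for logged-out requests.

```rust
/// Hypothetical helper: builds the ON condition for joining the per-user
/// action table, plus the values that must be bound to it.
pub fn action_join_condition(person_id: Option<i32>) -> (String, Vec<i32>) {
    match person_id {
        // Logged in: the person id is sent as bind parameter $1, so every
        // logged-in user shares one prepared statement and one generic plan.
        Some(id) => (
            "post_actions.post_id = post.id AND post_actions.person_id = $1".to_string(),
            vec![id],
        ),
        // Logged out (assumption): a constant FALSE makes the LEFT JOIN never
        // match. Nothing is bound, and the query text is different, so it gets
        // its own prepared statement and plan.
        None => ("FALSE".to_string(), Vec::new()),
    }
}
```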

dessalines · 2024-02-16T23:11:48Z

Before you go forward and spend too much time on this, it needs a lot of discussion, because we could lose a lot of data integrity solely for the sake of post_view query speed. An update to a person_action table, when that action could be many different columns, is a lot more confusing than single-action tables with solid constraints.

There are a lot of inside-postgres things we could do before getting rid of the post_like or comment_like table (unfortunately most of them would be some form of caching / non-source data store tho).

dullbananas · 2024-02-16T23:42:41Z

@dessalines Would that problem be fixed by using a composite type for each action that stores multiple values?

Edit: or multi-column constraints, like (a IS NULL) = (b IS NULL)
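A sketch of what such a multi-column constraint would enforce, assuming a vote is stored as two hypothetical columns, `liked` and `like_score`, that must be set or cleared together. The Rust function mirrors a `CHECK ((liked IS NULL) = (like_score IS NULL))` constraint; it is an illustration, not the schema from this PR.

```rust
/// Hypothetical subset of a combined action row: one action (a vote)
/// spread over two nullable columns.
pub struct VoteColumns {
    /// When the vote was cast (i64 stands in for a timestamp column).
    pub liked: Option<i64>,
    /// +1 or -1.
    pub like_score: Option<i16>,
}

/// Mirrors `CHECK ((liked IS NULL) = (like_score IS NULL))`:
/// either both columns are NULL or both are non-NULL.
pub fn vote_columns_consistent(row: &VoteColumns) -> bool {
    row.liked.is_none() == row.like_score.is_none()
}
```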

dessalines · 2024-02-17T17:00:21Z

I'm not sure I like that option either, at least for source data.

The only thing I can think of rn, that would also help with the linked issue below, is to do what you're doing with the post_action_table (with many optional columns), but have it act as a cache / secondary store, being filled by triggers on inserts / updates to source tables like post_like. I don't like this too much, since these secondary stores are nearly always imperfect and tend to get out of sync, and solving problems with them can be a nightmare.

We desperately need some SQL experts that could help us with this one, as well as #2444 which is a similar problem.

Nutomic · 2024-02-19T10:14:23Z

I don't think this implementation would create any problems with data integrity, as you have mandatory columns for person_id, post_id etc. and then optional columns for each action. In effect it's the same integrity we have with existing table definitions. There is a risk of reading or writing the wrong column, but that seems unlikely as we can keep using existing wrapper methods such as PostLike::like.

On the other hand storing the data in another table and using triggers will definitely give us consistency bugs, as happened with comment counts. So I would say go ahead with this approach.
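A minimal in-memory Rust model of the layout described above, assuming mandatory `(person_id, post_id)` keys and one optional column per action. The `like` and `save` functions are hypothetical stand-ins for wrappers such as `PostLike::like`; the point is that callers go through them instead of naming columns, so the risk of touching the wrong column is confined to one place.

```rust
use std::collections::HashMap;

/// Hypothetical action columns of a combined `post_actions` row.
/// The key columns live in the map key; every action column is optional.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PostActions {
    pub like_score: Option<i16>,
    pub saved: Option<i64>, // stand-in for a timestamp
    pub read: Option<i64>,  // stand-in for a timestamp
}

/// Keyed by (person_id, post_id).
pub type PostActionsTable = HashMap<(i32, i32), PostActions>;

/// Upsert in the spirit of `PostLike::like`: create the row on the first
/// action, then set only the vote column.
pub fn like(table: &mut PostActionsTable, person_id: i32, post_id: i32, score: i16) {
    table.entry((person_id, post_id)).or_default().like_score = Some(score);
}

/// Same pattern for saving; other action columns are left untouched.
pub fn save(table: &mut PostActionsTable, person_id: i32, post_id: i32, at: i64) {
    table.entry((person_id, post_id)).or_default().saved = Some(at);
}
```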

dessalines · 2024-02-19T23:20:52Z

I've posted this on programming.dev to see if any SQL experts can chime in on a correct way of doing this.

https://programming.dev/post/10280707

dullbananas · 2024-02-24T20:46:08Z

I changed the implementation of the existing post functions to use the post_actions table.

The only remotely scary thing is automatically deleting rows after all actions are unset. I will do that with a trigger that runs DELETE. It shouldn't have concurrency problems because the condition after WHERE is re-checked if needed after locking the row. Also, forgetting to update the trigger after adding columns will be guaranteed to raise an error because tuple comparison with the whole row will be used (e.g. (foo.*) = (foo.a, foo.b, NULL, NULL)).
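A simplified in-memory model of this "uplete" behavior, assuming two hypothetical action columns. It does not model concurrency (in Postgres, the re-checked `WHERE` of the `DELETE` covers that); it only shows clearing one column and deleting the row once every action column is empty. The exhaustive destructuring in `all_null` is a Rust analogue of the whole-row tuple comparison: adding a field makes it stop compiling until the check is updated.

```rust
use std::collections::HashMap;

/// Hypothetical action columns of a combined row (keys are in the map key).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PostActions {
    pub like_score: Option<i16>,
    pub saved: Option<i64>,
}

impl PostActions {
    /// True when no action is set. Destructuring without `..` forces this
    /// function to be revisited whenever a new action field is added.
    pub fn all_null(&self) -> bool {
        let PostActions { like_score, saved } = self;
        like_score.is_none() && saved.is_none()
    }
}

/// Clears the "saved" action, then deletes the row if nothing is left.
/// Returns true if the row was removed.
pub fn unsave(table: &mut HashMap<(i32, i32), PostActions>, key: (i32, i32)) -> bool {
    let row = match table.get_mut(&key) {
        Some(row) => row,
        None => return false, // no row means nothing to clear
    };
    row.saved = None;
    if row.all_null() {
        // Every action column is NULL now, so the row carries no information.
        table.remove(&key);
        true
    } else {
        false
    }
}
```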

Nutomic · 2024-07-09T15:54:34Z

Or otherwise we make a branch release/0.19, cherry-pick commits for 0.19.6 and then merge this to main. This way we can also start merging all the other breaking PRs.

phiresky

Conceptually, I think this is probably a good idea, though i'm not 100% confident.

Wrt clean database design, this could be considered a bad idea, since it's a kind of denormalization: instead of not having rows when the values aren't present, there are now a lot more null columns. But it's not very unclean, joins are hard to both write and read, and wrt performance it's probably good.

Wrt the code, there are a lot of changes, and without spending a lot of time it's difficult to tell whether everything was transferred correctly. For example, the uplete stuff: no idea whether that's right or not.
The non-null assuming overrides in diesel seem like a bit of a hack to me that might cause problems in the future, maybe it's possible to solve that more elegantly (like removing all the separate structs that now reference the same tables, but that would be an even bigger refactoring).

dessalines · 2024-09-16T15:21:09Z

cc @dullbananas some merge conflicts

Nutomic · 2024-09-17T10:03:43Z

You haven't answered my question above regarding assume_not_null. Will Lemmy crash if that assumption is wrong? Why can't you mark it as not null directly in SQL?

Edit: I get it now. We have SQL columns like comment_action.score which are null if the user hasn't cast any vote. But in the API there can't be an Option, so we need to exclude null values in the query. Makes sense, but I hope this extra complexity won't cause problems in the future.

Nutomic · 2024-09-17T10:07:18Z

crates/db_schema/src/utils/uplete.rs

+
+    Ok(())
+  }
+}


All this looks quite complicated, would be good to have some unit tests.

migrations/2024-05-31-134311_smoosh-tables-together/up.sql

dullbananas · 2024-09-20T00:43:01Z

Maybe in the future I could create something that causes filter and assume_not_null to be encapsulated in a way that prevents accidentally making unexpected null errors possible. It would probably be a variant of the Selectable derive macro that creates the whole query. Until then, the as_select calls just need to be used with the right filter when implementing the query for a new action type.
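One possible shape for that encapsulation, sketched in plain Rust rather than as a diesel derive macro; all type names here are hypothetical. The conversion from a row with a nullable `score` to an API type with a non-optional `score` is a `TryFrom` impl, so the filter and the non-null assumption live in one place, and a row without a vote is skipped instead of becoming an unexpected null error.

```rust
use std::convert::TryFrom;

/// Hypothetical row read from a combined action table:
/// `score` is NULL when the user has not voted.
#[derive(Debug, Clone, Copy)]
pub struct CommentActionRow {
    pub person_id: i32,
    pub comment_id: i32,
    pub score: Option<i16>,
}

/// Hypothetical API-facing type with a non-optional score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CommentVote {
    pub person_id: i32,
    pub comment_id: i32,
    pub score: i16,
}

impl TryFrom<CommentActionRow> for CommentVote {
    /// The rejected row is handed back to the caller.
    type Error = CommentActionRow;

    // The "filter" and the "not null" assumption are the same check here.
    fn try_from(row: CommentActionRow) -> Result<Self, Self::Error> {
        match row.score {
            Some(score) => Ok(CommentVote {
                person_id: row.person_id,
                comment_id: row.comment_id,
                score,
            }),
            None => Err(row),
        }
    }
}

/// Keeps only rows that actually hold a vote.
pub fn votes(rows: Vec<CommentActionRow>) -> Vec<CommentVote> {
    rows.into_iter()
        .filter_map(|r| CommentVote::try_from(r).ok())
        .collect()
}
```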

dullbananas · 2024-10-15T21:53:53Z

conflicts are resolved now

dessalines · 2024-10-22T20:45:46Z

We still gotta get more ppl than me looking at this. It's been on our PR list for too long, and it'll give a lot of potential performance benefits.

Nutomic · 2024-10-23T13:18:01Z

My comments are not addressed yet.

Nutomic · 2024-10-31T10:15:44Z

Did you actually compare the query plans, e.g. for PostView, before and after these changes to verify that there is a major benefit? These changes are very complex and can cause strange bugs from AssumeNotNull, as well as making future code changes much more difficult. So if there is only a minor benefit I would rather skip it and keep the current implementation. It may not be the most efficient, but at least it's easy to understand and maintain.

If we merge this then you definitely need to add tests for uplete.rs. In case there is a weird failure in api tests it would be very hard to track it down to a specific part of that file otherwise.

dullbananas · 2024-11-06T05:27:18Z

Now there are tests in the uplete module.

I don't remember checking the query plans and durations. I will do that soon. Or you could do it if you have enough time in the next few days, which should be super easy with scripts/db_perf.sh. If you do, remember to merge from main right before checking.

I don't completely agree about the maintainability tradeoff. I think the current action-related code is completely the opposite of "easy to understand and maintain". There are already much simpler joins now with the combined tables, and maybe it's easier overall to add more actions. In the future there can be fewer maintainability problems by not using separate structs, or separate fields in views, for each individual action type.

Let me know if you want me to reduce the assume_not_null risk before this PR is merged, at the expense of this PR taking a much longer time.

dullbananas added 11 commits January 20, 2024 15:08

Update schema.rs

bf93207

Update post_view.rs

96a24eb

Update comment_report_view.rs

a597eb6

Update comment_report_view.rs

b7f11f1

Update post_view.rs

0f9ffa8

Update utils.rs

d4ffdf5

Update schema.rs

aa7077f

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

67394bf

stuff

9761ea8

stuff

893d62a

fix actions

66bdc49

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

224b1e4

dullbananas added 8 commits February 20, 2024 02:24

PostLike

8cb325b

fmt

e12e2c2

more post stuff (partial)

d3f57ba

remove uplete

9a18388

returning

3f56324

rename read_comments field

af44c6c

PersonPostAggregates

152a53f

a

61c75ea

dullbananas added 4 commits February 24, 2024 20:54

fix usage of read_comments_amount

cedce7d

comment

fb143b5

community

042860c

community_block

c93994b

phiresky reviewed Jul 9, 2024

dullbananas added 3 commits July 19, 2024 21:41

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

a811d82

Merge branch 'main' into smoosh-tables-together

3351090

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

ed3267e

Nutomic reviewed Sep 17, 2024

Nutomic reviewed Sep 17, 2024

migrations/2024-05-31-134311_smoosh-tables-together/up.sql

dullbananas added 3 commits October 14, 2024 22:08

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

a190cce

finish merge

9f262f5

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

b13f988

dessalines requested review from phiresky and Nutomic October 22, 2024 20:45

dullbananas and others added 5 commits October 27, 2024 15:06

Fix index that checked read_comments twice instead of also checking read_comments_amount

c8aceb2

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

b7a84cf

Merge branch 'main' into smoosh-tables-together

3822d77

Merge remote-tracking branch 'upstream/main' into smoosh-tables-together

51cdbb8

fix

53d135e

dullbananas added 5 commits November 5, 2024 18:27

uplete: test_count, test_generated_sql_setting_one_column_null, test_count_methods

2bac1b3

refactor uplete sql test

92b1c28

test setting both columns to null in uplete

4e4444e

make AllNull generic

5ffd4b1

test AllNull

83b4b90
