-
-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine action tables #4459
base: main
Are you sure you want to change the base?
Combine action tables #4459
Conversation
Before you go forward and spend too much time on this, it needs a lot of discussion, because we could lose a lot of data integrity solely for the sake of post_view query speed. An update to a There are a lot of inside-postgres things we could do before getting rid of the |
@dessalines Would that problem be fixed by using a composite type for each action that stores multiple values? Edit: or multi-column constraints, like |
I'm not sure I like that option either, at least for source data. The only thing I can think of rn, that would also help with the linked issue below, is to do what you're doing with the We desperately need some SQL experts that could help us with this one, as well as #2444 which is a similar problem. |
I dont think this implementation would create any problems with data integrity, as you have mandatory columns for person_id, post_id etc and then optional columns for each action. In effect its the same integrity we have with existing table definitions. There is a risk to read or write the wrong column, but that seems unlikely as we can keep using existing wrapper methods such as On the other hand storing the data in another table and using triggers will definitely give us consistency bugs, as happened with comment counts. So I would say go ahead with this approach. |
I've posted this to ![email protected] to see if any SQL experts can chime in on a correct way of doing this. |
I changed the implementation of the existing post functions to use the post_actions table. The only remotely scary thing is automatically deleting rows after all actions are unset. I will do that with a trigger that runs |
Or otherwise we make a branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, I think this is probably a good idea, though i'm not 100% confident.
Wrt clean database design, this could be considered a bad idea, since it's kind of denormalization - instead of not having rows when the values aren't present, there's now a lot more null columns. But it's not very unclean and joins are both hard to write and read, and wrt performance, it's probably good.
Wrt the code, there's a lot of changes and without spending a lot of time it's difficult to tell whether everything is transferred perfectly. Like that uplete stuff, no idea whether that's right or not.a
The non-null assuming overrides in diesel seem like a bit of a hack
to me that might cause problems in the future, maybe it's possible to solve that more elegantly (like removing all the separate structs that now reference the same tables, but that would be an even bigger refactoring).
cc @dullbananas some merge conflicts |
You havent answered my question above regarding Edit: I get it now, we have sql tables like |
|
||
Ok(()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this looks quite complicated, would be good to have some unit tests.
Maybe in the future I could create something that causes |
conflicts are resolved now |
We still gotta get more ppl than me looking at this. Its been on our PR list for too long, and it'll give a lot of potential performance benefits. |
My comments are not adressed yet. |
…ead_comments_amount
Did you actually compare the query plans eg for PostView before and after these changes to verify that there is a major benefit? These changes are very complex and can cause strange bugs from AssumeNotNull, as well as making future code changes much more difficult. So if there is only a minor benefit I would rather skip it and keep the current implementation. It may not be the most efficient, but at least its easy to understand and maintain. If we merge this then you definitely need to add tests for uplete.rs. In case there is a weird failure in api tests it would be very hard to track it down to a specific part of that file otherwise. |
Now there's tests in the uplete module. I don't remember checking the query plans and durations. I will do that soon. Or you could do it if you have enough time in the next few days, which should be super easy with scripts/db_perf.sh. If you do, remember to merge from main right before checking. I don't completely agree about the maintainability tradeoff. I think the current action-related code is completely the opposite of "easy to understand and maintain". There's already much simpler joins now with the combined tables, and maybe overall more ease in adding more actions. In the future there can be less maintainability problems by not using separate structs, or separate fields in views, for each individual action type. Let me know if you want me to reduce the assume_not_null risk before this PR is merged, at the expense of this PR taking a much longer time. |
This should cause a huge improvement in query plans, especially for queries that previously reached the from/join collapse limits. For example, getting saved posts might now start with an index scan of the post_actions table, which avoids scanning posts that the user didn't do anything with (or all non-saved posts if I add partial indexes, but I don't know if I should do that).
This will also make the code much cleaner and reduce the size of the database. (Edit: it may or may not reduce size)
Indexes for the new action tables will use
INCLUDE
WHERE
withIS NULL
for each action column to keep index-only scans possible.In the new joins, person_id will not use a bind parameter if it's
None
, so there can still be separate generic query plans for users that are not logged in.