Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define a default sort order for some tables for functions like first_for_patient #2208

Open
rw251 opened this issue Nov 6, 2024 · 2 comments
Assignees
Labels
ergonomics Making ehrQL easier to use and read without adding substantial new features

Comments

@rw251
Copy link
Contributor

rw251 commented Nov 6, 2024

As part of a review of user ehrql code one of the patterns often used was sorting the clinical_events or medications table and then taking a first or last_for_patient. This would be simplified if the sort order was baked in. Also it would probably avoid confusion. Methods like first_for_patient could possibly be interpreted by analysts as the first event chronologically, rather than whatever the table was sorted on.

@rw251
Copy link
Contributor Author

rw251 commented Nov 8, 2024

Thoughts on what field should be sorted automatically when first/last_for_patient are called. I've looked at the first 40 files that match either last_for_patient or first_for_patient:

Table Comments Propose to default sort
clinical_events Only ever sorted by date - >30 instances Yes
medications Only ever sorted by date - but only 3 instances Yes
vaccinations Only ever sorted by date - 14 instances Yes
ons_deaths Only sorted by date - but only 2 instances. Probably too confusing to include because people would presume a single death record per person. Hmm in fact this table is a PatientFrame, so not sure why there are some people sorting it and taking last_for_patient No
apcs Only ever sorted by admission_date - but only 4 instances. Could feasibly be sorted by discharge_date as well. No
sgss_covid_all_tests Only ever sorted by specimen_taken_date - 6 instances Maybe
addresses Just start_date - 2 occurrences (but probably everytime imd is calculated). Also as start_date, end_date, has_postcode, -address_id No
open_prompt Sorted by creation_date or numeric_value or consultation_date or consultation_id No
practice_registrations Just start_date 6 times, just end_date 4 times and both 4 times No

It's extremely common for clinical_events to also search for clinical codes and a time window. So we could also offer first_for_patient_after, last_for_patient_before etc

Probably want it so that the auto sort_by doesn't happen if you actually specify a sort order.

Need to check the docs carefully e.g. this section talks about these functions only working on sorted frames: https://docs.opensafely.org/ehrql/how-to/examples/#what-is-the-firstlast-event-matching-some-criteria

@rw251 rw251 self-assigned this Nov 8, 2024
@rw251
Copy link
Contributor Author

rw251 commented Nov 13, 2024

Following discussion with @evansd:

  • Instead of helpers like first_for_patient_after, last_for_patient_before etc, we should encourage people to use the pattern where the date filter is done once, and then reused, rather than date filtering each time:
clinical_events_before_index_date = clinical_events.where(clinical_events.date.is_before("2023-01-01"))

last_covid_vaccination = clinical_events_before_index_date.where(<check codelist>).last_for_patient
last_blood_pressure = clinical_events_before_index_date.where(<check codelist>).last_for_patient
...
  • The pre-sorting is probably best done at a low level by defining a pre-sorted table class, rather than simply adding a *st_for_patient method on the relevant tables and relying on the fact that this is overloaded by the sortable event frame

@evansd evansd added the ergonomics Making ehrQL easier to use and read without adding substantial new features label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ergonomics Making ehrQL easier to use and read without adding substantial new features
Projects
None yet
Development

No branches or pull requests

2 participants