Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big Source Changes, Much Improvements, Wow. #1545

Merged
merged 60 commits into from
Jul 16, 2024

Conversation

ahankinson
Copy link
Member

This PR contains a whole boatload of improvements. In talking with @dchiller about this we thought we would do a monolithic PR to start with, and then see if anything needed to be changed before merging.

At its core this PR changes the way Source records are captured. With the introduction of the Institution model, the siglum and the institution name are now found in that table, and the Source identifier (the shelfmark) is replacing the former title and siglum fields on the Source model.

Along the way, a number of other changes were made.

  • The 'base.html' template gained two new blocks, 'title' and 'script', and all the <title> and <script> tags were moved to their appropriate blocks
  • A new experimental approach to Private Collectors is introduced, by marking any given institution as a private collector. This still needs to be fleshed out, though.
  • The shelfmark field was added to Sources
  • A hacky migration script was added to convert most of the Source records to the new structure. This still needs adjusting for Private collectors
  • The forms and templates for the source edit and create were altered, including a new autocomplete for assigning the holding institution
  • Optimized a bunch of queries and rewrote a couple to remove some code
  • Removed the siglum column from the source chant browse

The biggest change, however, is the rework of the Feast detail page. The previous code was very slow to load the page, but this is now pretty fast through the use of two raw SQL queries to fetch the necessary data for the view. I've compared it with the numbers from the existing setup, and it seems to give the same results, but the page load time has gone from 3+ seconds to ~250ms, and 1,500 queries (in one case) to a pretty constant 8 (in total for the page load). It also fixed a few problems with this page, and displays all the genres associated with a given chant, instead of just one.

How to review this PR

Normally I don't like large PRs, but it seemed appropriate that, since I was touching so many places to rework sources that I could also improve these places as I went.

I've tried to explain the changes thoroughly in every commit message. I'm also posting a few screenshots to compare the old and new changes.

Outstanding issues

Still to-do is to figure out how to deal with Private Collections. You may notice this in some places where a source will have None -- this just means there is a source, but it hasn't been migrated to the Institution / Shelfmark model, so it's not displaying the right data. This should be fixed before this PR is merged.

Also not tackled is any search interface, specifically how it deals with looking up sources by "siglum." Still TBD, particularly if we want to not break any existing searches... Feedback is useful here.

Screenshots

Changes on the left, Current Production on the right.

Feast detail. Note the genre column, and the removal of the chant with no cantus ID.

image

Browse Source chants, with removed siglum column

image

Edit Source, with Holding institution and shelfmark

image

adds blocks for title and scripts so that templates can put those parts in the header.
The title and scripts are no longer simply added to the body of the page, but are now actually contained by the header, where they should be.
Since the siglum and institution name now come from the institution, the source only needs
to have the shelfmark.

However, until all the data is properly migrated, the old title and siglum fields will remain.
This is a hacky, change-as-needed script that serves to migrate the source data from the old structure to the new, creating institutions and moving the shelfmark data to the new sources.

There is a whole bunch of sigla that get skipped (essentially all private collectors) until we figure out how they will be handled.

Don't look to closely, since there are lots of skeletons in this closet.
Allows sources to be added to an institution by using the autocomplete widget
Previously, the LoginView import in the URLs was taken from the views.user module, instead of directly from the django auth views, because this view was also imported into the user module. (aka, a transitive import). This changes it to import from the auth views directly.

I also ran a formatter on the imports in the user module, so that's why it looks like there are more changes than there actually are!
Previously there were several duplicated property definitions that varied only by checking whether the user was logged in. They also did the sorting in Python, rather than in the database.

This condenses all the definitions to a single one per property, and uses the expanding empty dict trick to determine whether `published=True` gets added to the filters. (Since without this, what is really desired is `published = True OR False` but without resorting to Q queries this was the simplest way to do it.)
This commit completely reworks the fetching of data for the feast detail page.

Previously, this page took several seconds to load, and generated a large number of queries, due to the complexity of the information needing to be collated.

This was rewritten to instead use two custom SQL queries that are executed directly. This means that the number of queries generated by this segment of code is now no more than two.

There are a few 'interesting' things in this, which I've tried to note in comments.

In the process, this also fixes a number of other small bugs:

Fixes #1540
Fixes #1136
Fixes #596
This commit reworks the source display in the templates, primarily by calling a couple helper methods on the Source model to keep the referencing consistent. The 'short heading' method displays the Institution siglum and the shelfmark, while the 'heading' displays the full institution name and the shelfmark.

The edit and create templates were reworked for the new structure and source referencing. The source display will now show the institution and the shelfmark in separate columns.
This is to allow saving sources in the create / edit without triggering a validation error.
@ahankinson ahankinson changed the title Big Source Changes, Much Improvements, Wow. WIP: Big Source Changes, Much Improvements, Wow. Jun 18, 2024
@annamorphism
Copy link

oo, shiny!
Comments on the first screenshot (Github is not letting me embiggen the other two right now)

  1. The "chants"/"incipit" columns seem to be flipped in the "Most frequent chants" table
  2. I gather that the source "None None" is something to still be dealt with, but where has the Source Formerly Known as P-Pm 1151 gone? And why did the HR-Zmk and PL-WRu seemingly lose a chant each?

@annamorphism
Copy link

further screenshot commentary:
-removing the siglum from the browse chant table looks great! excellent change.
-Edit source page: I see it is now optional to give a source a siglum+shelfmark. If somebody leaves this blank, what happens? Could they create a whole bunch of sources all named "None None"?
-Relatedly, what happens if the editor can't find their institution in the dropdown list (is this all the institutions in the database? in RISM?) Do they have to contact somebody to add it?

@ahankinson
Copy link
Member Author

The "chants"/"incipit" columns seem to be flipped in the "Most frequent chants" table

Good catch! I've fixed it.

I gather that the source "None None" is something to still be dealt with, but where has the Source Formerly Known as P-Pm 1151 gone? And why did the HR-Zmk and PL-WRu seemingly lose a chant each?

I'm working from a slightly older DB export at the moment, so it may not be completely up-to-date. P-Pm has the title "Porto 1511", which didn't match the "city, institution, shelfmark" pattern, so it got skipped in the conversion. It shows up as None None because of this.

Edit source page: I see it is now optional to give a source a siglum+shelfmark. If somebody leaves this blank, what happens? Could they create a whole bunch of sources all named "None None"?

It's only optional during the transition, since we can't automatically move everything and leave nothing blank. Once we've migrated everything and it "goes live" the option to leave it blank will be removed.

Relatedly, what happens if the editor can't find their institution in the dropdown list (is this all the institutions in the database? in RISM?) Do they have to contact somebody to add it?

That's kinda up to you guys. Do you want to allow people to add their own institutions, or would you rather create them on request. (I suspect the latter.) The only weird bit is Private Collectors, but I'm still working on a proposal for that.

Thanks for the review!

Buckle up, this one's a bit messy.

This commit disables the Feast Detail tests. It turns out, after much searching and Googling and digging, that the Django Test Suite always wraps the tests in a transaction, so that the test can be easily rolled back after each one is run, and thus present a database in a known state for each individual test.

However, with the switch to running a raw SQL query, the cursor in the Feast Detail View now runs outside of this transaction. Thus to the View, the database always looks empty, even if it's running inside of the test.

I tried fixtures, TransactionTestCase, overriding methods and hacking the TestCase to prevent this behaviour, but no dice.

So for now I've marked the Feast Detail tests as "skip" so that they still appear in the runs, but are skipped because they will always fail, as they're written.
It's better to use the named fields for the column content than to rely on the order of the tuple unpacking....
Still failing, but being worked on. Committing here to merge in latest changes.
# Conflicts:
#	django/cantusdb_project/main_app/templates/browse_chants.html
#	django/cantusdb_project/main_app/templates/chant_create.html
#	django/cantusdb_project/main_app/templates/chant_detail.html
#	django/cantusdb_project/main_app/templates/chant_edit.html
#	django/cantusdb_project/main_app/templates/melody_search.html
#	django/cantusdb_project/main_app/templates/source_detail.html
#	django/cantusdb_project/main_app/templates/source_edit.html
#	django/cantusdb_project/main_app/templates/user_source_list.html
Syncing the templates with the latest develop generated a number of merge conflicts. This commit fixes them.

Also it fixes the new block definitions in the base templates.

Fixes #1551
@dchiller
Copy link
Contributor

dchiller commented Jul 15, 2024

The provenance and century detail pages have odd orderings now when changes to templates and views are combined. I'll make a new issue to match what we now do with notation views.

@ahankinson
Copy link
Member Author

The preferred ordering was unclear to me. In many places it was done by siglum, so the Austrian (A-) sources would sort first, then the Belgian (B-) etc.

This is fine, but the format for the full heading doesn't actually contain the siglum, only the full institution name and the shelfmark.

I could imagine something like Cividale del Friuli, Archivio Capitolare (I-CF), LVI where the RISM siglum is in parentheses at the end of the institution name. This is what we do in RISM Online, e.g., https://rism.online/institutions/30000053, "Leipziger Stadtbibliothek - Musikbibliothek, Leipzig (D-LEm)".

Then the sorting of the list would make sense, even though the siglum isn't the first component of the title. If that is what is wanted, then "(I-CF) Cividale del Friuli, Archivio Capitolare" is probably best.

@dchiller
Copy link
Contributor

The preferred ordering was unclear to me.

To me too, which is why I think it makes sense to separate. As it is now, it can just look totally random which probably doesn't make sense.

The second option you list (eg. (I-CF) Cividale del Friuli, Archivio Capitolare) is how it now looks on the notation detail page which I think makes sense.

@annamorphism
Copy link

The preferred ordering was unclear to me.

i believe this was discussed a bit here? #1227

@dchiller
Copy link
Contributor

dchiller commented Jul 15, 2024

@ahankinson One last question, I think:

In the description above you write that there is still work to be done to figure out "private collections" and that this will lead to some sources showing up as None. Locally, I'm not actually running into any trouble with private collections at the source level...I'm running into them at the "Institution" level.

I'm getting institutions that look like this:

image

I'm also getting other weird institutions. For example, here's the US library of Congress:

image

I assume that this latter has to do with the way the names of sources were in the original data. If the intention is that we'll have to do this manual clean-up, all fine, I just want to make sure that's expected.

@dchiller
Copy link
Contributor

i believe this was discussed a bit here? #1227

Yes, now that I look at it, the source list page suffers from this too now (ordering by something that isn't visible). I guess whatever we choose for the source list page should probably carry over to the other pages that list sources too. Title will no longer be an option so we'll have to update what we want in #1227.

@ahankinson
Copy link
Member Author

A “title” doesn’t really make sense when talking about MSS… at least it’s not a universally recognized concept. The identifiers we can sort on are:

  • shelfmark (but we don’t want all the “MS 1” to sort together so that’s not useful
  • Institution name
  • City (but may not be known for private collectors)
  • Siglum

Personally I find sorting by country code, then city, and then institution name to be the most useful, since it groups all the mss from the same institution, and groups all institutions in one city. As long as it’s understood how they’re sorted then that’s probably the best way.

@dchiller
Copy link
Contributor

Thanks @ahankinson for all of this!!

From my pov, we should merge this if you are good and this makes sense #1545 (comment)

@ahankinson
Copy link
Member Author

I just started afresh and re-ran the migration. Here's what I get:

image

Did you run python manage.py migrate_records? The other one, migrate_institution_source_records, got so complicated that I rewrote it but kept the old version for reference.

@dchiller
Copy link
Contributor

I just started afresh and re-ran the migration.

Interesting...I thought I did but let me try again.

@dchiller
Copy link
Contributor

dchiller commented Jul 16, 2024

The same thing happened to me again... @lucasmarchd01 have you tried?

Update 1: I didn't realize that @ahankinson had done significant updates to the data on production to accompany this PR. Local and staging databases need to be updated with a recent production dump for this to work.

Update 2:
There are just a few institutions that probably need to be "fixed" in the data. Should we before merging?
image
image
image

@ahankinson
Copy link
Member Author

I don't see any of those institutions in my data...

@dchiller
Copy link
Contributor

I don't see any of those institutions in my data...

Looks like all are recent (past month) additions. I'm gonna merge, we'll fix them in the data later.

@dchiller dchiller merged commit ecf901c into develop Jul 16, 2024
1 check passed
@ahankinson
Copy link
Member Author

ahankinson commented Jul 16, 2024

I did a fresh export from the cantus database today (the export is currently in /home/fedora on the server) and dropped / re-created the database on my local machine. So I'm not sure how they got there...

cd /home/cantusdb/code/CantusDB/postgres/
docker compose exec -T postgres pg_dump -U cantusdb cantusdb > /home/fedora/cantusdb.sql
cd /home/fedora/
tar -zcvf cantusdb.tgz cantusdb.sql

Anything wrong with that?

@annamorphism
Copy link

Looks like all are recent (past month) additions. I'm gonna merge, we'll fix them in the data later.

I think they are all part of a dump of private collection fragments I'm working on that's still very much in draft form! So that's me.

@ahankinson
Copy link
Member Author

ahankinson commented Jul 16, 2024

But... if they're not in the database dump I literally took from the production server today, where did they come from? Or am I getting the dump from the right place?

Edit: OK, I see them on production. They will need to be fixed post-migration.

@dchiller
Copy link
Contributor

I did a fresh export from the cantus database today (the export is currently in /home/fedora on the server) and dropped / re-created the database on my local machine.

Looks like we have this figured out, but for reference, a new database dump is created every day in /home/cantusdb/backups/ so you don't need to do it yourself if you want the whole database.

@ahankinson
Copy link
Member Author

OK!

dchiller added a commit to dchiller/CantusDB that referenced this pull request Jul 29, 2024
This commit adjusts the columns in the browse sources table
to conform to the changes in source identification introduced in DDMAL#1545.

Refs: DDMAL#1569
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants