-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LCSH detector #113
Add LCSH detector #113
Conversation
0c030f7
to
66ef637
Compare
66ef637
to
6592996
Compare
6592996
to
8693617
Compare
8693617
to
79fd1cb
Compare
79fd1cb
to
ed58d24
Compare
ed58d24
to
6241c1a
Compare
6241c1a
to
404ac26
Compare
404ac26
to
3934139
Compare
** Why are these changes being introduced: We have noticed a significant volume of search traffic that looks like multi-level LCSH headings, like "Geology -- Massachusetts". These likely come from the Bento UI, which makes subject values like this clickable. It makes sense to try and write a detector for this pattern, especially as it would be the first detector which would resolve to an Informational (or subject-based) search. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-71 ** How does this address that need: This writes a new Detector::Lcsh class, which uses a regex to look for a ' -- ' separator. The class is patterned off of the StandardIdentifier class. I initially wrote this as part of that class, but this doesn't really belong there - the pattern isn't an identifier in that sense, and further work to identify subjects (particularly single-level subjects like "Geology" rather than just "Geology -- Massachusetts") will go beyond just using a regex for detections. Adding this class has follow-on changes to the Term, Metrics, and GraphQL areas of the application. Outside the app code, there are a variety of tests, changes to the db seeds, and a new migration to record the item counts that come from the metrics work. There will be a future ticket to look up the detected string in the set of subject headings, to return more than just a string in the GraphQL. Right now the GraphQL response is pretty useless, just sending the search string back. It would be good to include include something else. ** Document any side effects to this change: The Detectors Type file has its methods alphabetized. I'm not sure if Detectors::Lcsh should instead be Detectors::LCSH?
3934139
to
77626cb
Compare
# there are regex-able vocabulary quirks which might separate subject values from non-subject values. | ||
def subject_patterns | ||
{ | ||
separator: /(.*)\s--\s(.*)/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider a pattern that did not include the spaces in addition to this one? When I look at other LCSH sources, I see things like Zimbabwe--Economic policy
which would not be detected by our pattern even though it is an LCSH term copied from their site.
If you considered this but were wary to make such a general detector, that is fine but I did think it would be worth asking :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this, but didn't go that route because I think the instances we're seeing in our traffic all have the separator. I wrote the test explicitly to make sure the other option fails as a way of being clear about what we're aiming for.
That said, I'd also be willing to look back at this in the future to see whether we're missing any relevant traffic.
'orange cats like popcorn', | ||
'hyphenated names like Lin-Manuel Miranda do nothing', | ||
'dashes used as an aside - like this one - do nothing', | ||
'This one should--also not work' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the test I'm talking about, FWIW, which explicitly sets out that this pattern is not one we're expecting.
Summary of changes (please refer to commit messages for full details)
Developer
Ticket(s)
https://mitlibraries.atlassian.net/browse/TCO-71
Accessibility
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
Documentation
ENV
Stakeholders
Dependencies and migrations
NO dependencies are updated
YES migrations are included
Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing