Commit
** Why are these changes being introduced:

We have noticed a significant volume of search traffic that looks like multi-level LCSH headings, such as "Geology -- Massachusetts". These searches likely come from the Bento UI, which makes subject values like this clickable. It makes sense to write a detector for this pattern, especially as it would be the first detector to resolve to an Informational (or subject-based) search.

** Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/tco-71

** How does this address that need:

This writes a new Detector::Lcsh class, which uses a regex to look for a ' -- ' separator. The class is patterned off of the StandardIdentifier class. I initially wrote this as part of that class, but it doesn't really belong there: the pattern isn't an identifier in that sense, and further work to identify subjects (particularly single-level subjects like "Geology" rather than just "Geology -- Massachusetts") will go beyond using a regex for detections.

Adding this class has follow-on changes to the Term, Metrics, and GraphQL areas of the application. Outside the app code, there are a variety of tests, changes to the db seeds, and a new migration to record the item counts that come from the metrics work.

There will be a future ticket to look up the detected string in the set of subject headings, so that the GraphQL response returns more than just a string. Right now the GraphQL response is not very useful, because it only sends the search string back. It would be good to include something else.

** Document any side effects to this change:

The Detectors Type file has its methods alphabetized.

I'm not sure whether Detector::Lcsh should instead be Detector::LCSH.
1 parent 862a979, commit 3934139
Showing 15 changed files with 230 additions and 9 deletions.
@@ -0,0 +1,62 @@
# frozen_string_literal: true

class Detector
  # Detector::Lcsh is a very rudimentary detector for the separator between levels of a Library of Congress Subject
  # Heading (LCSH). These subject headings follow this pattern: "Social security beneficiaries -- United States"
  class Lcsh
    attr_reader :identifiers

    # For now the initialize method just needs to run the pattern checker. A space for future development would be to
    # write additional methods to look up the detected LCSH for more information, and to confirm that the phrase is
    # actually an LCSH.
    def initialize(term)
      @identifiers = {}
      term_pattern_checker(term)
    end

    # The record method will consult the set of regex-based detectors that are defined in Detector::Lcsh. Any matches
    # will be registered as Detection records.
    #
    # @note While there is currently only one check within the Detector::Lcsh class, the method is built to anticipate
    #       additional checks in the future. Every such check would be capable of generating a separate Detection record
    #       (although a single check finding multiple matches would still only result in one Detection).
    #
    # @return nil
    def self.record(term)
      results = Detector::Lcsh.new(term.phrase)

      results.identifiers.each_key do
        Detection.find_or_create_by(
          term:,
          detector: Detector.where(name: 'LCSH').first
        )
      end

      nil
    end

    private

    def term_pattern_checker(term)
      subject_patterns.each_pair do |type, pattern|
        @identifiers[type.to_sym] = match(pattern, term) if match(pattern, term).present?
      end
    end

    # This implementation will only detect the first match of a pattern in a long string. For the separator pattern this
    # is fine, as we only need to find one (and finding multiples wouldn't change the outcome). If a pattern does come
    # along where match counts matter, this should be reconsidered.
    def match(pattern, term)
      pattern.match(term).to_s.strip
    end

    # subject_patterns are regex patterns that can be applied to indicate whether a search string is looking for an LCSH
    # string. At the moment there is only one, for the separator " -- ", but others might be possible if there are
    # regex-able vocabulary quirks which might separate subject values from non-subject values.
    def subject_patterns
      {
        separator: /(.*)\s--\s(.*)/
      }
    end
  end
end
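The separator check above can be tried outside the application. The following is a minimal standalone sketch in plain Ruby (no Rails), reusing the same regex as `subject_patterns`. The constant `SEPARATOR` and the helpers `lcsh_separator?` and `lcsh_levels` are hypothetical names introduced only for illustration; they are not part of this commit. Splitting a heading into its levels is one possible direction for the "something else" mentioned in the commit message, not something the commit implements.

```ruby
# frozen_string_literal: true

# Same pattern as Detector::Lcsh#subject_patterns[:separator].
SEPARATOR = /(.*)\s--\s(.*)/

# Hypothetical helper (not in the commit): true when the term contains
# "--" with whitespace on both sides.
def lcsh_separator?(term)
  SEPARATOR.match?(term)
end

# Hypothetical helper (not in the commit): split a heading into its levels.
def lcsh_levels(term)
  term.split(/\s--\s/).map(&:strip)
end

samples = [
  'Geology -- Massachusetts',
  'Space vehicles -- Materials -- Congresses',
  'This one should--also not work',
  'dashes used as an aside - like this one - do nothing'
]

samples.each do |term|
  puts "#{term.inspect}: #{lcsh_separator?(term)}"
end

# MatchData#to_s returns the entire matched text. Because the pattern has a
# greedy (.*) on both sides, that is the whole single-line term, and nil.to_s
# gives an empty string when nothing matches.
puts SEPARATOR.match('Geology -- Massachusetts').to_s.inspect
puts SEPARATOR.match('orange cats like popcorn').to_s.inspect

p lcsh_levels('Space vehicles -- Materials -- Congresses')
```

Since the stored identifier is the full match, this is consistent with the commit message's remark that the GraphQL response currently just sends the search string back.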
@@ -0,0 +1,5 @@
class AddLcshToMetricsAlgorithm < ActiveRecord::Migration[7.1]
  def change
    add_column :metrics_algorithms, :lcsh, :integer
  end
end
@@ -16,6 +16,9 @@ isbn:
issn:
  name: 'ISSN'

lcsh:
  name: 'LCSH'

pmid:
  name: 'PMID'
@@ -0,0 +1,53 @@
# frozen_string_literal: true

require 'test_helper'

class Detector
  class LcshTest < ActiveSupport::TestCase
    test 'lcsh detector activates when a separator is found' do
      true_samples = [
        'Geology -- Massachusetts',
        'Space vehicles -- Materials -- Congresses'
      ]

      true_samples.each do |term|
        actual = Detector::Lcsh.new(term).identifiers

        assert_includes(actual, :separator)
      end
    end

    test 'lcsh detector does nothing in most cases' do
      false_samples = [
        'orange cats like popcorn',
        'hyphenated names like Lin-Manuel Miranda do nothing',
        'dashes used as an aside - like this one - do nothing',
        'This one should--also not work'
      ]

      false_samples.each do |term|
        actual = Detector::Lcsh.new(term).identifiers

        assert_not_includes(actual, :separator)
      end
    end

    test 'record method does relevant work' do
      detection_count = Detection.count
      t = terms('lcsh')

      Detector::Lcsh.record(t)

      assert_equal(detection_count + 1, Detection.count)
    end

    test 'record does nothing when not needed' do
      detection_count = Detection.count
      t = terms('isbn_9781319145446')

      Detector::Lcsh.record(t)

      assert_equal(detection_count, Detection.count)
    end
  end
end