Merge pull request #49 from MITLibraries/tco17-historical-snapshots-aggregations
Tco17 historical snapshots aggregations
Showing 11 changed files with 306 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# frozen_string_literal: true | ||
|
||
# == Schema Information | ||
# | ||
# Table name: metrics_algorithms | ||
# | ||
# id :integer not null, primary key | ||
# month :date | ||
# doi :integer | ||
# issn :integer | ||
# isbn :integer | ||
# pmid :integer | ||
# unmatched :integer | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
module Metrics | ||
# Algorithms aggregates statistics for matches for all SearchEvents | ||
class Algorithms < ApplicationRecord | ||
self.table_name = 'metrics_algorithms' | ||
|
||
# generate metrics data about SearchEvents matches | ||
# | ||
# @note This is expected to be run only once per month per type of aggregation (once with no month supplied, once | ||
# with a month supplied), ideally at the beginning of the following month so that the statistics are as | ||
# accurate as possible. Running further from the month in question will work, but matches will use the | ||
# current versions of all algorithms, which may not match the algorithms in place during the month the SearchEvent | ||
# occurred. | ||
# @note We don't currently prevent this running more than once per month per type of aggregation. | ||
# @param month [DateTime] A DateTime object within the `month` to be generated. Defaults to nil, which aggregates | ||
# all SearchEvents and is how total algorithm statistics are created. | ||
# @example | ||
# # Generate metrics for all SearchEvents | ||
# Metrics::Algorithms.new.generate | ||
# | ||
# # Generate metrics for all SearchEvents last month | ||
# Metrics::Algorithms.new.generate(1.month.ago) | ||
# @return [Metrics::Algorithms] The created Metrics::Algorithms object. | ||
def generate(month = nil) | ||
matches = if month.present? | ||
count_matches(SearchEvent.single_month(month).includes(:term)) | ||
else | ||
count_matches(SearchEvent.all.includes(:term)) | ||
end | ||
Metrics::Algorithms.create(month:, doi: matches[:doi], issn: matches[:issn], isbn: matches[:isbn], | ||
pmid: matches[:pmid], unmatched: matches[:unmatched]) | ||
end | ||
|
||
# Counts matches for the supplied events | ||
# | ||
# @note We currently only have StandardIdentifiers to match. As we add new algorithms, this method will need to | ||
# expand to handle additional match types. | ||
# @param events [Array<SearchEvent>] A collection of SearchEvents to check for matches. | ||
# @return [Hash] A Hash with keys for each known algorithm and the count of matched SearchEvents. | ||
def count_matches(events) | ||
matches = Hash.new(0) | ||
known_ids = %i[unmatched pmid isbn issn doi] | ||
|
||
events.each do |event| | ||
ids = StandardIdentifiers.new(event.term.phrase) | ||
|
||
matches[:unmatched] += 1 if ids.identifiers.blank? | ||
|
||
known_ids.each do |id| | ||
matches[id] += 1 if ids.identifiers[id].present? | ||
end | ||
end | ||
|
||
matches | ||
end | ||
end | ||
end |
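The counting approach in `count_matches` can be tried outside Rails with a minimal sketch like the one below. It is only an illustration under stated assumptions: `FakeIdentifiers` is a hypothetical stand-in for `StandardIdentifiers`, its regular expressions are simplified guesses rather than the project's real patterns, and the method takes plain strings instead of SearchEvents. As in the diff, a phrase with no identifiers increments `:unmatched`, and a phrase containing several identifier types is counted once for each type.

```ruby
# Hypothetical stand-in for StandardIdentifiers. The patterns are simplified
# assumptions for illustration, not the patterns used by the real class.
class FakeIdentifiers
  PATTERNS = {
    doi: %r{\b10\.\d{4,9}/\S+},
    issn: /\b\d{4}-\d{3}[\dXx]\b/,
    isbn: /\b97[89]\d{10}\b/,
    pmid: /\bPMID:\s*\d+/i
  }.freeze

  attr_reader :identifiers

  def initialize(phrase)
    # Build a hash of identifier type => matched text for this phrase.
    @identifiers = PATTERNS.each_with_object({}) do |(name, pattern), found|
      match = phrase[pattern]
      found[name] = match if match
    end
  end
end

# Tallies how many phrases matched each identifier type, plus how many
# matched nothing at all.
def count_matches(phrases)
  matches = Hash.new(0)

  phrases.each do |phrase|
    ids = FakeIdentifiers.new(phrase).identifiers

    matches[:unmatched] += 1 if ids.empty?
    ids.each_key { |name| matches[name] += 1 }
  end

  matches
end

phrases = [
  '10.1000/example.doi',
  '1075-8623',
  '9781319145446',
  'PMID: 38908367',
  'hi',
  'hello world'
]
counts = count_matches(phrases)
puts counts.inspect
```

Using `Hash.new(0)` means a type that never matches simply reads as zero, so the caller does not need to initialize every key up front.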
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
class CreateMetricsAlgorithms < ActiveRecord::Migration[7.1] | ||
def change | ||
create_table :metrics_algorithms do |t| | ||
t.date :month | ||
t.integer :doi | ||
t.integer :issn | ||
t.integer :isbn | ||
t.integer :pmid | ||
t.integer :unmatched | ||
t.timestamps | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
# frozen_string_literal: true | ||
|
||
# == Schema Information | ||
# | ||
# Table name: metrics_algorithms | ||
# | ||
# id :integer not null, primary key | ||
# month :date | ||
# doi :integer | ||
# issn :integer | ||
# isbn :integer | ||
# pmid :integer | ||
# unmatched :integer | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
require 'test_helper' | ||
|
||
class Algorithms < ActiveSupport::TestCase | ||
# Monthlies | ||
test 'dois counts are included in monthly aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
assert aggregate.doi == 1 | ||
end | ||
|
||
test 'issns counts are included in monthly aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
assert aggregate.issn == 1 | ||
end | ||
|
||
test 'isbns counts are included in monthly aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
assert aggregate.isbn == 1 | ||
end | ||
|
||
test 'pmids counts are included in monthly aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
assert aggregate.pmid == 1 | ||
end | ||
|
||
test 'unmatched counts are included in monthly aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
assert aggregate.unmatched == 2 | ||
end | ||
|
||
test 'creating lots of searchevents leads to correct data for monthly' do | ||
# drop all searchevents to make math easier and minimize fragility over time as more fixtures are created | ||
SearchEvent.delete_all | ||
|
||
doi_expected_count = rand(1...100) | ||
doi_expected_count.times do | ||
SearchEvent.create(term: terms(:doi), source: 'test') | ||
end | ||
|
||
issn_expected_count = rand(1...100) | ||
issn_expected_count.times do | ||
SearchEvent.create(term: terms(:issn_1075_8623), source: 'test') | ||
end | ||
|
||
isbn_expected_count = rand(1...100) | ||
isbn_expected_count.times do | ||
SearchEvent.create(term: terms(:isbn_9781319145446), source: 'test') | ||
end | ||
|
||
pmid_expected_count = rand(1...100) | ||
pmid_expected_count.times do | ||
SearchEvent.create(term: terms(:pmid_38908367), source: 'test') | ||
end | ||
|
||
unmatched_expected_count = rand(1...100) | ||
unmatched_expected_count.times do | ||
SearchEvent.create(term: terms(:hi), source: 'test') | ||
end | ||
|
||
aggregate = Metrics::Algorithms.new.generate(DateTime.now) | ||
|
||
assert doi_expected_count == aggregate.doi | ||
assert issn_expected_count == aggregate.issn | ||
assert isbn_expected_count == aggregate.isbn | ||
assert pmid_expected_count == aggregate.pmid | ||
assert unmatched_expected_count == aggregate.unmatched | ||
end | ||
|
||
# Total | ||
test 'dois counts are included in total aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate | ||
assert aggregate.doi == 1 | ||
end | ||
|
||
test 'issns counts are included in total aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate | ||
assert aggregate.issn == 1 | ||
end | ||
|
||
test 'isbns counts are included in total aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate | ||
assert aggregate.isbn == 1 | ||
end | ||
|
||
test 'pmids counts are included in total aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate | ||
assert aggregate.pmid == 2 | ||
end | ||
|
||
test 'unmatched counts are included in total aggregation' do | ||
aggregate = Metrics::Algorithms.new.generate | ||
assert aggregate.unmatched == 2 | ||
end | ||
|
||
test 'creating lots of searchevents leads to correct data for total' do | ||
# drop all searchevents to make math easier and minimize fragility over time as more fixtures are created | ||
SearchEvent.delete_all | ||
|
||
doi_expected_count = rand(1...100) | ||
doi_expected_count.times do | ||
SearchEvent.create(term: terms(:doi), source: 'test') | ||
end | ||
|
||
issn_expected_count = rand(1...100) | ||
issn_expected_count.times do | ||
SearchEvent.create(term: terms(:issn_1075_8623), source: 'test') | ||
end | ||
|
||
isbn_expected_count = rand(1...100) | ||
isbn_expected_count.times do | ||
SearchEvent.create(term: terms(:isbn_9781319145446), source: 'test') | ||
end | ||
|
||
pmid_expected_count = rand(1...100) | ||
pmid_expected_count.times do | ||
SearchEvent.create(term: terms(:pmid_38908367), source: 'test') | ||
end | ||
|
||
unmatched_expected_count = rand(1...100) | ||
unmatched_expected_count.times do | ||
SearchEvent.create(term: terms(:hi), source: 'test') | ||
end | ||
|
||
aggregate = Metrics::Algorithms.new.generate | ||
|
||
assert doi_expected_count == aggregate.doi | ||
assert issn_expected_count == aggregate.issn | ||
assert isbn_expected_count == aggregate.isbn | ||
assert pmid_expected_count == aggregate.pmid | ||
assert unmatched_expected_count == aggregate.unmatched | ||
end | ||
end |
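The monthly tests above pass `DateTime.now` and rely on `SearchEvent.single_month`, which is not shown in this diff. The sketch below illustrates, in plain Ruby, one way a "same calendar month" filter could behave. Everything in it is an assumption made for illustration: the `Event` struct, the `month_window` helper, and this `single_month` method are hypothetical and are not the project's scope.

```ruby
require 'date'

# Hypothetical minimal event record; the real SearchEvent is an ActiveRecord model.
Event = Struct.new(:phrase, :created_at)

# Returns the range of dates covering the calendar month that contains `moment`.
def month_window(moment)
  first = Date.new(moment.year, moment.month, 1)
  last = Date.new(moment.year, moment.month, -1) # -1 selects the last day of the month
  first..last
end

# Keeps only the events created in the same calendar month as `moment`.
def single_month(events, moment)
  window = month_window(moment)
  events.select { |event| window.cover?(event.created_at.to_date) }
end

events = [
  Event.new('hi', DateTime.new(2024, 6, 1, 0, 0, 0)),
  Event.new('1075-8623', DateTime.new(2024, 6, 30, 23, 59, 59)),
  Event.new('hello', DateTime.new(2024, 7, 1, 0, 0, 0))
]
june = single_month(events, DateTime.new(2024, 6, 15))
puts june.map(&:phrase).inspect
```

Any moment inside the month selects the same window, which is why the tests can pass the current time rather than an exact month boundary.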