OAI metadata mapping for journal articles (for harvesting by Primo) #547

Open
dolsysmith opened this issue Apr 23, 2024 · 3 comments

@dolsysmith
Contributor

Several GW-published or edited journals are archived as collections in GWSS. To enhance discovery of these collections, we can harvest the metadata for indexing in Primo.

To accomplish this, I think we'd need to do the following:

  • Create a new OAI mapping (similar to the oai_dc_etd mapping for ETDs) that includes article-level Dublin Core fields for the journal title and issue information.
  • Creating Journal Issue works as the immediate parents of the individual article works may make these fields easier to generate.
  • Create a new OAI set that contains just works nested under Journal Issue works. (Dependent on the previous.)
  • Create an XML mapping in Primo to translate this DC metadata to a Primo record with the appropriate fields.
  • Create a new Primo import profile to target this set of works.
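
A minimal sketch of what an article-level record from such a mapping might look like, written in Python for illustration only (the real mapping would live in the Ruby application). Putting the journal title and volume/issue into dc:source is an assumption, not a decision made in this issue, and the input keys (`title`, `creators`, `journal_title`, `volume`, `issue`) are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def article_to_oai_dc(article, issue):
    """Serialize one article plus its parent issue as an oai_dc record.

    `article` and `issue` are plain dicts with hypothetical keys.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name, value):
        # Skip empty values so no blank DC elements are emitted.
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

    add("title", article.get("title"))
    for creator in article.get("creators", []):
        add("creator", creator)

    # Assumption: journal title and volume/issue travel together in dc:source.
    parts = [issue.get("journal_title")]
    if issue.get("volume"):
        parts.append(f"vol. {issue['volume']}")
    if issue.get("issue"):
        parts.append(f"no. {issue['issue']}")
    add("source", ", ".join(p for p in parts if p))

    return ET.tostring(root, encoding="unicode")
```

Whatever field is chosen, the Primo XML mapping in the next step would need to read the same element.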

Decision points:

  1. How do we want to handle cases where the entire issue is treated as a single work? Do we want records for journal issues in Primo?
@kilahimm kilahimm added this to the 2.1.1 milestone Jul 30, 2024
@dolsysmith
Contributor Author

dolsysmith commented Aug 1, 2024

Notes from Schol Comm's discussion with Matthew (RDG).

General approach

  • Ingest GWSS works into Primo for individual articles published under GW auspices
  • For journals where we have GWSS works only at the volume/issue level, RDG can create MARC records in Alma and link to the GWSS collection.

Some preliminary thoughts on implementation:

  • The best place to capture the title of the journal and the volume/issue for articles is in a GWSS work of the GwJournalIssue type.
  • Individual articles would be assigned as child works of the journal-issue parent.
  • To create the OAI set, we can take one of at least three approaches:
    1. Find all works where has_model_ssim is GwJournalIssue, and for each journal-issue result, retrieve the works listed in the member_ids_ssim field.
    2. Create a faceted field that indicates whether a work is GW-published (or something analogous). Faceting on gw_published:true will yield the relevant works.
    3. Create a new administrative set just for GW-published articles and populate it with these works, and have the ListSets behavior generate sets based on the admin set. (I think this would be possible.)
  • For reasons outlined below, I believe that the 2nd or 3rd option may be technically less complicated to implement than the 1st.
  • To create the Dublin Core metadata for each work, we can get the journal title and volume/issue metadata from the parent_works attribute of the Fedora GwWork object. (Note that the Solr result for a work does not contain a link to the parents, only the children.)
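
To make the difference between the first two approaches concrete, here is a rough in-memory sketch in Python. It works on plain dicts shaped like Solr results; the field names has_model_ssim, member_ids_ssim, and gw_published come from the notes above, while the function names and sample shape are hypothetical. The last helper shows one way to recover the parent link that the Solr result for a child lacks, by inverting the issues' member lists.

```python
def article_ids_via_issues(docs):
    """Approach 1: collect the members of every GwJournalIssue document."""
    return [
        member_id
        for doc in docs
        if "GwJournalIssue" in doc.get("has_model_ssim", [])
        for member_id in doc.get("member_ids_ssim", [])
    ]


def article_ids_via_flag(docs, field="gw_published"):
    """Approach 2: keep documents whose (hypothetical) flag field is true."""
    return [doc["id"] for doc in docs if doc.get(field) is True]


def parent_index(docs):
    """Map each child id to the ids of the journal issues that list it."""
    index = {}
    for doc in docs:
        if "GwJournalIssue" in doc.get("has_model_ssim", []):
            for member_id in doc.get("member_ids_ssim", []):
                index.setdefault(member_id, []).append(doc["id"])
    return index
```

Approach 1 needs the extra pass over issue documents before any article can be selected, which is part of why a flag or an admin set looks simpler.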

Other issues

  • Bulkrax would expedite assigning works already in the repository to journal issues. Workflow:
    1. Create a Bulkrax exporter for all the works in a given journal collection.
    2. Create the Journal Issues in the UI for each volume/issue.
    3. Assign the appropriate issue to the parents column of the exported CSV.
    4. Re-import the CSV to update the metadata.
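
Step 3 could be scripted rather than done by hand. The following is a sketch only, in Python with the standard csv module: it assumes the exported CSV has (or can be given) a `parents` column, and it takes a caller-supplied function that decides which journal issue a row belongs to. The `volume` and `issue` columns in the usage example are hypothetical; the real export may identify issues differently.

```python
import csv
import io


def assign_parents(csv_text, issue_for_row, parents_column="parents"):
    """Return a copy of the CSV with the parents column filled in.

    `issue_for_row` receives each row as a dict and returns the identifier
    of the parent journal issue, or None to leave the row unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if parents_column not in fieldnames:
        fieldnames.append(parents_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        parent = issue_for_row(row)
        if parent:
            row[parents_column] = parent
        # Rows without a match keep their existing value (or an empty cell).
        writer.writerow(row)
    return out.getvalue()
```

For example, with a lookup keyed on hypothetical volume and issue columns:

```python
lookup = {("12", "3"): "issue-12-3"}
exported = "source_identifier,title,volume,issue\nart-1,First,12,3\n"
updated = assign_parents(exported, lambda r: lookup.get((r["volume"], r["issue"])))
```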

@dolsysmith
Contributor Author

dolsysmith commented Aug 2, 2024

Technical notes

  • Need to create an OAI set that contains just records for journal articles for harvesting in Primo.
    • One option: customize behavior of the GwJournalIssue set, so that it displays child works and not the parent work.
    • Alternatively, the ListSets behavior of the Blacklight OAI provider gem can be customized. But this behavior leverages faceted fields to produce sets, so without overhauling the logic entirely, we'd need a facet that corresponds to that set of journal articles, which we currently lack.
  • To test:
    • Observe to_sets behavior on an instance of a SolrDocument.
    • I'm guessing the find method in the SolrDocumentWrapper class is the pivotal method. (See also here). It may be necessary to override this behavior, or one of its dependencies.
    • The set query assumes the form of a simple Solr filter on a facet field.
    • Because the find method ultimately leverages the search_builder parameter chain, we'd probably have to deconstruct this to create the more complex search required, i.e., to target records of type GwWork that are members of GwJournalIssue records.
    • Can we create more than one type of set by providing multiple facet fields? Yes, maybe?
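
For comparison, here is a sketch of the two kinds of Solr request discussed above, built as plain parameter dicts in Python. The first is the simple facet filter that the set query takes today; the second uses Solr's join query parser to select GwWork records listed in the member_ids_ssim of GwJournalIssue records. Whether the join performs acceptably on this index, and how it would be wired into the search_builder chain, is untested here; the function names are hypothetical.

```python
from urllib.parse import urlencode


def facet_set_params(field, value):
    """Simple case: a set is one filter query on one facet field."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": "*:*", "fq": [f'{field}:"{escaped}"']}


def journal_article_params():
    """Complex case: GwWork records that are members of a GwJournalIssue."""
    return {
        "q": "*:*",
        "fq": [
            "has_model_ssim:GwWork",
            # Match documents whose id appears in member_ids_ssim
            # of any document matching has_model_ssim:GwJournalIssue.
            "{!join from=member_ids_ssim to=id}has_model_ssim:GwJournalIssue",
        ],
    }


def to_query_string(params):
    """Encode the params for a Solr select URL, repeating fq as needed."""
    return urlencode(params, doseq=True)
```

The second form has no single facet value to hang a set on, which is the gap a dedicated facet or admin set would close.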

@dolsysmith
Contributor Author

Update: after a conversation with Schol Comm, the recommendation is to use a separate admin set to identify OA-published articles for harvesting by Primo.
