OAI metadata mapping for journal articles (for harvesting by Primo) #547

dolsysmith · 2024-04-23T12:58:18Z

Several GW-published or edited journals are archived as collections in GWSS. To enhance discovery of these collections, we can harvest the metadata for indexing in Primo.

To accomplish this, I think we'd need to do the following:

Create a new OAI mapping (similar to the oai_dc_etd mapping for ETDs) that includes article-level Dublin Core fields for the journal title and issue information.
Creating Journal Issue works as the immediate parents of the individual article works may make these fields easier to generate.
Create a new OAI set that contains just works nested under Journal Issue works. (Dependent on the previous.)
Create an XML mapping in Primo to translate this DC metadata to a Primo record with the appropriate fields.
Create a new Primo import profile to target this set of works.
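
To make the first step more concrete, here is a rough sketch of what an article-level oai_dc record could look like once the journal information is added. This is Python for illustration only, not the Hyrax mapping itself. The dictionary keys, the sample values, and the choice of dc:source to carry the journal title and volume/issue are all assumptions; the actual field is still to be decided with whoever builds the Primo mapping.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def article_to_oai_dc(article, issue):
    """Build an oai_dc record for one article, adding journal-level fields.

    `article` and `issue` are plain dicts standing in for the article work
    and its parent Journal Issue work (hypothetical shapes).
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(name, value):
        # Skip empty values so the record carries no blank elements.
        if value:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value

    add("title", article.get("title"))
    for creator in article.get("creators", []):
        add("creator", creator)
    add("date", article.get("date"))
    add("identifier", article.get("url"))
    # Assumption: journal title and volume/issue travel together in dc:source.
    add(
        "source",
        f'{issue["journal_title"]}, vol. {issue["volume"]}, no. {issue["issue"]}',
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(
        article_to_oai_dc(
            {"title": "Example Article", "creators": ["Author, A."], "date": "2024"},
            {"journal_title": "Example Journal", "volume": "3", "issue": "2"},
        )
    )
```

Keeping the journal fields in a single element would keep the Primo side simple, at the cost of having to parse the string there; separate elements per field would be the alternative.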

Decision points:

How do we want to handle cases where the entire issue is treated as a single work? Do we want records for journal issues in Primo?

dolsysmith · 2024-08-01T15:18:53Z

Notes from Schol Comm's discussion with Matthew (RDG).

General approach

Ingest GWSS works into Primo for individual articles published under GW auspices
For journals where we have GWSS works only at the volume/issue level, RDG can create MARC records in Alma and link to the GWSS collection.

Some preliminary thoughts on implementation:

The best place to capture the title of the journal and the volume/issue for articles is in a GWSS work of the GwJournalIssue type.
Individual articles would be assigned as child works of the journal-issue parent.
To create the OAI set, we can take one of at least three approaches:
1. Find all works where has_model_ssim is GwJournalIssue, and for each journal-issue result, retrieve the works listed in the member_ids_ssim field.
2. Create a faceted field that indicates whether a work is GW-published (or something analogous). Faceting on gw_published:true will yield the relevant works.
3. Create a new administrative set just for GW-published articles and populate it with these works, and have the ListSets behavior generate sets based on the admin set. (I think this would be possible.)
For reasons outlined below, I believe the 2nd or 3rd option is likely to be less complicated to implement than the 1st.
To create the Dublin Core metadata for each work, we can get the journal title and volume/issue metadata from the parent_works attribute of the Fedora GwWork object. (Note that the Solr result for a work does not contain a link to the parents, only the children.)
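
For comparison, the Solr requests behind options 1 and 2 might look roughly like this. It is a sketch in Python that only builds the request parameters and sends nothing. The `rows` value is an arbitrary assumption, and gw_published is the hypothetical field from option 2, which does not exist in the index yet.

```python
def journal_issue_query():
    """Option 1, step 1: find every journal-issue work and its member IDs."""
    return {
        "q": "*:*",
        "fq": "has_model_ssim:GwJournalIssue",
        "fl": "id,member_ids_ssim",
        "rows": 1000,  # assumed page size; real code would page through results
    }


def article_ids(issue_docs):
    """Option 1, step 2: flatten member IDs, dropping duplicates in order."""
    seen = {}
    for doc in issue_docs:
        for member_id in doc.get("member_ids_ssim", []):
            seen.setdefault(member_id, True)
    return list(seen)


def articles_query(ids):
    """Option 1, step 3: fetch the article documents by ID."""
    if not ids:
        raise ValueError("no article IDs to query")
    # Solr's terms query parser takes a comma-separated list of values.
    return {"q": "*:*", "fq": "{!terms f=id}" + ",".join(ids)}


def facet_query():
    """Option 2: a single filter on a (hypothetical) gw_published field."""
    return {"q": "*:*", "fq": "gw_published:true"}


if __name__ == "__main__":
    docs = [
        {"id": "issue1", "member_ids_ssim": ["a1", "a2"]},
        {"id": "issue2", "member_ids_ssim": ["a2", "a3"]},
    ]
    print(articles_query(article_ids(docs)))
    print(facet_query())
```

Option 1 needs two round trips and an intermediate list of IDs, while option 2 (and likewise a filter on an admin set for option 3) is a single filter query, which is the shape the OAI provider's set logic already expects.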

Other issues

To speed up assigning works already in the repository to journal issues, we can use Bulkrax. Workflow:
1. Create a Bulkrax exporter for all the works in a given journal collection.
2. Create the Journal Issue works in the UI, one for each volume/issue.
3. In the parents column of the exported CSV, enter the appropriate journal issue for each work.
4. Re-import the CSV to update the metadata.
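
Step 3 could be scripted instead of done by hand. The sketch below is Python and makes several assumptions: that the export has volume and issue columns (the names `volume` and `issue` are guesses), and that we have a lookup from (volume, issue) to whatever identifier Bulkrax expects in the parents column for the Journal Issue works created in step 2. The identifiers shown are made up.

```python
import csv
import io


def assign_parents(exported_csv, issue_ids, volume_col="volume", issue_col="issue"):
    """Fill the `parents` column of an exported CSV from a (volume, issue) lookup.

    `exported_csv` is the CSV text; `issue_ids` maps (volume, issue) string
    pairs to the parent identifier. Rows without a match are left unchanged.
    """
    reader = csv.DictReader(io.StringIO(exported_csv))
    fieldnames = list(reader.fieldnames or [])
    if "parents" not in fieldnames:
        fieldnames.append("parents")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = (row.get(volume_col, ""), row.get(issue_col, ""))
        if key in issue_ids:
            row["parents"] = issue_ids[key]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    exported = (
        "source_identifier,title,volume,issue,parents\n"
        "work-1,First Article,1,2,\n"
        "work-2,Second Article,9,9,\n"
    )
    print(assign_parents(exported, {("1", "2"): "journal-issue-1-2"}))
```

Rows that do not match any issue are written back untouched, so they can be reviewed manually before the re-import in step 4.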

dolsysmith · 2024-08-02T13:36:37Z

Technical notes

Need to create an OAI set that contains just records for journal articles for harvesting in Primo.
- One option: customize behavior of the GwJournalIssue set, so that it displays child works and not the parent work.
- Alternatively, the ListSets behavior of the Blacklight OAI provider gem can be customized. But this behavior leverages faceted fields to produce sets, so without overhauling the logic entirely, we'd need a facet that corresponds to that set of journal articles, which we currently lack.
To test:
- Observe to_sets behavior on an instance of a SolrDocument.
- I'm guessing the find method in the SolrDocumentWrapper class is the pivotal method. (See also here). It may be necessary to override this behavior, or one of its dependencies.
- The set query assumes the form of a simple Solr filter on a facet field.
- Because the find method ultimately leverages the search_builder parameter chain, we'd probably have to deconstruct this to create the more complex search required, i.e., to target records of type GwWork that are members of GwJournalIssue records.
- Can we create more than one type of set by providing multiple facet fields? Yes, maybe?
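
To illustrate the "simple Solr filter on a facet field" point, here is a sketch of how a set spec might be turned into a filter query when more than one field is configured. It is Python for illustration and is not the gem's code. The `label:value` spec format and the field names in the example are assumptions to check against the gem's behavior.

```python
def set_spec_to_filter(spec, solr_fields):
    """Translate a set spec such as 'admin_set:Some Title' into a Solr fq.

    `solr_fields` maps each set label to the Solr field it filters on
    (hypothetical configuration).
    """
    label, sep, value = spec.partition(":")
    if not sep or label not in solr_fields:
        raise ValueError(f"unknown set spec: {spec!r}")
    # Escape backslashes and quotes so the value is safe inside a phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{solr_fields[label]}:"{escaped}"'


if __name__ == "__main__":
    fields = {"admin_set": "admin_set_sim", "type": "has_model_ssim"}
    print(set_spec_to_filter("admin_set:GW Journal Articles", fields))
    print(set_spec_to_filter("type:GwJournalIssue", fields))
```

If set handling really works this way, each configured field yields its own family of sets, but every set is still a filter on a single field. That is why the "members of GwJournalIssue" case does not fit without a dedicated facet or admin set.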

dolsysmith · 2024-08-06T15:09:24Z

Update: after a conversation with Schol Comm, the recommendation is to use a separate admin set to identify OA-published articles for harvesting by Primo.

dolsysmith added the enhancement label Apr 23, 2024

kilahimm added this to the 2.1.1 milestone Jul 30, 2024
