Testing

Metadata Fetcher

Test that the metadata fetcher runs (unit tests):

should save an API response if it gets one
should communicate an error message and move on to the next page (if possible) if it's unable to get an API response
should communicate an error message and quit if it's unable to get an API response and unable to move on to the next page
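
The three behaviors above could be exercised against a fetch loop shaped roughly like the following. This is a minimal sketch, not the Rikolti fetcher itself: `run_fetcher`, `FetchError`, and the `fetch_page`, `save`, and `next_after_failure` callables are hypothetical names introduced only for illustration.

```python
class FetchError(Exception):
    """Hypothetical error raised when no API response is received."""


def run_fetcher(fetch_page, save, first_page, next_after_failure):
    """Fetch pages until there are none left, returning the messages reported.

    fetch_page(page) returns (body, next_page) or raises FetchError.
    next_after_failure(page) returns the page to try after a failure,
    or None when the next page cannot be determined.
    """
    messages = []
    page = first_page
    while page is not None:
        try:
            body, next_page = fetch_page(page)
        except FetchError as exc:
            next_page = next_after_failure(page)
            if next_page is None:
                # Cannot move on to the next page: report and quit.
                messages.append(f"error on page {page}: {exc}; quitting")
                break
            # The next page is still reachable: report and move on.
            messages.append(f"error on page {page}: {exc}; skipping")
        else:
            # An API response was received: save it.
            save(page, body)
        page = next_page
    return messages
```

A unit test can then pass in a fake `fetch_page` that fails on a chosen page and assert on both the saved pages and the returned messages.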

Test that the metadata fetcher works within the Rikolti system (integration tests):

can be triggered by the Collection Registry
can be triggered manually

Test that the metadata fetcher is retrieving accurate pages (data fidelity/acceptance tests):

Open question: how do we test this? One option is to select, for each fetcher type, a collection that rarely changes and assert that the fetched content is the same as a known snapshot.
Create a sample of collections for which we know what each page should look like.
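
One way to compare fetched pages against a known snapshot is to store a digest per page and report any page whose digest differs. The sketch below assumes pages are available as strings keyed by a page name; `page_digest` and `changed_pages` are hypothetical helpers, not part of Rikolti.

```python
import hashlib


def page_digest(page_content: str) -> str:
    """Return a SHA-256 hex digest of a fetched page."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def changed_pages(fetched: dict[str, str], expected_digests: dict[str, str]) -> list[str]:
    """Return the names of pages that are missing, unexpected, or changed."""
    names = set(fetched) | set(expected_digests)
    return sorted(
        name
        for name in names
        if name not in fetched
        or expected_digests.get(name) != page_digest(fetched[name])
    )
```

An acceptance test for a stable collection would assert that `changed_pages` returns an empty list. Comparing digests is strict: any byte-level change in the source, such as a timestamp in the response, will be reported as a difference.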

Metadata Mapper

Test that the metadata mapper runs (unit tests):

should save a set of mapped and enriched records
should save a set of just mapped records if the appropriate flag is set
should communicate an error message if it's unable to parse, map, or enrich a certain record or page of records
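
As an illustration of what these unit tests would exercise, here is a minimal sketch of a page-level mapping function. It is an assumption about shape only: `map_page`, its `mapped_only` flag, and the `map_record` and `enrich_record` callables are hypothetical and do not reflect the actual Rikolti mapper interface.

```python
def map_page(records, map_record, enrich_record, mapped_only=False):
    """Map (and by default enrich) each record, collecting error messages.

    Returns (output_records, errors). A record that cannot be mapped or
    enriched is reported in errors and left out of the output.
    """
    output, errors = [], []
    for index, record in enumerate(records):
        try:
            result = map_record(record)
            if not mapped_only:
                result = enrich_record(result)
        except Exception as exc:
            errors.append(f"record {index}: unable to map or enrich ({exc!r})")
            continue
        output.append(result)
    return output, errors
```

Tests can then assert that enrichment is skipped when the flag is set and that a malformed record produces an error message without stopping the rest of the page.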

Test that the metadata mapper works within the Rikolti system (integration tests):

can be triggered by the Collection Registry
can be triggered manually

Test that the metadata mapper is creating accurate data records (data fidelity/acceptance tests):

Develop against a relatively small sample of collections
Test against a larger sample of collections for which we can know what the mapped data looks like
Finalize against the entire set of collections in Solr (understanding that there will be "natural" data drift due to updates from the source institution)

Data fidelity/acceptance testing criteria for migrated mappers

QA checking of the new mapper involves comparing the Solr-based representation of the metadata produced from the existing mapper/enrichment code vs. the ElasticSearch-based representation of the metadata produced from the new mapper code.

The [metadata_mapper/validate_mapping.py](https://github.com/ucldc/rikolti/blob/main/metadata_mapper/validate_mapping.py) script generates a CSV report that lists discrepancies between our legacy Solr index and the Rikolti mapper output, according to the following data fidelity priorities. We consider the level of data fidelity specified below to be the minimum required before a mapper counts as migrated from the legacy system to Rikolti.

1) Baseline metadata

We require 100% fidelity in mapper outputs for the following Calisphere metadata schema elements, in terms of the format of the alphanumeric value or text string, and cases where the data is repeated (e.g., multiple identifiers). If there is no existing or relevant mappable data in the source record, the new mapper should output the metadata field only (with an empty data value):

id: String. Not required. This metadata field is strictly reserved for ARK identifiers, in the following format: ark:/#####/######
identifier: Array of strings. Not required.
isShownAt: Array containing a single string (this will be a URL to a "landing page" for the object, at the harvested endpoint). Required.
isShownBy: Array containing a single string (this will be a URL to a thumbnail image file). Required.
title: Array of strings. Required.
type: Array containing one string element, limited to values below. Required (note that we will have records that lack this data; in these cases, we will need to identify and supply a default data value – e.g., “image”).
- collection
- dataset
- event
- image
- interactive resource
- moving image
- physical object
- service
- software
- sound
- text
rights: Array of strings -OR- rights_uri: Array containing a single string. One of these two fields is required.
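
The baseline rules above can be expressed as a per-record check. The following is a minimal sketch, not the logic of validate_mapping.py: `baseline_problems` and its helpers are hypothetical, and the ARK pattern assumes that `ark:/#####/######` means five digits followed by a non-empty name. Empty optional fields are treated as absent.

```python
import re

ALLOWED_TYPES = {
    "collection", "dataset", "event", "image", "interactive resource",
    "moving image", "physical object", "service", "software", "sound", "text",
}
# Assumption: five-digit prefix, then any non-empty name without whitespace.
ARK_PATTERN = re.compile(r"^ark:/\d{5}/\S+$")


def is_string_list(value, exact_length=None):
    """True if value is a list of strings (optionally of an exact length)."""
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        return False
    return exact_length is None or len(value) == exact_length


def baseline_problems(record: dict) -> list[str]:
    """Return a list of baseline-metadata problems found in a mapped record."""
    problems = []
    ark = record.get("id")
    if ark and not (isinstance(ark, str) and ARK_PATTERN.match(ark)):
        problems.append("id: not an ARK identifier")
    identifier = record.get("identifier")
    if identifier and not is_string_list(identifier):
        problems.append("identifier: expected an array of strings")
    for field in ("isShownAt", "isShownBy"):
        if not is_string_list(record.get(field), exact_length=1):
            problems.append(f"{field}: expected an array containing a single string")
    title = record.get("title")
    if not title or not is_string_list(title):
        problems.append("title: expected a non-empty array of strings")
    type_value = record.get("type")
    if not is_string_list(type_value, exact_length=1) or type_value[0] not in ALLOWED_TYPES:
        problems.append("type: expected one of the allowed values")
    has_rights = bool(record.get("rights")) and is_string_list(record["rights"])
    has_rights_uri = is_string_list(record.get("rights_uri"), exact_length=1)
    if not (has_rights or has_rights_uri):
        problems.append("rights: expected rights or rights_uri")
    return problems
```

A record that satisfies every baseline rule yields an empty list, so a collection-level check can simply assert that no record reports problems.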

2) Additional metadata

We are aiming for 100% fidelity in mapper outputs for the following additional Calisphere metadata schema elements, in terms of the format of the alphanumeric value or text string, and cases where the data is repeated (e.g., multiple identifiers). If there is no existing or relevant mappable data in the source record, the new mapper should output the metadata field only (with an empty data value):

alternative_title
contributor
coverage
creator
date
description
extent
format
genre
language
location
provenance
publisher
relation
rights_holder
rights_note
rights_date
source
spatial
subject
temporal
transcription

However, because the metadata originally harvested into Solr may differ from the more recently fetched metadata used to develop the new mapper code, up to 3 variances of the following types are acceptable before the code is submitted for CDL review:

Changes in the number of elements in an array. E.g., multiple identifiers may have been added to the source metadata record since it was originally harvested into Solr. Alternatively, the contributor may have removed an identifier from the source record since it was originally harvested into Solr.
Changes in value for a given element. E.g., a rights statement in rights may have been updated by the contributor since the record was originally harvested into Solr.
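
The two acceptable variance types can be told apart mechanically when comparing a Solr record with its Rikolti counterpart. The sketch below is illustrative only and is not how validate_mapping.py works; `classify_variance`, `field_variances`, and `within_tolerance` are hypothetical names. The text above does not say whether the limit of 3 applies per record, per field, or per collection, so counting per record pair here is an assumption.

```python
MAX_VARIANCES = 3  # threshold taken from the criteria above


def classify_variance(solr_value, rikolti_value):
    """Return None if the values match, else the type of variance."""
    if solr_value == rikolti_value:
        return None
    both_lists = isinstance(solr_value, list) and isinstance(rikolti_value, list)
    if both_lists and len(solr_value) != len(rikolti_value):
        return "array length changed"
    return "value changed"


def field_variances(solr_record: dict, rikolti_record: dict, fields) -> dict:
    """Map each differing field name to its variance type."""
    variances = {}
    for field in fields:
        kind = classify_variance(solr_record.get(field), rikolti_record.get(field))
        if kind is not None:
            variances[field] = kind
    return variances


def within_tolerance(variances: dict) -> bool:
    """Assumption: tolerance is counted per record pair."""
    return len(variances) <= MAX_VARIANCES
```

Each reported variance still needs a human check that it reflects a real change at the source rather than a mapper bug, since the comparison alone cannot distinguish the two.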
