Testing

Adrian Turner edited this page Jun 28, 2023 · 16 revisions

Metadata Fetcher

Test that the metadata fetcher runs (unit tests):

  • should save an API response if it gets one
  • should communicate an error message and move on to the next page (if possible) if it's unable to get an API response
  • should communicate an error message and quit if it's unable to get an API response and unable to move on to the next page
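The three behaviours above can be sketched as a small test, assuming a hypothetical fetch loop. `fetch_collection`, `PageError`, and the `get_page`/`save`/`report` callables are illustrative names invented for this sketch, not Rikolti's actual fetcher API.

```python
from typing import Callable, Dict, List, Optional, Tuple


class PageError(Exception):
    """Hypothetical error raised when no API response is received."""

    def __init__(self, message: str, next_page: Optional[int] = None):
        super().__init__(message)
        # None means the fetcher cannot work out where the next page is.
        self.next_page = next_page


def fetch_collection(
    get_page: Callable[[int], Tuple[str, Optional[int]]],
    save: Callable[[int, str], None],
    report: Callable[[str], None],
    first_page: int = 0,
) -> None:
    """Fetch pages until there is no next page (hypothetical fetch loop)."""
    page: Optional[int] = first_page
    while page is not None:
        try:
            body, next_page = get_page(page)
        except PageError as err:
            report(f"page {page}: {err}")
            page = err.next_page  # move on if possible, otherwise quit
            continue
        save(page, body)
        page = next_page


def test_fetcher_saves_reports_and_quits() -> None:
    saved: Dict[int, str] = {}
    errors: List[str] = []

    def fake_get_page(page: int) -> Tuple[str, Optional[int]]:
        if page == 0:
            return "first page", 1  # a normal response
        if page == 1:
            raise PageError("no response", next_page=2)  # can move on
        raise PageError("no response")  # cannot move on

    fetch_collection(fake_get_page, saved.__setitem__, errors.append)

    assert saved == {0: "first page"}
    assert errors == ["page 1: no response", "page 2: no response"]
```

Passing the page getter in as a callable keeps the test independent of any real API; a real fetcher would more likely be tested by mocking its HTTP client.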

Test that the metadata fetcher works within the Rikolti system (integration tests):

  • can be triggered by the Collection Registry
  • can be triggered manually

Test that the metadata fetcher is retrieving accurate pages (data fidelity/acceptance tests):

  • open question: how do we test this? One option is to select, for each fetcher type, a relatively static collection and assert that the fetched content is unchanged
  • create a sample of collections for which we know what the fetched pages should look like

Metadata Mapper

Test that the metadata mapper runs (unit tests):

  • should save a set of mapped and enriched records
  • should save a set of just mapped records if the appropriate flag is set
  • should communicate an error message if it's unable to parse, map, or enrich a certain record or page of records

Test that the metadata mapper works within the Rikolti system (integration tests):

  • can be triggered by the Collection Registry
  • can be triggered manually

Test that the metadata mapper is creating accurate data records (data fidelity/acceptance tests):

  • Develop against a relatively small sample of collections
  • Test against a larger sample of collections for which we can know what the mapped data looks like
  • Finalize against the entire set of collections in Solr (understanding that there will be "natural" data drift due to updates from the source institution)

Data fidelity/acceptance testing criteria for migrated mappers

QA checking of a new mapper involves comparing the Solr-based representation of the metadata produced by the existing mapper/enrichment code with the Elasticsearch-based representation of the metadata produced by the new mapper code.

The [metadata_mapper/validate_mapping.py](https://github.com/ucldc/rikolti/blob/main/metadata_mapper/validate_mapping.py) script generates a CSV report of discrepancies between our legacy Solr index and the Rikolti mapper output, organized according to the following data fidelity priorities. We consider the level of data fidelity specified below to be the minimum required to consider a mapper sufficiently migrated from the legacy system to Rikolti.

1) Baseline metadata

We require 100% fidelity in mapper outputs for the following Calisphere metadata schema elements, both in the format of each alphanumeric value or text string and in the handling of repeated data (e.g., multiple identifiers). If the source record has no existing or relevant mappable data, the new mapper should still output the metadata field, with an empty data value:

  • id: String. Not required. This metadata field is strictly reserved for ARK identifiers, in the following format: ark:/#####/######

  • identifier: Array of strings. Not required.

  • isShownAt: Array containing a single string (this will be a URL to a "landing page" for the object, at the harvested endpoint). Required.

  • isShownBy: Array containing a single string (this will be a URL to a thumbnail image file). Required.

  • title: Array of strings. Required.

  • type: Array containing a single string, limited to the values below. Required (note that some records will lack this data; in these cases, we will need to identify and supply a default data value – e.g., “image”).

    • collection
    • dataset
    • event
    • image
    • interactive resource
    • moving image
    • physical object
    • service
    • software
    • sound
    • text
  • rights: Array of strings -OR- rights_uri: Array containing a single string. One of these two fields is required.
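The baseline rules above can be expressed as a simple record check. This is a minimal sketch, not part of `validate_mapping.py`: the function name `validate_baseline` is hypothetical, the ARK pattern is a deliberately loose reading of the `ark:/#####/######` format, and "required" is assumed to mean "present and non-empty".

```python
import re
from typing import Any, Dict, List

# Assumption: a loose ARK shape check (ark:/<naan>/<name>), not a full ARK parser.
ARK_RE = re.compile(r"^ark:/[^/\s]+/\S+$")

ALLOWED_TYPES = {
    "collection", "dataset", "event", "image", "interactive resource",
    "moving image", "physical object", "service", "software", "sound", "text",
}


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def _is_single_str(value: Any) -> bool:
    return _is_str_list(value) and len(value) == 1


def validate_baseline(record: Dict[str, Any]) -> List[str]:
    """Return a list of baseline-metadata problems (empty if none found)."""
    problems: List[str] = []

    ark = record.get("id")  # not required, but reserved for ARKs
    if ark not in (None, "") and not (isinstance(ark, str) and ARK_RE.match(ark)):
        problems.append("id: not an ARK identifier")

    identifier = record.get("identifier")  # not required
    if identifier is not None and not _is_str_list(identifier):
        problems.append("identifier: expected an array of strings")

    for field in ("isShownAt", "isShownBy"):
        if not _is_single_str(record.get(field)):
            problems.append(f"{field}: expected an array with a single string")

    title = record.get("title")
    if not (_is_str_list(title) and title):
        problems.append("title: expected a non-empty array of strings")

    type_ = record.get("type")
    if not (_is_single_str(type_) and type_[0] in ALLOWED_TYPES):
        problems.append("type: expected a single allowed value")

    rights = record.get("rights")
    rights_ok = _is_str_list(rights) and bool(rights)
    if not (rights_ok or _is_single_str(record.get("rights_uri"))):
        problems.append("rights/rights_uri: one of the two is required")

    return problems
```

A record that passes returns an empty list; each failed rule adds one message, so the result can be written straight into a report row.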

2) Additional metadata

We are aiming for 100% fidelity in mapper outputs for the following additional Calisphere metadata schema elements, both in the format of each alphanumeric value or text string and in the handling of repeated data (e.g., multiple identifiers). If the source record has no existing or relevant mappable data, the new mapper should still output the metadata field, with an empty data value:

  • alternative_title
  • contributor
  • coverage
  • creator
  • date
  • description
  • extent
  • format
  • genre
  • language
  • location
  • provenance
  • publisher
  • relation
  • rights_holder
  • rights_note
  • rights_date
  • source
  • spatial
  • subject
  • temporal
  • transcription

However, because the metadata originally harvested into Solr may differ from the more recently fetched metadata used to develop the new mapper code, up to 3 variances of the following types are acceptable before the code is submitted for CDL review:

  • Changes in the number of elements in an array. E.g., identifiers may have been added to the source metadata record since it was originally harvested into Solr. Alternatively, the contributor may have removed an identifier from the source record since that harvest.
  • Changes in value for a given element. E.g., the contributor may have updated a rights statement in rights since the record was originally harvested into Solr.
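The two variance types above could be distinguished with a field-by-field comparison. The sketch below is illustrative only and does not describe how `validate_mapping.py` works: `classify_variances` and `within_tolerance` are hypothetical names, single strings are assumed to be equivalent to one-element arrays, and the limit of 3 is assumed to apply to whatever set of variances is passed in.

```python
from typing import Any, Dict, Iterable, List, Tuple

MAX_VARIANCES = 3  # from the criteria above; the scope of the count is an assumption


def _as_list(value: Any) -> List[Any]:
    """Normalize a missing value, a single value, or an array to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def classify_variances(
    solr_record: Dict[str, Any],
    rikolti_record: Dict[str, Any],
    fields: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (field, variance type) pairs for fields that differ."""
    variances: List[Tuple[str, str]] = []
    for field in fields:
        old = _as_list(solr_record.get(field))
        new = _as_list(rikolti_record.get(field))
        if old == new:
            continue
        if len(old) != len(new):
            variances.append((field, "array length changed"))
        else:
            variances.append((field, "value changed"))
    return variances


def within_tolerance(variances: List[Tuple[str, str]]) -> bool:
    return len(variances) <= MAX_VARIANCES
```

For example, comparing a Solr record with `identifier: ["a"]` against a Rikolti record with `identifier: ["a", "b"]` reports an array-length variance for `identifier`, while a changed `rights` string of the same array length reports a value variance.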