utility module to ease the creation of ARCHE-RDF
This module tries to ease the curation effort needed to describe a dataset of XML/TEI documents managed by a dsebaseapp as ARCHE-RDF. Its main idea is to reuse as much existing metadata as possible and avoid any potential data duplication. The module consists of three main parts:
- an XQuery module named archeutils.xql
- several API endpoints for serialising ARCHE-RDF data
- a single configuration file for project/resource specific data, data/meta/arche_constants.rdf
Whereas the first two parts are generic and therefore provided as a reusable module, the configuration file needs to be customized for each dsebaseapp project and is therefore NOT included in this module.
- add this repo as submodule to your dsebaseapp project
git submodule add https://github.com/KONDE-AT/dsebaseapp-archeutils.git archeutils
- create a document data/meta/arche_constants.rdf
The XQuery module named archeutils.xql exposes several variables needed to create an ARCHE-RDF, fetched from
- the application structure
- data/meta/arche_constants.rdf
The main entry point is the API endpoint archeutils/ids.xql, which returns a JSON document with the following structure:
{
"arche_constants": "http://127.0.1.1:8080/exist/apps/thun/archeutils/dump-arche-cols.xql",
"id_prefix": {
"url": "https://id.acdh.oeaw.ac.at/thun"
},
"ids": [{
"id": "https://id.acdh.oeaw.ac.at/thun/editions/ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml",
"filename": "ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml",
"html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&directory=editions",
"md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&collection=editions",
"payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&collection=editions",
"mimetype": "application/xml"
},
{
"id": "https://id.acdh.oeaw.ac.at/thun/editions/faller-an-thun-1859-01-31-a3-xxi-d494.xml",
"filename": "faller-an-thun-1859-01-31-a3-xxi-d494.xml",
"html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=faller-an-thun-1859-01-31-a3-xxi-d494.xml&directory=editions",
"md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
"payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
"mimetype": "application/xml"
}
]
}
- arche_constants points to the archeutils/dump-arche-cols.xql endpoint, which returns ARCHE-MD serialized in RDF/XML by calling archeutils:dump_collections($cols)
- each object in the ids array represents an XML/TEI resource which should be ingested into ARCHE. The md key points to a resource-specific archeutils endpoint, archeutils/md.xql?id={id/doc-name of the resource to ingest}. The ARCHE-MD is generated by archeutils/md.xql, which basically calls archeutils:populate_tei_resource
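As a rough illustration of how a client might consume this response, the following Python sketch pairs each resource's ARCHE-ID with its metadata URL. The function name `md_urls` is hypothetical, and the sample is a shortened copy of the structure shown above; an actual ingest script would fetch the JSON from the ids.xql endpoint instead of embedding it.

```python
import json

# Shortened copy of the ids.xql response shown above (one resource only).
SAMPLE_RESPONSE = """
{
  "arche_constants": "http://127.0.1.1:8080/exist/apps/thun/archeutils/dump-arche-cols.xql",
  "id_prefix": {"url": "https://id.acdh.oeaw.ac.at/thun"},
  "ids": [{
    "id": "https://id.acdh.oeaw.ac.at/thun/editions/faller-an-thun-1859-01-31-a3-xxi-d494.xml",
    "filename": "faller-an-thun-1859-01-31-a3-xxi-d494.xml",
    "html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=faller-an-thun-1859-01-31-a3-xxi-d494.xml&directory=editions",
    "md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
    "payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
    "mimetype": "application/xml"
  }]
}
"""


def md_urls(ids_response):
    """Return (ARCHE-ID, metadata URL) pairs for every listed resource."""
    return [(entry["id"], entry["md"]) for entry in ids_response["ids"]]


response = json.loads(SAMPLE_RESPONSE)
pairs = md_urls(response)
for arche_id, md_url in pairs:
    print(arche_id, "->", md_url)
```

Each `md` URL would then be requested to obtain the ARCHE-MD of that resource, and `payload` to obtain the XML/TEI document itself.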
- limit={random-string}: list only 10 items, useful for testing the response of the endpoint as well as the actual ingest
- custom_parent=true: use this if you have a custom collection structure (see more below)
(Ab)uses repo-schema to provide project specific data. E.g. thun-data/meta/arche_constants.rdf
In arche_constants.rdf you can basically set three types of MD:
- Hand-made or literal MD. This is needed for project-specific information, like project descriptions or the definition of project-related agents (e.g. PIs or funding bodies).
- Constants for either all collections/resources or dedicated collections/resources.
- Dynamic MD properties derived from the actual XML/TEI documents. For this you'll need to provide a mapping using XPath.
- the TEI mapping needs to be done by collection, whereas the matching collection needs to be defined in the @collection attribute of the <acdh:TeiLookUps collection='name-of-collection'> element
- the element name matches an arche-schema property
- the @type value can be one of
  - literal -> the evaluated XPath expression becomes the text() of the element
  - literal_no_lang -> the evaluated XPath expression becomes the text() of the element but no default lang-attribute will be set
  - no_eval -> the text() will be copied into the arche-element (no need for this actually, as you can set constants on resource level anyway...)
  - date -> the element gets typed as date via rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
  - resource -> the evaluated XPath expression is set as value for an @rdf:resource
  - resource_many -> in case the evaluated XPath expression returns a sequence, then for each item in the sequence a new element (i.e. RDF triple) is created
- to override the default language you can set a @lang parameter, e.g. <acdh:hasTitle type="literal" lang="und">normalize-space($item/tei:persName[1]/tei:forename/text()||' '||$item/tei:persName[1]/tei:surname/text())</acdh:hasTitle>
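The @type values listed above amount to a small dispatch table. The Python sketch below mimics that dispatch to make the differences concrete; it is not the module's XQuery implementation. The function name `build_elements` is hypothetical, `values` stands for whatever the XPath expression evaluated to (or the raw text for no_eval), `lang` is the effective language after any @lang override, and taking only the first value for the non-`resource_many` types is a simplification of this sketch.

```python
import xml.etree.ElementTree as ET

ACDH_NS = "https://vocabs.acdh.oeaw.ac.at/schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def build_elements(prop, md_type, values, lang):
    """Build the ARCHE element(s) for one mapping entry (illustrative only)."""

    def new_element():
        return ET.Element(f"{{{ACDH_NS}}}{prop}")

    if md_type == "resource_many":
        # One element (i.e. one RDF triple) per item of the sequence.
        elements = []
        for value in values:
            element = new_element()
            element.set(f"{{{RDF_NS}}}resource", value)
            elements.append(element)
        return elements

    # Assumption of this sketch: all other types use the first value only.
    value = values[0] if values else ""
    element = new_element()
    if md_type == "resource":
        element.set(f"{{{RDF_NS}}}resource", value)
    elif md_type == "date":
        element.set(f"{{{RDF_NS}}}datatype", XSD_DATE)
        element.text = value
    elif md_type == "literal":
        element.set(f"{{{XML_NS}}}lang", lang)
        element.text = value
    elif md_type in ("literal_no_lang", "no_eval"):
        element.text = value  # no language attribute is set
    else:
        raise ValueError(f"unknown @type: {md_type}")
    return [element]


title = build_elements("hasTitle", "literal", ["Giorgio de Abbondi"], "und")
actors = build_elements(
    "hasActor",
    "resource_many",
    ["http://d-nb.info/gnd/118757393", "http://d-nb.info/gnd/118532863"],
    "und",
)
```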
Sometimes the default eXist/dsebaseapp collection structure is not feasible for ARCHE. To circumvent this, you can pass a &custom-parent=true URL-param to the ids.xql endpoint. This avoids the default behaviour of adding the default isPartOf triple (which is the ARCHE-ID of its collection) to any XML/TEI and uses the value defined in arche_constants.rdf instead. BUT be aware that you'll need to provide the ARCHE-MD for those custom collections yourself and you'll need to be able to generate the matching IDs through XPath (or custom XQuery functions) called in arche_constants.rdf, e.g. something like:
<acdh:isPartOf type="resource">concat($item/@xml:base, '/', substring-before($item//tei:title[@type="iso-date"]/text(), '-'))</acdh:isPartOf>
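To see what such an expression would yield, the following Python sketch replays the string logic of the XPath above. The input values are hypothetical: the actual @xml:base and iso-date title depend on your TEI documents.

```python
def substring_before(value, separator):
    """Mirror XPath substring-before(): empty string if the separator is absent."""
    head, found, _ = value.partition(separator)
    return head if found else ""


def custom_parent_id(xml_base, iso_date):
    """Replay concat(@xml:base, '/', substring-before(iso-date, '-'))."""
    return f"{xml_base}/{substring_before(iso_date, '-')}"


# Hypothetical input values, for illustration only.
parent = custom_parent_id("https://id.acdh.oeaw.ac.at/thun/editions", "1854-12-01")
print(parent)
```

With these inputs every document would be attached to a per-year collection, which is why the ARCHE-MD for those year collections has to be provided separately.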
- serializes person-like entities (tei:person)
returns something like:
<rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://id.acdh.oeaw.ac.at/">
<acdh:Person>
<acdh:hasIdentifier rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/abbondi-giorgio"/>
<acdh:hasTitle xml:lang="und">Giorgio de Abbondi</acdh:hasTitle>
</acdh:Person>
<acdh:Person>
<acdh:hasIdentifier rdf:resource="https://d-nb.info/gnd/118893106"/>
<acdh:hasTitle xml:lang="und">Abdülmecid I. (auch Abdul Mecid)</acdh:hasTitle>
</acdh:Person>
<acdh:Person>
<acdh:hasIdentifier rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/abraham-stefan"/>
<acdh:hasTitle xml:lang="und">Stefan Abraham</acdh:hasTitle>
</acdh:Person>
</rdf:RDF>
- The actual output is derived from the mapping in arche_constants:
<acdh:PersonLookUps source="indices/listperson.xml">
<acdh:hasIdentifier type="resource_many">archeutils:get_entity_id($item)</acdh:hasIdentifier>
<acdh:hasTitle type="literal" lang="und">normalize-space($item/tei:persName[1]/tei:forename/text()||' '||$item/tei:persName[1]/tei:surname/text())</acdh:hasTitle>
</acdh:PersonLookUps>
The function archeutils:get_entity_id($item) checks if there is a tei:idno with a text node containing a string with 'd-nb.info', 'geonames' or 'viaf' and returns this text node as ARCHE-ID. If not, a generic ARCHE-ID is constructed from the element's @xml:id.
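The lookup described above can be sketched in Python as follows. This only mirrors the documented behaviour and is not the XQuery source: returning the first matching tei:idno and building the generic ID as `{prefix}/{xml:id}` are assumptions of this sketch (the latter guessed from the example output), and the prefix is passed in explicitly.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
AUTHORITY_HINTS = ("d-nb.info", "geonames", "viaf")


def get_entity_id(item, generic_prefix):
    """Prefer an authority-file URI from tei:idno, else fall back to @xml:id."""
    for idno in item.iter(f"{{{TEI_NS}}}idno"):
        text = (idno.text or "").strip()
        if any(hint in text for hint in AUTHORITY_HINTS):
            return text
    # Assumption: generic IDs are formed as prefix + '/' + xml:id.
    return f"{generic_prefix}/{item.get(XML_ID)}"


PERSONS = ET.fromstring(
    '<listPerson xmlns="http://www.tei-c.org/ns/1.0">'
    '<person xml:id="abbondi-giorgio"><persName>Giorgio de Abbondi</persName></person>'
    '<person xml:id="abdulmecid"><idno>https://d-nb.info/gnd/118893106</idno></person>'
    "</listPerson>"
)
PREFIX = "https://id.acdh.oeaw.ac.at/thun/entity"
entity_ids = [get_entity_id(person, PREFIX) for person in PERSONS]
```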
- same as for persons
- serializes the resources and their mentioned entities expressed in ARCHE-RDF
<rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://id.acdh.oeaw.ac.at/">
<acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/simor-an-thun-1854-12-01-a3-xxi-d296d.xml">
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/123271606"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118757393"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/141265825"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/101780664"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/117619027"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118594729"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/119459159"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/116016671"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/138333823"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/189010959"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118787977"/>
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/henriques-de-carvalho-guilherme"/>
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/bonel-y-orbe-juan-jose"/>
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/kunszt-jozef"/>
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/scitovsky-jan"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/116106832"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/3169070/"/>
<acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4018145-5"/>
<acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_064b3fb95f9ed52eb2b1da3d5e807b17"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2921044/"/>
<acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4055964-6"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/3172395/"/>
<acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_6f1d35d511be7a1f29234d7dda06e2dd"/>
<acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_48e23d043764ef6b2d7d7acd9ac09860"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/4402265-7"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
</acdh:Resource>
<acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/memorandum-mikulas-neueinteilung-superintendenzen-1860-a3-xxi-d627.xml">
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/mikulas-johann"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
</acdh:Resource>
<acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/thun-an-ficker-1854-05-09-ca179.xml">
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118757393"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118532863"/>
<acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/scheffer-boichorst-auguste-amalia"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/119059312"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/118535013"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2761367/"/>
<acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4065781-4"/>
<acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_064b3fb95f9ed52eb2b1da3d5e807b17"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2775220/"/>
<acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2946447/"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/36150-1"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/36165-3"/>
<acdh:hasActor rdf:resource="http://d-nb.info/gnd/2024703-5"/>
</acdh:Resource>
</rdf:RDF>
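A consumer of this serialization might want the mentioned entities grouped per resource. The Python sketch below does that with the standard library; the function name `mentions_by_resource` is hypothetical and the sample is one resource copied from the output above.

```python
import xml.etree.ElementTree as ET

ACDH_NS = "https://vocabs.acdh.oeaw.ac.at/schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE_RDF = """
<rdf:RDF xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/memorandum-mikulas-neueinteilung-superintendenzen-1860-a3-xxi-d627.xml">
    <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/mikulas-johann"/>
    <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
    <acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
  </acdh:Resource>
</rdf:RDF>
"""


def mentions_by_resource(rdf_xml):
    """Map each resource ID to its hasActor and hasSpatialCoverage targets."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for resource in root.findall(f"{{{ACDH_NS}}}Resource"):
        grouped = {"hasActor": [], "hasSpatialCoverage": []}
        for child in resource:
            local_name = child.tag.split("}", 1)[-1]
            if local_name in grouped:
                grouped[local_name].append(child.get(f"{{{RDF_NS}}}resource"))
        result[resource.get(f"{{{RDF_NS}}}about")] = grouped
    return result


mentions = mentions_by_resource(SAMPLE_RDF)
```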