This is a new synchronized facsimile and transcription reader for the TEI files on clarin.dk.
It is a fork of the Glossematics source code, with many changes to TEI styling, metadata retrieval and page structure made to fit these TEI files, which are quite different from the ones at https://glossematics.dk.
I downloaded the "everyman" dataset from https://repository.clarin.dk/repository/xmlui/handle/20.500.12115/46 and extracted every zip file.
The extracted TIF files were recursively converted and renamed using the following commands (taken from kuhumcst#20):
find . -name '*.tif' -exec mogrify -format jpg -quality 70 {} +
find . -name '*.jpg' -exec rename 's/(?<!\.tif)\.jpg$/.tif.jpg/' {} +
And to remove the remaining TIF files:
find . -name "*.tif" -type f -exec rm -f {} \;
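Before moving on, it can be worth confirming that the conversion left the tree in the expected state. The following is only a minimal sketch, not part of the original workflow: `check_conversion` is a hypothetical helper name, and it assumes that every JPEG should end in `.tif.jpg` and that no `.tif` files should remain.

```shell
# Hypothetical helper (not part of this repository): report files that
# the conversion steps above should have removed or renamed.
check_conversion() {
  dir="${1:-.}"
  # TIF originals that are still present
  leftover_tifs=$(( $(find "$dir" -type f -name '*.tif' | wc -l) ))
  # JPEGs that do not follow the .tif.jpg naming scheme
  misnamed_jpgs=$(( $(find "$dir" -type f -name '*.jpg' ! -name '*.tif.jpg' | wc -l) ))
  echo "leftover_tifs=$leftover_tifs misnamed_jpgs=$misnamed_jpgs"
  # Succeed only if both counts are zero
  [ "$leftover_tifs" -eq 0 ] && [ "$misnamed_jpgs" -eq 0 ]
}
```

Running `check_conversion .` from the directory holding the extracted files prints both counts and returns a non-zero exit status if anything is left over.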
To create thumbnails for search results:
mkdir thumbs
find . -name '*.jpg' -exec convert '{}' -resize 360x640 -set filename:newname "%t.%e" 'thumbs/thumb-%[filename:newname]' \;
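Because the thumbnail command writes every result into the single thumbs directory, two images with the same file name in different subdirectories would end up as the same thumbnail. The sketch below lists such duplicate base names so they can be inspected before thumbnails are generated. `find_thumb_collisions` is a hypothetical helper name, and skipping an existing thumbs directory is an assumption made for this example.

```shell
# Hypothetical helper (not part of this repository): list JPEG base names
# that occur more than once below a directory, ignoring its thumbs/ folder.
find_thumb_collisions() {
  dir="${1:-.}"
  find "$dir" -type f -name '*.jpg' ! -path "$dir/thumbs/*" \
    -exec basename {} \; | sort | uniq -d
}
```

An empty result from `find_thumb_collisions .` means that no base name is shared between subdirectories.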
The directory /etc/clarin-tei serves as the home directory of the system. The image and TEI files are to be found somewhere within the directory structure of /etc/clarin-tei/files, while this Git repository is cloned at /etc/clarin-tei/clarin-tei.
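The expected layout can be checked with a few lines of shell. This is a minimal sketch, not something shipped with the repository: `check_layout` is a hypothetical helper name, and it only tests that the two directories named above exist, not what they contain.

```shell
# Hypothetical helper (not part of this repository): verify that the
# home directory contains the files/ and clarin-tei/ subdirectories.
check_layout() {
  root="${1:-/etc/clarin-tei}"
  status=0
  for sub in files clarin-tei; do
    if [ -d "$root/$sub" ]; then
      echo "ok: $root/$sub"
    else
      echo "missing: $root/$sub"
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_layout` without arguments checks /etc/clarin-tei; a different root can be passed as the first argument.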
The system requires Docker to run and is initialised as a systemd service:
cp system/clarin-tei.service /etc/systemd/system/clarin-tei.service
systemctl enable clarin-tei
systemctl start clarin-tei
Currently, this system needs a separate reverse proxy in order to be reachable from the public Internet.
For an nginx setup such as the one running on alf.hum.ku.dk, the following snippet should be included:
location /clarin {
    include proxy_params;
    proxy_pass http://127.0.0.1:6789/;
}
This will proxy requests to the CLARIN TEI web service running on localhost:6789.