This is a custom Docker setup of Korp: a web interface (frontend and backend) for the eternal beta version of CWB (IMS Corpus Workbench). Two Docker images (korp_backend_base and korp_frontend_base) form the foundation of a selection of individual Korp setups maintained by the Department of Nordic Studies and Linguistics at the University of Copenhagen.
The original intent was to replace Clarin.dk's ailing instance of Korp with a more up-to-date, Dockerised version (this is now the Clarin setup).
NOTE: you will need to have git and docker installed on the host machine.
Deployment currently consists of:
- Cloning this repository on the production server.
- Building the base images using docker-compose.
- Building and running a specific setup using docker-compose.
This will start (at least) two local Docker containers serving the frontend and backend. In some setups additional containers are also in use.
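The deployment steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: REPO_URL and CLONE_DIR are placeholders rather than values taken from this repository, and the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Deployment sketch. REPO_URL and CLONE_DIR are placeholders (assumptions),
# SETUP names a directory under setups/. Adjust all three before real use.
REPO_URL="${REPO_URL:-https://example.org/korp-setups.git}"  # hypothetical URL
CLONE_DIR="${CLONE_DIR:-korp-setups}"                        # assumed directory name
SETUP="${SETUP:-clarin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs in a subshell so the caller's working directory is left untouched.
deploy() (
  run git clone "$REPO_URL" "$CLONE_DIR" &&  # 1. clone on the production server
  run cd "$CLONE_DIR" &&
  run docker-compose build &&                # 2. build the base images (root directory)
  run cd "setups/$SETUP" &&
  run docker-compose up -d --build           # 3. build and run the chosen setup
)

deploy
```

Run it once as-is to review the printed plan, then rerun with DRY_RUN=0 (and sudo, if your permissions require it) to execute the steps.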
None of the setups come with any kind of SSL certificate support, so a reverse proxy/gateway must be installed in front of them in order to serve Korp over HTTPS. The production server of the Clarin setup (Alf) is currently located behind an nginx server which serves as its gateway to the public Internet.
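As a rough illustration of such a gateway, the function below writes a minimal nginx server block to a file of your choice. It is a sketch, not the configuration used on Alf: the host name, certificate paths, frontend port 9111, and the /korp/ frontend path are all hypothetical, while backend port 1234 and the /korp/backend path merely mirror URLs mentioned elsewhere in this README. Check which ports your setup actually publishes.

```shell
# Write a minimal nginx server block that terminates HTTPS in front of Korp.
# Hypothetical values: korp.example.org, both certificate paths, port 9111,
# and the /korp/ frontend path. Port 1234 is assumed to be the backend.
write_korp_proxy_conf() {
  cat > "$1" <<'EOF'
server {
    listen 443 ssl;
    server_name korp.example.org;

    ssl_certificate     /etc/ssl/certs/korp.example.org.pem;
    ssl_certificate_key /etc/ssl/private/korp.example.org.key;

    # Backend container; the trailing slashes strip the /korp/backend/ prefix.
    location /korp/backend/ {
        proxy_pass http://127.0.0.1:1234/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Frontend container.
    location /korp/ {
        proxy_pass http://127.0.0.1:9111/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
}
```

A typical use would be to call write_korp_proxy_conf with a path inside your nginx configuration directory, validate the result with nginx -t, and then reload nginx.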
NOTE: all commands must be run in a directory containing a docker-compose.yml file! It is likely that the filesystem permissions are set up in such a way that these commands must be run as a superuser, e.g. by prepending them with sudo.
To build image(s) and store them in the local Docker repo, but not run them:
# building parent images
docker-compose build
This command is mostly just used to build the base images from inside the root directory. Typically, you will run this once (successfully) and then only rerun it when upstream changes to these base Docker configurations land on your branch.
To build and run a specific setup in a detached state with docker-compose:
# building + running a setup
docker-compose up -d --build
Note: Omit the -d to get command-line feedback.
This is the most common way to build and run a setup.
Of course, in the real world, you will need to debug the system.
When something is misbehaving, get feedback by rerunning in a non-detached state. This means omitting the -d option, and optionally the --build option to skip the build step:
# running with command-line output
docker-compose up --build
# running with command-line output without rebuilding
docker-compose up
This is helpful in cases where the container keeps crashing and you want to see if some error message is written to standard output.
In certain cases you might want to restart - but not rebuild - a single container:
docker-compose restart backend
docker-compose restart frontend
This is needed when the Dockerfile and docker-compose.yml are untouched, but files that are loaded dynamically have changed.
Very often, you will need to enter a running container to see what it looks like from the inside:
docker-compose exec backend bash
docker-compose exec frontend bash
In some cases you just want to kill stuff (perhaps you're running multiple Docker containers that are interfering with each other):
# Stop the running container(s)
docker-compose stop
# Stop and remove the containers and networks, and prune the volumes too
docker-compose down --remove-orphans --volumes
The setups directory contains custom setups used in different contexts. As an example, the setup in setups/clarin is the Korp instance available at clarin.dk.
A Dockerfile cannot extend another Dockerfile, only a Docker image. Such an image can either be pulled from a Docker repository or taken from the local repository on your machine. Once built, the image is available to every Dockerfile on that machine and can be used as a parent image, e.g. an example Dockerfile might extend the frontend base:
FROM korp_frontend_base
CMD yarn start:dist
However, before we can create Dockerfiles that extend korp_frontend_base or korp_backend_base we must first build these images. That is accomplished by running docker-compose build inside the root directory, where the docker-compose.yml that creates the base images is located.
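Before building a setup, it can be useful to confirm that both base images already exist locally. The helper below is a small sketch of such a check; the function name is made up for illustration, and it relies only on docker image inspect exiting with a non-zero status for an unknown image.

```shell
# Report which base images are missing from the local image store.
# docker image inspect exits non-zero when the image does not exist.
missing_base_images() {
  for image in korp_frontend_base korp_backend_base; do
    if ! docker image inspect "$image" >/dev/null 2>&1; then
      echo "$image"
    fi
  done
}

missing="$(missing_base_images)"
if [ -n "$missing" ]; then
  echo "Missing base images; run docker-compose build in the root directory first:"
  echo "$missing"
fi
```

If nothing is printed, both parent images are present and a setup that starts with FROM korp_frontend_base or FROM korp_backend_base should be able to resolve its parent image.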
Korp uses AngularJS to generate HTML in the frontend (note that this is different from Angular). In some cases you will also run into JSP syntax, e.g. in some of the files located in modes. To edit these template strings you will thus need to familiarise yourself with both template languages.
In the doc/userguides/docs directory, we maintain a general Korp user guide that can be built into a static subsite with the mkdocs program. There are a number of markdown files; index.md and regex.md constitute a general user guide with a separate page on regular expression search. The other pages are only relevant to specific setups and can be added or removed as needed using Docker commands.
The doc directory has its own Dockerfile, which will build an image containing the userguides directory. This is then copied in by the frontend images of the individual setups to make the user guide available in each setup. The frontend Dockerfile takes care of removing pages not relevant to the setup and adding a relevant navigation menu before building the user guide using mkdocs.
A user guide link in the header has been added as part of the frontend base image in korp-setups/frontend/app/includes/header.pug.
In case only the backend service needs to be started, make sure the relevant docker-compose.yml does not start a frontend container, e.g. by commenting out the relevant code or by using a docker-compose.yml that doesn't explicitly start the Korp frontend. The frontend can then be run independently, e.g. from a checked-out korp-frontend git repo, using the following commands:
yarn
yarn build
yarn start:dist
The config.js must contain a valid korpBackendURL endpoint. For the CST production system this value is https://alf.hum.ku.dk/korp/backend, but when running the system locally it must refer to localhost:1234 instead. Changing this value requires explicitly restarting the affected container; rebuilding is not enough, because a change that only touches config.js leaves the image identical.
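To double-check which endpoint a given config.js currently points at before restarting, a one-line helper like the following may be enough. The function name and the example path are hypothetical, since the location of config.js differs between setups, and restarting the frontend service is an assumption about which container is affected.

```shell
# Print every line of a config.js that mentions korpBackendURL, with line numbers.
show_backend_url() {
  grep -n 'korpBackendURL' "$1"
}

# Example (the path is hypothetical; use the config.js of your setup):
#   show_backend_url frontend/app/config.js
#   docker-compose restart frontend
```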
The ideal development loop consists of running the backend service in a Docker container, while making changes to the frontend project and then copying the relevant files to frontend/app/ in this project. For production systems, the frontend Dockerfile should point to the latest stable git commit and the docker-compose.yml in this outer directory should be used to run the full system (backend + frontend services).
The documentation for the korp-frontend also specifies that one should put a run_config.json in the project directory to specify where any custom configuration files are located. Despite its name, this file is not used for runtime configuration; it contains the configuration needed to run the yarn build command. We have not found a need for run_config.json that wasn't better served by simply patching the app/ directory recursively.