Coverage is a project for visualizing the status of digital data archiving efforts across data repositories run by different initiatives. Its current scope covers data within the epa.gov domain.
This code repo provides the JSON back-end: https://api.archivers.co/coverage
The datatogether/webapp repo provides the visual front-end: https://archivers.co/coverage
Copyright (C) 2017 Data Together
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3.0.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the LICENSE file for details.
Actual source datasets can be found in each /repositories/* directory. The project currently includes the following data repositories:
- Archivers 2
- archivers.space
- EDGI Nomination Tool Uncrawlables
- The Internet Archive
- Project Svalbard JSON-LD crawl
Requests for new data repositories are tracked under the data-repository issue label.
Coverage takes a list of URLs and associated archiving information and turns it into a tree of URL paths with associated coverage information.
The output is cached in cache.json. Because this is a large file, we serve incremental pieces of the cached tree from a web server. To calculate coverage completion dynamically, you can work with the cache.json file.
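As a rough illustration of the tree-building step, the Go sketch below splits each URL into host and path segments and keeps archived and total counts on every node, so coverage at any prefix is simply archived / total. The `Node` type, its methods, and the example epa.gov URLs are hypothetical and not taken from this codebase; the sketch also assumes each URL appears only once in the input.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Node is a hypothetical tree node: one URL path segment plus the
// number of archived and total URLs at or below it.
type Node struct {
	Name     string
	Children map[string]*Node
	Archived int
	Total    int
}

func newNode(name string) *Node {
	return &Node{Name: name, Children: map[string]*Node{}}
}

// Add inserts a URL into the tree, splitting it into host and path
// segments, and updates the counts on every node along the way.
func (n *Node) Add(rawURL string, archived bool) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	segments := []string{u.Host}
	for _, s := range strings.Split(u.Path, "/") {
		if s != "" {
			segments = append(segments, s)
		}
	}
	cur := n
	cur.count(archived)
	for _, s := range segments {
		child, ok := cur.Children[s]
		if !ok {
			child = newNode(s)
			cur.Children[s] = child
		}
		cur = child
		cur.count(archived)
	}
	return nil
}

// count records one URL at this node.
func (n *Node) count(archived bool) {
	n.Total++
	if archived {
		n.Archived++
	}
}

// Coverage returns the fraction of URLs under this node that are archived.
func (n *Node) Coverage() float64 {
	if n.Total == 0 {
		return 0
	}
	return float64(n.Archived) / float64(n.Total)
}

func main() {
	root := newNode("")
	// Illustrative input: URL -> whether it has been archived.
	urls := map[string]bool{
		"https://www.epa.gov/air/quality": true,
		"https://www.epa.gov/air/data":    false,
		"https://www.epa.gov/water":       true,
	}
	for u, archived := range urls {
		if err := root.Add(u, archived); err != nil {
			panic(err)
		}
	}
	air := root.Children["www.epa.gov"].Children["air"]
	fmt.Printf("www.epa.gov/air coverage: %.2f\n", air.Coverage()) // 0.50
}
```

Because counts are aggregated on the way down, any subtree carries its own coverage figure and could be returned on its own, which is one way incremental pieces of a large cached tree might be served.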
- /healthcheck: server status
- /repositories: list all data repositories ❓
- /repositories/:repository_uuid: get details for a single data repository ❓
- /fulltree: get full coverage tree of URL-based resources
- /tree: get scope-able coverage tree ❓
- /coverage: get coverage summary (not currently used)
We would love involvement from more people! If you notice any errors or would like to submit changes, please see our Contributing Guidelines.
We use GitHub issues for tracking bugs and feature requests, and Pull Requests (PRs) for submitting changes.
You can run this project either directly on your workstation or in a "container" via Docker.
For people comfortable with Docker, or who are excited to learn about it, it can be the best way to get going.
Running this project via Docker requires:
Running the project in a Docker container should be as simple as:
make setup
make run
If you get an "address already in use" error for a port, change the PORT environment variable in your local .env file.
If you haven't changed the default port, you can now visit a JSON endpoint at: http://localhost:8080/repositories
Running this project directly on your system requires:
- Go 1.7+
- PostgreSQL
(Setting up these dependencies is outside the scope of this README.)
cd path/to/coverage
createdb datatogether_coverage
# Fetch dependencies (this also installs the binary to $GOPATH/bin), then build
go get ./
go build
# Set a free port on which to serve JSON
export PORT=8080
# Your postgresql instance may be running on a different port
export POSTGRES_DB_URL=postgres://localhost:5432/datatogether_coverage
$GOPATH/bin/coverage
With PORT set to 8080 as above, you can now visit a JSON endpoint at: http://localhost:8080/repositories
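Before starting the server, it can help to sanity-check POSTGRES_DB_URL. The Go sketch below uses only the standard library's `net/url` to confirm the scheme and pull out the host and database name (which should match the name given to `createdb`). `parseDBURL` is a hypothetical helper, not part of this codebase, and actually connecting would additionally require a PostgreSQL driver.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// parseDBURL checks that raw is a postgres:// (or postgresql://) URL and
// returns its host and database name, or an error if either is missing.
func parseDBURL(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return "", "", fmt.Errorf("unexpected scheme %q, want postgres://", u.Scheme)
	}
	dbname := strings.TrimPrefix(u.Path, "/")
	if u.Host == "" || dbname == "" {
		return "", "", fmt.Errorf("missing host or database name in %q", raw)
	}
	return u.Host, dbname, nil
}

func main() {
	raw := os.Getenv("POSTGRES_DB_URL")
	if raw == "" {
		// Fall back to the example value used in the setup steps.
		raw = "postgres://localhost:5432/datatogether_coverage"
	}
	host, db, err := parseDBURL(raw)
	if err != nil {
		fmt.Println("invalid POSTGRES_DB_URL:", err)
		os.Exit(1)
	}
	fmt.Printf("would connect to database %q on %s\n", db, host)
}
```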
Please follow the install instructions above! Including tests with your changes is appreciated!
For a list of all available helper commands, just type make.