Data Submission Handout

Christoph Broschinski edited this page Sep 26, 2016 · 21 revisions

Minimal requirements

  • The data contains an academic institution's expenditures on a per-article basis for publishing in fee-based Open Access journals.
  • The data should be made available in a machine-readable, platform-independent format (CSV).
  • The data is provided under an Open Data Commons license to ensure public access and reusability.
  • A contact person is designated at the contributing institution.

Data set

The complete OpenAPC data set is composed of all the contributing institutions' distributed tables. Journal titles and publisher names are imported from CrossRef via automated enrichment routines to make expenditures comparable. Additional metadata is collected from services like Europe PubMed Central or the DOAJ.

The data set is made available on GitHub.

Data schema

Every schema field is represented by a table column, and every article corresponds to a single table row.

The OpenAPC data schema is described here. This contribution from Leipzig University is an example of a table which conforms to the schema.

Mandatory fields

These variables must be present in every contribution:

institution — Top-level organisation which covered the fee

period — Year of APC payment

euro — The final amount that was paid in Euro, including VAT and all additional fees. The OpenAPC dataset does not explicitly track special reasons which might influence prices, like prepayment discounts, central billing agreements or individual waivers. However, institutions are encouraged to give details on such circumstances in a README file which can be added to their individual data folders (see below).

doi — Digital Object Identifier

is_hybrid — Should be TRUE if the article was published in a subscription-based journal ('hybrid journal'), FALSE if the journal was fully Open Access.
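As a rough pre-submission check, the header row of a table can be tested for the five mandatory columns listed above. The following is only a sketch: the helper name `check_mandatory_columns` and the sample table are made up for illustration and are not part of the OpenAPC tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAPC): verify that the header row of a
# CSV file names all five mandatory columns. Column order does not matter.

check_mandatory_columns() {
    csv_file=$1
    # Read the header row and strip quotes and a trailing carriage return
    header=$(head -n 1 "$csv_file" | tr -d '"\r')
    missing=""
    for column in institution period euro doi is_hybrid; do
        # Wrap the header in commas so only whole column names match
        case ",$header," in
            *",$column,"*) ;;
            *) missing="$missing $column" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing columns:$missing"
        return 1
    fi
    echo "all mandatory columns present"
}

# Made-up sample table with placeholder values (not real data)
sample=$(mktemp)
cat > "$sample" <<'EOF'
institution,period,euro,doi,is_hybrid
Example University,2015,1200.00,10.1234/example.doi,FALSE
EOF

check_mandatory_columns "$sample"
```

The check only looks at column names; it does not validate the values in the rows.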

Optional fields

If the article does not have a DOI assigned, these four fields must be provided as well:

publisher — The publisher

journal_full_title — Title of the journal

issn — International Standard Serial Number

url — A URL linking to the article full text
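The rule above (no DOI means the four fallback fields are required) could be checked along the following lines. This is a sketch under several assumptions: the helper name `rows_missing_fallback` is hypothetical, missing values are assumed to be written as an empty field or as NA, and the table is assumed to be simple CSV without commas inside quoted fields. The sample rows are made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAPC): print the line numbers of rows
# that have no DOI and also lack at least one of the four fallback fields.
# Assumes simple CSV (no commas inside quoted fields) and that a missing
# value is either empty or the string NA.

rows_missing_fallback() {
    awk -F',' '
        NR == 1 {
            # Map column names to their positions using the header row
            for (i = 1; i <= NF; i++) { gsub(/"|\r/, "", $i); col[$i] = i }
            next
        }
        {
            doi = $(col["doi"]); gsub(/"|\r/, "", doi)
            # A DOI is present, so the fallback fields are not required
            if (doi != "" && doi != "NA") next
            n = split("publisher journal_full_title issn url", need, " ")
            for (k = 1; k <= n; k++) {
                value = (need[k] in col) ? $(col[need[k]]) : ""
                gsub(/"|\r/, "", value)
                if (value == "" || value == "NA") { print NR; next }
            }
        }
    ' "$1"
}

# Made-up sample table with placeholder values (not real data)
sample=$(mktemp)
cat > "$sample" <<'EOF'
institution,period,euro,doi,is_hybrid,publisher,journal_full_title,issn,url
Example University,2015,1200.00,10.1234/example.doi,FALSE,,,,
Example University,2015,900.00,NA,FALSE,Example Press,Journal of Examples,1234-5678,https://example.org/article
Example University,2016,1500.00,NA,TRUE,Example Press,Journal of Examples,1234-5678,
EOF

rows_missing_fallback "$sample"   # prints 4 (the row without DOI and without url)
```

Tables that contain quoted commas would need a real CSV parser instead of this field splitting.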

Submission

There are two ways to provide OpenAPC with your data:

  1. By sending a mail to openapc at uni-bielefeld.de
  2. By initiating a pull request on GitHub. This process works as follows:

GitHub workflow: Adding a new institution

The following steps are required if an institution wants to add data for the first time:

  1. Register a user account on GitHub (if you don't have one already)
  2. Create a fork of the original OpenAPC repository
  3. Clone the fork on your local machine
  4. Create a new folder for your institution in the data directory
  5. Copy the data you want to add (tables, README) into the folder
  6. Push your changes back to GitHub
  7. Create a pull request and wait for the OpenAPC maintainer to accept it.

Steps 3 to 6 can be executed on your machine as follows (requires a command line environment and a git client):

Create local clone:

$ git clone https://github.com/YOURUsername/openapc-de.git

Add a new institutional folder:

$ mkdir openapc-de/data/YOURfolder

Add your CSV table(s) and an optional README file to the folder:

$ cp YOURdata.csv openapc-de/data/YOURfolder/

Add and commit the data, then push it back to GitHub:

$ cd openapc-de
$ git add data/YOURfolder/
$ git commit -m "APC fees paid at my Institution from 2012 until 2014"
$ git push origin master

Finally, create a pull request so that OpenAPC can include your new data.

GitHub workflow: Updating your data

The following steps are required to add or update data:

  1. Sync your fork and your local repository to mirror the current state of the OpenAPC repository (details)
  2. Make changes to the content of your institutional data folder
  3. Add, commit and push your changes back to GitHub
  4. Create a pull request and wait for the OpenAPC maintainer to accept it.
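Step 1 (syncing) is commonly done by registering the original repository as an additional remote and merging its master branch. The sketch below wraps this in a hypothetical helper, `sync_fork`. It assumes that `origin` points to your fork, that the remote name `upstream` is free to use, and that the original repository is located at the default URL shown, so adjust these if your setup differs.

```shell
#!/bin/sh
# Sketch: bring the master branch of a fork up to date with the original
# repository. Assumptions: "origin" is your fork, "upstream" is a remote name
# chosen for this example, and the default URL below is the original
# OpenAPC repository.

sync_fork() {
    upstream_url=${1:-https://github.com/OpenAPC/openapc-de.git}
    # Register the original repository as "upstream" unless it already exists
    git remote get-url upstream >/dev/null 2>&1 ||
        git remote add upstream "$upstream_url" || return 1
    # Fetch the upstream state, merge it into the local master branch,
    # then update the fork on GitHub
    git fetch upstream &&
        git checkout master &&
        git merge upstream/master &&
        git push origin master
}

# Usage (run inside your local clone):
#   sync_fork
```

After syncing, steps 2 to 4 work the same way as when adding a new institution.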

Enrichment

After receiving your files (either by pull request or by mail), OpenAPC will normalise your data and enrich it. For every contributed data file, an enriched version will be created and added to your data folder, usually marked by an _enriched suffix in the file name. The enriched data will then be integrated into the OpenAPC core data set, increasing its revision number.

The enrichment process consists of the following steps:

  • Journal and publisher names, ISSNs and license information are imported from CrossRef
  • PMID and PMCID are imported from Europe PubMed Central
  • The article is looked up in Web of Science; if it is found, the WoS identifier (ut) is stored
  • The journal is looked up in the DOAJ
  • A possible Linking ISSN (ISSN-L) is added

License

At the moment the following license is applied to all OpenAPC data:

Datasets are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/

Contributors

All contributors to OpenAPC will be mentioned by name.

Reuse

In addition to the dynamically generated repository front page (based on R Markdown), OpenAPC also operates an OLAP server for advanced data querying and a site providing treemap visualisations of the OpenAPC data.