Data Submission Handout
OpenAPC is an Open Data project on Open Access publishing charges, and all data is provided by academic institutions or funders on a voluntary basis. If you are reading this handout because you are considering contributing data on behalf of your institution for the first time, we would like to thank you in advance: our project could not exist without that kind of dedication! Please take some time to familiarize yourself with the following guidelines. If anything is unclear, do not hesitate to ask (either via the Issue tracker or by email). And most importantly: your contribution does not have to be perfect. We have a lot of experience and the technical means to fix many issues on our side.
- The data contains an academic institution's expenditures on Open Access publishing on a per-publication basis.
- The data should be provided in a machine-readable, platform-independent format (CSV).
- The data is provided under an Open Data Commons license to ensure public access and reusability.
- A contact person is designated at the contributing institution.
OpenAPC collects cost data on OA publishing for the following publication types:
- Journal articles (Article Processing Charges, APCs)
- Monographs/full books (Book Processing Charges, BPCs)
OA charges for other publication types (like single book chapters or conference proceedings) are not collected at the moment.
OpenAPC maintains a separate data set for each accepted publication type. Each data set is composed of the tables contributed by the individual institutions. Journal/book titles and publisher names are imported from CrossRef via automated enrichment routines to make expenditures comparable. Additional metadata is collected from services like Europe PubMed Central, the DOAJ or the DOAB.
The data is made available on GitHub.
Your submitted CSV file should conform to a certain data schema to ensure it provides all the information we need. Every schema field is represented by a table column and every publication record corresponds to a table row. The schema to use depends on the type of cost data you plan to submit.
If you want to provide cost data on both publication types, we recommend submitting two separate tables.
This contribution from Leipzig University is an example of a table which conforms to the schema.
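Before submitting, it can help to confirm that the header row of a table contains the columns you expect. The following is a minimal sketch, not an official OpenAPC tool: the function name `missing_columns` and the sample data are hypothetical, and the `institution` column is an illustrative assumption. Only `period`, `euro` and `backlist_oa` are named in this handout; the authoritative list of fields is the schema itself.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical sample table; "institution" is an assumed column name.
sample = "institution,period,euro\nExample University,2014,1200.00\n"

print(missing_columns(sample, ["period", "euro"]))
print(missing_columns(sample, ["period", "euro", "backlist_oa"]))
```

The first call reports no missing columns, while the second reports `backlist_oa`, which this handout describes as mandatory for the BPC data set.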
The amount reported in the `euro` field should be calculated according to the following policy:
- All reported publication fees are gross values: modifiers such as taxes or discounts should always be included in the amount. Except for the mandatory `backlist_oa` field in the BPC data set, OpenAPC does not explicitly track special circumstances which might influence prices. However, institutions are encouraged to describe such circumstances in a README file, which can be added to their individual data folders (Example).
- Only the APC/BPC itself should be reported, not additional expenses such as page/colour charges or submission fees.
- If costs for a publication were split between multiple institutions, only one of them should report the full amount to OpenAPC.
- Some journals levy additional fees for corrections to published articles (corrigenda). Such expenditures are not part of the APC and thus should neither be added to the reported costs nor added to the data table as a separate entry (in case a DOI was assigned to the corrigendum).
- Only articles which conform to a "standard" model of APC transactions should be reported (direct payment of money for OA publication). If the cost was only calculated in hindsight (because the article was published under an offsetting contract or paid for with vouchers, as in the RSC "Gold for gold" program), it should not be reported.
- The cost should not be zero.
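The last rule in the list above is easy to check automatically. The sketch below is one possible pre-submission check and not part of the OpenAPC tooling. The function name `check_euro` is hypothetical, and rejecting negative or non-numeric values goes beyond what the policy states (it only rules out zero), so treat that part as an assumption.

```python
from decimal import Decimal, InvalidOperation


def check_euro(value):
    """Return None if the amount looks usable, otherwise a short problem description."""
    try:
        amount = Decimal(value.strip())
    except InvalidOperation:
        return "not a number"
    if not amount.is_finite():  # catches "NaN" and "Infinity"
        return "not a finite number"
    if amount <= 0:  # zero is ruled out by the policy; negatives are an assumption
        return "must be greater than zero"
    return None


for raw in ["1200.00", "0", "abc"]:
    print(raw, "->", check_euro(raw))
```

`Decimal` is used instead of `float` so that amounts such as `1200.10` are compared exactly rather than as binary approximations.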
As the field name implies, the currency of the reported amount should be Euro (€). If your institution's accounting is based on another currency, you can either convert the values yourself (preferable) or state the currency and leave the conversion to us. In the latter case, however, results might be slightly inaccurate, as we will have to work with average exchange rates for the reported period. If you know the exact date of payment for each article, you might want to put the full date (YYYY-MM-DD) into the `period` column instead of just the year, so that we can apply daily exchange rates.
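To illustrate why a full payment date gives a more accurate result than a bare year, here is a small sketch of the two conversion paths. It is not OpenAPC's actual conversion routine: the function name `to_euro`, the rate tables and the rates themselves are made up for the example, and rounding to whole cents is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical exchange rates (euros per 1 USD); NOT real market data.
DAILY_RATES = {"2014-03-17": Decimal("0.72")}
YEARLY_AVERAGE_RATES = {"2014": Decimal("0.75")}


def to_euro(amount, period):
    """Convert a foreign-currency amount to euros, preferring a daily rate."""
    if len(period) == 10 and period in DAILY_RATES:  # period given as YYYY-MM-DD
        rate = DAILY_RATES[period]
    else:  # fall back to the average rate for the year
        rate = YEARLY_AVERAGE_RATES[period[:4]]
    return (Decimal(amount) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(to_euro("2000", "2014-03-17"))  # 1440.00 (daily rate)
print(to_euro("2000", "2014"))        # 1500.00 (yearly average)
```

With these invented rates, the same payment differs by 60 euros depending on whether the exact date or only the year is known.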
There are two ways to provide OpenAPC with your data:
- By sending an email to openapc at uni-bielefeld.de
- By initiating a pull request on GitHub. This process works as follows:
The following steps are required if an institution wants to add data for the first time:
1. Register a user account on GitHub (if you don't have one already)
2. Create a fork of the original OpenAPC repository
3. Clone the fork on your local machine
4. Create a new folder for your institution in the `data` directory
5. Copy the data you want to add (tables, README) into the folder
6. Push your changes back to GitHub
7. Create a pull request and wait for the OpenAPC maintainer to accept it.
Steps 3 to 6 can be carried out on your machine as follows (this requires a command-line environment and a git client):
Create a local clone:
$ git clone https://github.com/YOURUsername/openapc-de.git
Add a new institutional folder:
$ cd openapc-de/data
$ mkdir YOURfolder
Add your CSV table(s) and an optional README file to the folder. The shell is still inside openapc-de/data at this point, so the target path is relative to that directory (/path/to/ is a placeholder for wherever your file is stored):
$ cp /path/to/YOURdata.csv YOURfolder/
Add and commit the data, then push it back to GitHub:
$ git add YOURfolder/
$ git commit -m "APC fees paid at my Institution from 2012 until 2014"
$ git push origin master
Finally, create a pull request so that OpenAPC can include your new data.
The following steps are required to add or update data:
- Sync your fork and your local repository to mirror the current state of the OpenAPC repository (details)
- Make changes to the content of your institutional data folder
- Add, commit and push your changes back to GitHub
- Create a pull request and wait for the OpenAPC maintainer to accept it.
After receiving your files (either by pull request or by email), OpenAPC will normalise your data and enrich it. For every contributed data file, an enriched version will be created and added to your data folder, usually marked by adding an `_enriched` suffix to the file name. The enriched data will then be integrated into the OpenAPC core data set, increasing its revision number.
The enrichment process consists of the following steps:
- Journal and publisher names, ISSNs and license information are imported from CrossRef
- PMID and PMCID are imported from Europe PubMed Central
- The article is looked up in Web of Science. If it is found, the WoS identifier `ut` is stored
- The journal is looked up in the DOAJ
- A possible Linking ISSN (ISSN-L) is added
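As a rough illustration of the first enrichment step, the sketch below picks the relevant fields out of a work record shaped like the `message` object returned by the public Crossref REST API for a DOI. It is not OpenAPC's enrichment code: the function name `crossref_fields`, the output key names and the sample record are all hypothetical, and no network request is made.

```python
def crossref_fields(work):
    """Pick journal title, publisher, ISSNs and license URL from a Crossref-style record."""
    titles = work.get("container-title") or []
    licenses = work.get("license") or []
    return {
        "journal_full_title": titles[0] if titles else None,
        "publisher": work.get("publisher"),
        "issn": work.get("ISSN") or [],
        "license_ref": licenses[0].get("URL") if licenses else None,
    }


# Hypothetical record for illustration only; not data for a real article.
sample_work = {
    "publisher": "Example Press",
    "container-title": ["Journal of Examples"],
    "ISSN": ["1234-5678"],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
}

print(crossref_fields(sample_work))
```

Missing keys fall back to `None` or an empty list, since not every record carries license or ISSN information.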
At the moment the following license is applied to all OpenAPC data:
Datasets are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
All contributors to OpenAPC will be mentioned by name.
In addition to the dynamically generated repository front page (based on R Markdown) OpenAPC also operates an OLAP server for advanced data querying and a site providing treemap visualisations of the OpenAPC data.
With the kind support of the Electronic Publishing working group of the Deutsche Initiative für Netzwerkinformation (DINI), the Deutsche Forschungsgemeinschaft (German Research Foundation) and the Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research).
Content is licensed under CC BY 4.0.