This repository contains the template project for the README2KG Shared Task, hosted on Codabench.
It includes the scripts and data files used in the competition:
- The README2KG dataset is in readme2kg_template/data.
- Scripts:
  - src/scoring.py is the official scoring script. It lets participants evaluate their NER system locally before uploading predictions to Codabench.
  - src/TryMe.ipynb gives simple examples of reading and writing WebAnno TSV files.
  - src/predictor.py provides sample code for writing predictions in WebAnno TSV format. Note that an annotation may start and end in the middle of a token, and may also span more than one sentence.
  - src/utils.py contains methods used by predictor.py.
  - src/webanno_tsv.py is adapted from neuged/webanno_tsv to read and write the dataset files used in the README2KG Shared Task.
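For orientation, the sketch below reads one annotation column of a WebAnno TSV 3 file into character-offset spans. Working with character offsets rather than token indices is one way to respect the note above that annotations can begin mid-token or cross sentence boundaries. This is a standalone illustration, not the API of src/webanno_tsv.py or TryMe.ipynb: `parse_spans`, `Span`, the default column index, the layer name `webanno.custom.Entity` and the `SOFTWARE` label are all hypothetical. It assumes the `n` in `Label[n]` is unique within a document and ignores WebAnno's character escaping.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# "Label" or "Label[3]": the bracketed number links the tokens of one multi-token span.
LABEL_RE = re.compile(r"^(?P<label>.+?)(?:\[(?P<span_id>\d+)\])?$")


@dataclass(frozen=True)
class Span:
    start: int  # character offset of the first character
    end: int  # character offset one past the last character
    label: str


def parse_spans(tsv_text: str, column: int = 3) -> list[Span]:
    """Collect labelled character spans from one annotation column (hypothetical helper)."""
    grouped: dict[str, list] = {}  # span id -> [start, end, label]
    spans: list[Span] = []
    for line in tsv_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip format headers, "#Text=" lines and blank separators
        cols = line.split("\t")
        if len(cols) <= column:
            continue
        begin, end = (int(x) for x in cols[1].split("-"))  # e.g. "8-15"
        for value in cols[column].split("|"):  # "|" separates stacked annotations
            m = LABEL_RE.match(value)
            if m is None or m.group("label") in ("_", "*"):
                continue  # empty cell or unlabelled annotation
            label, span_id = m.group("label"), m.group("span_id")
            if span_id is None:
                spans.append(Span(begin, end, label))  # single-token annotation
            elif span_id not in grouped:
                grouped[span_id] = [begin, end, label]
            else:  # extend a multi-token span, possibly across sentences
                grouped[span_id][0] = min(grouped[span_id][0], begin)
                grouped[span_id][1] = max(grouped[span_id][1], end)
    spans.extend(Span(s, e, lab) for s, e, lab in grouped.values())
    return sorted(spans, key=lambda sp: (sp.start, sp.end, sp.label))


# Illustrative input; the layer and label names are made up.
SAMPLE = "\n".join([
    "#FORMAT=WebAnno TSV 3.3",
    "#T_SP=webanno.custom.Entity|label",
    "",
    "",
    "#Text=Install PyTorch Lightning with pip",
    "1-1\t0-7\tInstall\t_\t",
    "1-2\t8-15\tPyTorch\tSOFTWARE[1]\t",
    "1-3\t16-25\tLightning\tSOFTWARE[1]\t",
    "1-4\t26-30\twith\t_\t",
    "1-5\t31-34\tpip\tSOFTWARE\t",
])

print(parse_spans(SAMPLE))
# [Span(start=8, end=25, label='SOFTWARE'), Span(start=31, end=34, label='SOFTWARE')]
```

The two tokens tagged `SOFTWARE[1]` collapse into a single span covering characters 8 to 25, while the untagged-id `SOFTWARE` on "pip" stays a one-token span.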
We use Poetry to manage the template project by default:

```shell
conda create --name readme poetry
conda activate readme
poetry install
```

Alternatively, use pip to install the dependencies:

```shell
pip install -r requirements.txt
```
You can use the dummy predictor provided in the template to generate predictions:

```shell
python src/predictor.py
```
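As a separate illustration, and not the template's dummy predictor, a hypothetical keyword-matching predictor can emit character-offset spans directly, so a match is free to end in the middle of a token such as "PyTorchLightning". The gazetteer, its labels and the `predict_spans` name are assumptions made for this sketch; converting such spans into WebAnno TSV output is left to the template's own predictor.py and webanno_tsv.py.

```python
from __future__ import annotations

import re

# Hypothetical gazetteer: surface string -> entity label (illustrative only).
GAZETTEER = {"PyTorch": "SOFTWARE", "MIT": "LICENSE"}


def predict_spans(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) character spans for every gazetteer hit in text."""
    spans = []
    for surface, label in gazetteer.items():
        # re.escape treats the surface string literally; matches ignore token boundaries.
        for m in re.finditer(re.escape(surface), text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


print(predict_spans("Install PyTorchLightning under the MIT license."))
# [(8, 15, 'SOFTWARE'), (35, 38, 'LICENSE')]
```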
To run the scoring script directly:

```shell
python src/scoring.py --reference_dir ./data/train --prediction_dir ./results/prediction
```

The scoring script is identical to the one used on Codabench to evaluate submissions.
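To get a feel for span-level evaluation, here is a minimal sketch of micro-averaged precision, recall and F1 over exact (start, end, label) matches. This is a common NER convention, assumed here for illustration only; src/scoring.py is the authoritative definition and may match or average spans differently. The tuple representation and the `span_prf` name are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) character offsets


def span_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 where a prediction counts only on an exact match."""
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    true_pos = sum((gold_counts & pred_counts).values())  # multiset intersection
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(8, 25, "SOFTWARE"), (31, 34, "SOFTWARE")]
pred = [(8, 25, "SOFTWARE"), (31, 34, "SOFTWARE"), (0, 7, "SOFTWARE")]
print(span_prf(gold, pred))
```

Under exact matching, the spurious (0, 7) span lowers precision to 2/3 while recall stays at 1.0; a span with the right offsets but the wrong label would count as both a false positive and a false negative.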