This repository is an archive of work done with the CORD-19 challenge in 2020. If you'd like to programmatically process medical literature, see paperai.
COVID-19 Open Research Dataset (CORD-19) is a free resource of scholarly articles, aggregated by a coalition of leading research groups, covering COVID-19 and the coronavirus family of viruses. The dataset can be found on Semantic Scholar and Kaggle.
The cord19q project builds an index over the CORD-19 dataset to assist with analysis and data discovery. A series of COVID-19 related research topics were explored to identify relevant articles and help find answers to key scientific questions.
A full list of Kaggle CORD-19 Challenge tasks can be found in this notebook. This notebook and corresponding report notebooks won 🏆 7 awards 🏆 in the Kaggle CORD-19 Challenge.
The latest tasks are also stored in the cord19q repository.
cord19q can be installed directly from GitHub using pip. Using a Python Virtual Environment is recommended.
pip install git+https://github.com/neuml/cord19q
Python 3.6+ is supported.
cord19q relies on paperetl to parse and load the CORD-19 dataset into a SQLite database. paperai is then used to run an AI-Powered Literature Review over the CORD-19 dataset for a list of query tasks.
The following links show how to parse, load and index CORD-19.
The model will be stored in ~/.cord19.
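After loading, it can be useful to confirm that the SQLite database actually contains data. The following is a minimal sketch using only Python's standard `sqlite3` module. It reads table names from SQLite's built-in `sqlite_master` catalog, so it makes no assumption about the schema paperetl creates. The function name `summarize_tables` is hypothetical and not part of cord19q, paperetl or paperai.

```python
import sqlite3
from pathlib import Path


def summarize_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Hypothetical helper for illustration; not part of cord19q or paperetl.
    """
    path = Path(db_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No SQLite database at {path}")

    connection = sqlite3.connect(str(path))
    try:
        cursor = connection.cursor()
        # sqlite_master is SQLite's built-in catalog of schema objects
        cursor.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        names = [row[0] for row in cursor.fetchall()]

        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query
            quoted = '"' + name.replace('"', '""') + '"'
            cursor.execute(f"SELECT COUNT(*) FROM {quoted}")
            counts[name] = cursor.fetchone()[0]
        return counts
    finally:
        connection.close()
```

To use it, pass the path of the database file found under ~/.cord19, for example `summarize_tables("~/.cord19/models/articles.sqlite")`. That file name is an assumption; check the directory for the actual name on your system.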
A report file is simply a markdown file created from a list of queries. An example:
python -m paperai.report tasks/risk-factors.yml
Once complete, a file named tasks/risk-factors.md will be created.
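To build reports for several task files in one pass, the command above can be wrapped in a short script. This is a sketch under two assumptions: that every task file lives in a single directory with a `.yml` extension, and that each report is written next to its task file with a `.md` extension, as in the risk-factors example. The helper names `report_path`, `build_command` and `run_reports` are hypothetical and not part of paperai.

```python
import subprocess
import sys
from pathlib import Path


def report_path(task_file):
    """Return the markdown path expected for a task file.

    Assumption: the report sits beside the task file with a .md suffix.
    """
    return Path(task_file).with_suffix(".md")


def build_command(task_file):
    """Build the same command shown above, using the current interpreter."""
    return [sys.executable, "-m", "paperai.report", str(task_file)]


def run_reports(task_dir="tasks"):
    """Run the report command for each .yml file in task_dir.

    Returns {task_file_name: True if the expected .md file exists afterwards}.
    """
    results = {}
    for task_file in sorted(Path(task_dir).glob("*.yml")):
        # check=True raises CalledProcessError if a report run fails
        subprocess.run(build_command(task_file), check=True)
        results[task_file.name] = report_path(task_file).is_file()
    return results
```

Calling `run_reports("tasks")` from the repository root runs each task in name order and stops at the first failing report.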
The fastest way to run queries is to start a paperai shell.
paperai
A prompt will come up. Queries can be typed directly into the console.