A PyTorch implementation of the ACL 2017 paper "Reading Wikipedia to Answer Open-Domain Questions" (DrQA). The code is based on Runqi Yang's implementation (https://github.com/hitvoice/DrQA).
- python >=3.5
- pytorch 0.2.0
- numpy
- pandas
- msgpack
- spacy 1.x
- cupy
- pynvrtc
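
As a quick sanity check, the following minimal sketch reports which of the packages listed above can be located in the current environment. The import names (for example `torch` for pytorch) and the helper `check_requirements` are assumptions made for illustration and are not part of this repository; the sketch does not verify version numbers.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = ["torch", "numpy", "pandas", "msgpack", "spacy", "cupy", "pynvrtc"]


def check_requirements(modules=REQUIRED_MODULES):
    """Return {module_name: bool}, True if the module can be located."""
    status = {}
    for name in modules:
        try:
            # find_spec locates a top-level module without importing it.
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print("{:10s} {}".format(name, "ok" if found else "MISSING"))
```

A package reported as `MISSING` may simply be installed under a different import name, so treat the output as a hint rather than a verdict.
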
- Make sure Python 3 and pip are installed.
- Install the PyTorch build that matches your OS, Python, and CUDA versions.
- Install the remaining requirements via `pip install -r requirements.txt`.
- Download the SQuAD data file, GloVe word vectors, and spaCy English language models using `bash download.sh`.
# prepare the data
python prepro.py
# make sure the CUDA library path can be found, e.g.:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
# specify the path to the SRU implementation, e.g.:
export PYTHONPATH=../../sru/
# train for 50 epochs with batch size 32
python train.py -e 50 -bs 32
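
The same steps could also be driven from Python. The sketch below is one possible wrapper, not part of this repository: `build_env` and `train_command` are hypothetical helper names, and the default paths are the example values from the commands above. Unlike the plain `export` lines, it prepends to any existing value of each variable instead of overwriting it.

```python
import os


def build_env(base_env, cuda_lib="/usr/local/cuda/lib64", sru_path="../../sru/"):
    """Return a copy of base_env with the CUDA and SRU paths prepended."""
    env = dict(base_env)
    for key, new_path in (("LD_LIBRARY_PATH", cuda_lib), ("PYTHONPATH", sru_path)):
        existing = env.get(key, "")
        # Keep any paths that were already set by placing ours in front.
        env[key] = new_path + os.pathsep + existing if existing else new_path
    return env


def train_command(epochs=50, batch_size=32):
    """Build the argument list for the training command shown above."""
    return ["python", "train.py", "-e", str(epochs), "-bs", str(batch_size)]
```

To use it, pass the result to `subprocess.run`, for example `subprocess.run(train_command(), env=build_env(os.environ), check=True)`, after running `prepro.py` the same way.
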
| Model | EM | F1 | Time used in RNN | Total time/epoch |
|---|---|---|---|---|
| LSTM (original paper) | 69.5 | 78.8 | ~523s | ~700s |
| SRU (this version) | 70.3 | 79.5 | ~88s | ~200s |
Tested on GeForce GTX 1070.
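
To make the timing comparison concrete, the short sketch below computes the speedup implied by the table. The numbers are the approximate values reported above; the variable and function names are illustrative only.

```python
# Approximate timings in seconds per epoch, copied from the table above.
timings = {
    "LSTM": {"rnn": 523, "total": 700},
    "SRU": {"rnn": 88, "total": 200},
}


def speedup(metric):
    """Ratio of LSTM time to SRU time for the given metric."""
    return timings["LSTM"][metric] / timings["SRU"][metric]


rnn_speedup = speedup("rnn")
total_speedup = speedup("total")
print("RNN speedup:   {:.1f}x".format(rnn_speedup))    # about 5.9x
print("Epoch speedup: {:.1f}x".format(total_speedup))  # 3.5x
```

Because the source timings are approximate, these ratios are rough estimates for this GPU and should not be read as general benchmarks.
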
- Author of the Document Reader model: Danqi Chen.
- Author of the original PyTorch implementation: Runqi Yang.
- Most of the PyTorch model code is borrowed from Facebook/ParlAI under a BSD-3 license.