This project is called Question Answering. I carried it out while studying at VietAI Advanced NLP Class 02. In a nutshell, the system answers a question about a given context.
To get started, you should have prior knowledge of Python and PyTorch. A few resources to get you started if this is your first Python or PyTorch project:
- Data: UIT-ViQuAD2.0 dataset from VLSP2021.
- Model: `question_answering_bartpho_phobert` is based on the BARTpho and PhoBERT models.
According to the original paper, BARTpho-syllable and BARTpho-word are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and the pre-training scheme of the sequence-to-sequence denoising autoencoder BART, so it is especially suitable for generative NLP tasks. For this downstream task, based on our experiments, we chose BARTpho-syllable over BARTpho-word, and PhoBERT-large over PhoBERT-base.
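To make the extractive side more concrete: an encoder such as PhoBERT is commonly fine-tuned for question answering by scoring every context token as a possible answer start and answer end, then picking the best valid span. The sketch below shows only that span selection on toy scores. The function name `best_span`, the `max_answer_len` parameter, and the scores themselves are hypothetical and are not taken from this repository's code.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_scores), i + max_answer_len)
        for j in range(i, last):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy tokens and scores invented for this example (not model output).
tokens = ["Hà", "Nội", "là", "thủ", "đô", "của", "Việt", "Nam"]
start_scores = [2.5, 0.1, -1.0, -0.5, -0.7, -1.2, 0.3, -0.2]
end_scores = [0.2, 3.0, -1.0, -0.8, 0.1, -1.5, 0.0, 1.0]

start, end, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

A generative model such as BARTpho would instead produce the answer text token by token, so this selection step applies only to the extractive setup.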
- Clone the repo: `git clone https://github.com/phkhanhtrinh23/question_answering_bartpho_phobert.git`
- Open the `question_answering_bartpho_phobert` folder in any code editor.
- Run `pip install -r requirements.txt` to install the required packages.
Note: You can install `transformers` from this branch as follows:
git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git
cd transformers
pip3 install -e .
- After you have received permission to download and use UIT-ViQuAD2.0, the dataset should be structured as follows:
├── data
|   ├── demo.json (not from UIT-ViQuAD2.0)
|   ├── test.json
|   └── train.json
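If you want to inspect the files before training, the sketch below flattens a SQuAD-style JSON document into one record per question. It assumes that `train.json` follows the SQuAD 2.0 layout (`data`, then `paragraphs`, then `qas`), which you should check against your own copy of the dataset. The helper name `flatten_squad` and the toy sample are made up for illustration.

```python
import json
from typing import Dict, List


def flatten_squad(dataset: Dict) -> List[Dict]:
    """Turn a SQuAD-style nested dict into a flat list of QA records."""
    records = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                records.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Unanswerable questions keep an empty answer string.
                    "answer": answers[0]["text"] if answers else "",
                    "is_impossible": qa.get("is_impossible", False),
                })
    return records


# Toy sample written for this example; it is not from UIT-ViQuAD2.0.
toy = json.loads("""
{"data": [{"title": "demo", "paragraphs": [{
  "context": "Hà Nội là thủ đô của Việt Nam.",
  "qas": [
    {"id": "q1", "question": "Thủ đô của Việt Nam là gì?",
     "answers": [{"text": "Hà Nội", "answer_start": 0}], "is_impossible": false},
    {"id": "q2", "question": "Dân số Hà Nội là bao nhiêu?",
     "answers": [], "is_impossible": true}
  ]}]}]}
""")

records = flatten_squad(toy)
for r in records:
    print(r["id"], repr(r["answer"]), r["is_impossible"])
```

To run it on a real file, replace the toy sample with `json.load(open("data/train.json", encoding="utf-8"))`, provided the layout assumption holds.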
- Run `python data.py` to split `train.json` into `new_train.json` and `valid.json` at a 9:1 ratio.
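As a rough illustration of a 9:1 split, the sketch below shuffles a list with a fixed seed and holds out one tenth of it. The helper name `split_train_valid`, the seed, and the shuffling step are assumptions made for this example; `data.py` may split the data differently.

```python
import random
from typing import List, Sequence, Tuple


def split_train_valid(items: Sequence, valid_ratio: float = 0.1,
                      seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy of items and hold out valid_ratio of them for validation."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


examples = list(range(100))
new_train, valid = split_train_valid(examples)
print(len(new_train), len(valid))
```

Fixing the seed means the same examples land in the validation set every time, which keeps validation scores comparable between training runs.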
- Now you can train the model with `python train.py`.
- You can validate the model with `python validate.py`, which scores the trained model on `valid.json`.
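Question answering models are often scored with exact match (EM) and token-level F1, as in the SQuAD evaluation. The sketch below is a simplified version of both metrics, assuming lowercase comparison and whitespace tokenization only. It is not necessarily the metric that `validate.py` computes, and the function names are illustrative.

```python
from collections import Counter


def exact_match(prediction: str, truth: str) -> int:
    """1 if the two strings match after trimming and lowercasing, else 0."""
    return int(prediction.strip().lower() == truth.strip().lower())


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    # If either side is empty, the score is 1.0 only when both are empty
    # (an unanswerable question answered with an empty string).
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Hà Nội", "hà nội"))
print(f1_score("thủ đô Hà Nội", "Hà Nội"))
```

The official SQuAD script also strips punctuation and English articles before comparing, which this sketch leaves out.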
Note: You can pass any of the arguments defined in the `ArgumentParser` of both `train.py` and `validate.py` for better results.
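For readers new to `argparse`, the sketch below shows how command-line arguments defined on a parser are overridden. Every flag name and default here is a hypothetical placeholder; check the `ArgumentParser` in `train.py` and `validate.py` for the arguments that actually exist.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names and defaults here are hypothetical placeholders.
    parser = argparse.ArgumentParser(description="Illustrative training options")
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--epochs", type=int, default=3)
    return parser


# Simulates a command line such as: some_script.py --batch_size 16 --epochs 5
args = build_parser().parse_args(["--batch_size", "16", "--epochs", "5"])
print(args.batch_size, args.learning_rate, args.epochs)
```

Flags that are not passed keep their defaults, so here only the batch size and the number of epochs change.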
- You can run inference on `test.json` and evaluate the results with `python inference.py`.
Note: Because the model cannot load and process the whole dataset at once, `validate.py` and `inference.py` only support inference in batches.
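Batched inference can be pictured as slicing the examples into fixed-size chunks and running the model on one chunk at a time. The sketch below shows that loop with a stand-in prediction function. The names `iter_batches` and `fake_predict` and the batch size are assumptions for illustration, not the repository's implementation.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def fake_predict(batch: List[str]) -> List[str]:
    # Stand-in for a model call; it just upper-cases each question.
    return [q.upper() for q in batch]


questions = [f"question {i}" for i in range(7)]
predictions: List[str] = []
for batch in iter_batches(questions, batch_size=3):
    predictions.extend(fake_predict(batch))

print(len(predictions))
```

The last batch is simply shorter when the dataset size is not a multiple of the batch size, so no example is dropped.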
- SHOW TIME! Now you can run your own demo website with Flask: `python api.py`. The website's UI comes from the `templates` folder. If possible, run this and share your results with me!
Some results:
Image 1 (from BARTpho-syllable)
Image 2 (from PhoBERT-large)
Image 3 (from PhoBERT-large)
Contributions are what make GitHub such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the project
- Create your Contribute branch: `git checkout -b contribute/Contribute`
- Commit your changes: `git commit -m 'add your messages'`
- Push to the branch: `git push origin contribute/Contribute`
- Open a pull request
Email: [email protected]