Cross-Lingual Data Augmentation For Thai QA

Authors: Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat

To be presented at GenBench in EMNLP 2023:

ACL Link: PDF
ResearchGate Link: PDF

TLDR: This paper introduces an innovative data augmentation framework with quality control measures to enhance the robustness of Thai question answering models.

This paper presents an innovative data augmentation framework with data quality control designed to enhance the robustness of Question Answering (QA) models in low-resource languages, particularly Thai. Recognizing the challenges posed by the scarcity and quality of training data, we leverage data augmentation techniques in both monolingual and cross-lingual settings. Our approach augments and enriches the original dataset, thereby increasing its linguistic diversity and robustness. We evaluate the robustness of our framework on Machine Reading Comprehension. The experimental results illustrate the potential of data augmentation to effectively increase training data and improve model generalization, making it a promising direction for low-resource language settings.

Dataset

Publicly available at: https://huggingface.co/datasets/parinzee/claq-qa-thai-dataset
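
As a minimal sketch, the dataset can probably be loaded with the Hugging Face `datasets` library. The repository ID below is taken from the URL above; the helper name `load_claq` is hypothetical, and the split names and column layout are not documented here, so inspect the returned object before relying on them. If the dataset defines multiple configurations, a configuration name may also be required.

```python
# Repository ID taken from the dataset URL in this README.
DATASET_ID = "parinzee/claq-qa-thai-dataset"


def load_claq(split=None):
    """Hypothetical helper: download the dataset from the Hugging Face Hub.

    With split=None, load_dataset returns every available split
    (typically as a DatasetDict); pass a split name to get a single one.
    """
    # Imported lazily so this module can be imported without `datasets`
    # installed (install with: pip install datasets).
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


# Example usage (requires network access):
#   ds = load_claq()
#   print(ds)  # shows the available splits and columns
```

Printing the returned object is the quickest way to check which splits and fields are actually present before writing any training or evaluation code against them.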

Models

Coming Soon
