This repository is intended to provide a base framework and method for the PFL-DocVQA Competition.
Automatically managing the information of document workflows is a core aspect of business intelligence and process automation. Reasoning over the information extracted from documents fuels subsequent decision-making processes that can directly affect humans, especially in sectors such as finance, legal or insurance. At the same time, documents tend to contain private information, restricting access to them during training. This common scenario requires training large-scale models over private and widely distributed data.
If you plan to participate in the Competition, please read the participation instructions carefully.
To set up and use the framework, please check the How to use instructions.
If you want to download the dataset, you can do so on the ELSA Benchmarks Competition platform. For this framework, you will need to download the IMDBs (which contain the processed QAs and OCR) and the images. All downloads must be performed through the RRC portal.
Dataset | Link |
---|---|
PFL-DocVQA | Link |
Model | Weights HF name | Parameters |
---|---|---|
VT5 base | rubentito/vt5-base-spdocvqa | 316M |
Average Normalized Levenshtein Similarity (ANLS)
The standard metric for text-based VQA tasks (ST-VQA and DocVQA). It evaluates the method's reasoning capabilities while smoothly penalizing OCR recognition errors.
Check Scene Text Visual Question Answering for more details.
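As a rough illustration of how ANLS behaves, the sketch below computes it in plain Python. It is not the evaluation code used by this framework or the Competition. The function names `levenshtein` and `anls`, the lowercase/strip normalization, and the default threshold of 0.5 (the value commonly used for ST-VQA and DocVQA) are assumptions made for this example.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def anls(
    predictions: Sequence[str],
    ground_truths: Sequence[Sequence[str]],
    threshold: float = 0.5,  # assumed default, see the note above
) -> float:
    """Average, over questions, of the best thresholded similarity to any accepted answer."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        return 0.0

    total = 0.0
    for prediction, answers in zip(predictions, ground_truths):
        pred = prediction.strip().lower()  # assumed normalization
        best = 0.0
        for answer in answers:
            ref = answer.strip().lower()
            max_len = max(len(pred), len(ref))
            distance = levenshtein(pred, ref) / max_len if max_len else 0.0
            # Small edit distances (e.g. OCR slips) are penalized smoothly;
            # answers at or beyond the threshold count as wrong and score 0.
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

For example, `anls(["helo"], [["hello"]])` gives a partial score, because one edit over five characters stays below the threshold, while an unrelated answer such as `"xyz"` scores zero.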
If you use this dataset or code, please cite our paper.
@inproceedings{tito2023privacy,
title={Privacy-Aware Document Visual Question Answering},
author={Rub{\`{e}}n Tito and Khanh Nguyen and Marlon Tobaben and Raouf Kerkouche and Mohamed Ali Souibgui and Kangsoo Jung and Joonas J{\"{a}}lk{\"{o}} and Vincent Poulain D'Andecy and Aur{\'{e}}lie Joseph and Lei Kang and Ernest Valveny and Antti Honkela and Mario Fritz and Dimosthenis Karatzas},
booktitle = {Document Analysis and Recognition - {ICDAR} 2024 - 18th International Conference, Athens, Greece, August 30 - September 4, 2024, Proceedings, Part {VI}},
series = {Lecture Notes in Computer Science},
volume = {14809},
pages = {199--218},
publisher = {Springer},
year = {2024},
url = {https://doi.org/10.1007/978-3-031-70552-6\_12},
doi = {10.1007/978-3-031-70552-6\_12}
}