Understanding Retrieval Augmentation for Long-Form Question Answering

This is the repository for the paper Understanding Retrieval Augmentation for Long-Form Question Answering.

Contents

  1. Requirements
  2. Collected Data
  3. Reproduction
  4. Citation

Requirements

Our code requires PyTorch (torch), HuggingFace Transformers (transformers), and the OpenAI API package (openai). Most of our experiments were run with torch==2.0.1, transformers==4.30.1, and openai==0.27.8 on Python 3.10.6.
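
A matching environment can be installed with pip, for example (adjust the torch build to your CUDA setup as needed):

```bash
pip install torch==2.0.1 transformers==4.30.1 openai==0.27.8
```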

Collected Data

The data folder contains:

  • questions with corresponding human and model answers,
  • evidence documents retrieved for each of the questions,
  • prompt templates used for creating the prompts that were passed to the models, and
  • human annotations of the attributability of each answer sentence to the corresponding evidence documents, for a subset of the questions and models.
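
The exact file layout is described inside the data folder itself. As a minimal loading sketch (the file name questions_with_answers.jsonl and the JSON-lines format are illustrative assumptions, not the repository's guaranteed layout):

```python
import json
from pathlib import Path

DATA_DIR = Path("data")

def load_jsonl(path):
    """Read a JSON-lines file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Hypothetical file name; substitute the actual files found in data/.
records = load_jsonl(DATA_DIR / "questions_with_answers.jsonl")
print(len(records), "records loaded")
```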

Reproduction

Prompting LMs

Details on how to reproduce our prompting of the LMs are in src/answer_generation. The setup can easily be reused with different questions, documents, prompts, and models; a sketch of the general pattern follows.
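
As an illustration only (the prompt template, question, and model name below are placeholders, not the repository's exact setup), a retrieval-augmented prompt can be sent through the legacy ChatCompletion interface of openai==0.27.8 like this:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # set your OpenAI key

# Hypothetical template; the actual templates are in the data folder.
template = (
    "Answer the question using the evidence documents.\n\n"
    "Documents:\n{documents}\n\nQuestion: {question}\n\nAnswer:"
)

prompt = template.format(
    documents="[1] ...retrieved passage...",
    question="Why is the sky blue?",
)

# openai==0.27.8 uses the pre-1.0 ChatCompletion API.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)
print(response["choices"][0]["message"]["content"])
```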

Attribution Prediction

We benchmark several approaches to predicting whether each answer sentence is attributable to the evidence documents, using the collected data. Details can be found in src/Automatic/.
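
One common baseline for this kind of attribution prediction is NLI: treat the evidence as the premise and the answer sentence as the hypothesis, and check for entailment. A minimal sketch with transformers (the model choice and example texts are illustrative, not necessarily what the repository benchmarks):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative model choice; any MNLI-style entailment model works here.
model_name = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

evidence = "Air molecules scatter short (blue) wavelengths of sunlight."
answer_sentence = "The sky looks blue because sunlight is scattered by air."

# Premise = evidence document, hypothesis = answer sentence.
inputs = tokenizer(evidence, answer_sentence,
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# For this model the labels are CONTRADICTION / NEUTRAL / ENTAILMENT.
print(model.config.id2label[logits.argmax(dim=-1).item()])
```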

Retrieving Bing Documents

Steps for retrieving Bing evidence documents can be found in src/bing_search.
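
For orientation, a raw call to the Bing Web Search v7 API might look like the sketch below (the helper function and query are illustrative; the repository's actual pipeline is defined in src/bing_search):

```python
import requests

# Bing Web Search API v7; the subscription key comes from Azure.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def bing_search(query, key, count=10):
    """Return the top web results for a query."""
    headers = {"Ocp-Apim-Subscription-Key": key}
    params = {"q": query, "count": count}
    resp = requests.get(BING_ENDPOINT, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json().get("webPages", {}).get("value", [])

for hit in bing_search("why is the sky blue", key="YOUR_BING_KEY"):
    print(hit["name"], hit["url"])
```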

Citation

@article{chen2023understanding,
  title={Understanding Retrieval Augmentation for Long-Form Question Answering},
  author={Chen, Hung-Ting and Xu, Fangyuan and Arora, Shane A and Choi, Eunsol},
  journal={arXiv preprint arXiv:2310.12150},
  year={2023}
}