
Commit fbb580f
Updated retriever-lm-reasoning's paper info v2 (#249)
* Added new thumbnail for paper

* Updated thumbnail's address

* Updated retriever-lm-reasoning's abstract
ParishadBehnam authored Oct 23, 2023
1 parent a218b1d commit fbb580f
Showing 2 changed files with 2 additions and 2 deletions.
_posts/papers/2023-05-07-retriever-lm-reasoning.md (4 changes: 2 additions & 2 deletions)
@@ -8,7 +8,7 @@ link: https://arxiv.org/abs/2212.09146
tags:
- EMNLP Findings
code: https://github.com/McGill-NLP/retriever-lm-reasoning
-thumbnail: /assets/images/papers/retriever-lm-reasoning.jpg
+thumbnail: /assets/images/papers/retriever-lm-reasoning-2.jpg
categories: Publications

---
@@ -21,4 +21,4 @@ categories: Publications

## Abstract

-The emergence of large pretrained models has enabled language models to achieve superior performance in common NLP tasks, including language modeling and question answering, compared to previous static word representation methods. Augmenting these models with a retriever to retrieve the related text and documents as supporting information has shown promise in effectively solving NLP problems in a more interpretable way given that the additional knowledge is injected explicitly rather than being captured in the models' parameters. In spite of the recent progress, our analysis on retriever-augmented language models shows that this class of language models still lack reasoning over the retrieved documents. In this paper, we study the strengths and weaknesses of different retriever-augmented language models such as REALM, kNN-LM, FiD, ATLAS, and Flan-T5 in reasoning over the selected documents in different tasks. In particular, we analyze the reasoning failures of each of these models and study how the models' failures in reasoning are rooted in the retriever module as well as the language model.
+Augmenting pretrained language models with retrievers has shown promise in effectively solving common NLP problems, such as language modeling and question answering. In this paper, we evaluate the strengths and weaknesses of popular retriever-augmented language models, namely kNN-LM, REALM, DPR + FiD, Contriever + ATLAS, and Contriever + Flan-T5, in reasoning over retrieved statements across different tasks. Our findings indicate that the simple similarity metric employed by retrievers is insufficient for retrieving all the necessary statements for reasoning. Additionally, the language models do not exhibit strong reasoning even when provided with only the required statements. Furthermore, when combined with imperfect retrievers, the performance of the language models becomes even worse, e.g., Flan-T5's performance drops by 28.6% when retrieving 5 statements using Contriever. While larger language models improve performance, there is still a substantial room for enhancement. Our further analysis indicates that multihop retrieve-and-read is promising for large language models like GPT-3.5, but does not generalize to other language models like Flan-T5-xxl.
assets/images/papers/retriever-lm-reasoning-2.jpg (binary file, not displayed)
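
A note on the mechanism the updated abstract describes: retrieval in these systems is a single similarity-ranked lookup, so a statement needed only for a later reasoning hop can be missed when it shares little surface vocabulary with the question. Below is a minimal, self-contained sketch of that single-step top-k retrieval; it is not the paper's code, and `embed` is a hypothetical toy hashing encoder standing in for a real dense encoder such as Contriever.

```python
# Minimal sketch (not the paper's code) of single-step, similarity-based
# retrieval. `embed` is a toy hashing encoder used only so the example
# runs on its own; a real system would use a dense encoder like Contriever.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedding, L2-normalized."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_top_k(query: str, statements: list[str], k: int = 5) -> list[str]:
    """Rank statements by cosine similarity to the query; keep the top k."""
    q = embed(query)
    scored = sorted(((float(q @ embed(s)), s) for s in statements), reverse=True)
    return [s for _, s in scored[:k]]

statements = [
    "Erin is the mother of Ann.",
    "Ann is the mother of Tom.",   # needed for the second reasoning hop
    "Tom likes football.",
    "Erin lives in Montreal.",
]
# A two-hop question: answering it needs both kinship statements, but the
# second one shares almost no vocabulary with the query, so a single
# similarity ranking can score it below unrelated statements about "Erin".
print(retrieve_top_k("Who is Erin the grandmother of?", statements, k=2))
```

The multihop retrieve-and-read variant the abstract mentions would instead re-query with the intermediate answer before reading, which the abstract reports is promising for GPT-3.5 but does not generalize to Flan-T5-xxl.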
