
Commit fbb580f
Updated retriever-lm-reasoning's paper info v2 (#249)
* Added new thumbnail for paper

* Updated thumbnail's address

* Updated retriever-lm-reasoning's abstract
ParishadBehnam authored Oct 23, 2023
1 parent a218b1d commit fbb580f
Showing 2 changed files with 2 additions and 2 deletions.
_posts/papers/2023-05-07-retriever-lm-reasoning.md (4 changes: 2 additions & 2 deletions)
@@ -8,7 +8,7 @@ link: https://arxiv.org/abs/2212.09146
tags:
- EMNLP Findings
code: https://github.com/McGill-NLP/retriever-lm-reasoning
-thumbnail: /assets/images/papers/retriever-lm-reasoning.jpg
+thumbnail: /assets/images/papers/retriever-lm-reasoning-2.jpg
categories: Publications

---
@@ -21,4 +21,4 @@ categories: Publications

## Abstract

-The emergence of large pretrained models has enabled language models to achieve superior performance in common NLP tasks, including language modeling and question answering, compared to previous static word representation methods. Augmenting these models with a retriever to retrieve the related text and documents as supporting information has shown promise in effectively solving NLP problems in a more interpretable way given that the additional knowledge is injected explicitly rather than being captured in the models' parameters. In spite of the recent progress, our analysis on retriever-augmented language models shows that this class of language models still lack reasoning over the retrieved documents. In this paper, we study the strengths and weaknesses of different retriever-augmented language models such as REALM, kNN-LM, FiD, ATLAS, and Flan-T5 in reasoning over the selected documents in different tasks. In particular, we analyze the reasoning failures of each of these models and study how the models' failures in reasoning are rooted in the retriever module as well as the language model.
+Augmenting pretrained language models with retrievers has shown promise in effectively solving common NLP problems, such as language modeling and question answering. In this paper, we evaluate the strengths and weaknesses of popular retriever-augmented language models, namely kNN-LM, REALM, DPR + FiD, Contriever + ATLAS, and Contriever + Flan-T5, in reasoning over retrieved statements across different tasks. Our findings indicate that the simple similarity metric employed by retrievers is insufficient for retrieving all the necessary statements for reasoning. Additionally, the language models do not exhibit strong reasoning even when provided with only the required statements. Furthermore, when combined with imperfect retrievers, the performance of the language models becomes even worse, e.g., Flan-T5's performance drops by 28.6% when retrieving 5 statements using Contriever. While larger language models improve performance, there is still a substantial room for enhancement. Our further analysis indicates that multihop retrieve-and-read is promising for large language models like GPT-3.5, but does not generalize to other language models like Flan-T5-xxl.
assets/images/papers/retriever-lm-reasoning-2.jpg (binary file, not displayed)
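
A note on the mechanism the updated abstract describes: retrieval in these systems is a single similarity-ranked lookup, so a statement needed only for a later reasoning hop can be missed when it shares little surface vocabulary with the question. Below is a minimal, self-contained sketch of that single-step top-k retrieval; it is not the paper's code, and `embed` is a hypothetical toy hashing encoder standing in for a real dense encoder such as Contriever.

```python
# Minimal sketch (not the paper's code) of single-step, similarity-based
# retrieval. `embed` is a toy hashing encoder used only so the example
# runs on its own; a real system would use a dense encoder like Contriever.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedding, L2-normalized."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_top_k(query: str, statements: list[str], k: int = 5) -> list[str]:
    """Rank statements by cosine similarity to the query; keep the top k."""
    q = embed(query)
    scored = sorted(((float(q @ embed(s)), s) for s in statements), reverse=True)
    return [s for _, s in scored[:k]]

statements = [
    "Erin is the mother of Ann.",
    "Ann is the mother of Tom.",   # needed for the second reasoning hop
    "Tom likes football.",
    "Erin lives in Montreal.",
]
# A two-hop question: answering it needs both kinship statements, but the
# second one shares almost no vocabulary with the query, so a single
# similarity ranking can score it below unrelated statements about "Erin".
print(retrieve_top_k("Who is Erin the grandmother of?", statements, k=2))
```

The multihop retrieve-and-read variant the abstract mentions would instead re-query with the intermediate answer before reading, which the abstract reports is promising for GPT-3.5 but does not generalize to Flan-T5-xxl.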
