Ideas on Conversational Retrieval Task Category #663

vaibhavad · 2024-05-10T18:01:45Z

vaibhavad
May 10, 2024
Collaborator

I am interested in adding Conversational Retrieval tasks to MTEB. To get started, the two datasets that I am interested in adding are

Conversational retrieval involves different challenges compared to other tasks in MTEB. In particular, the query is a list of utterances from the conversation history, instead of a single sentence. Here is an example from QReCC:

[
    "What can you tell me about Gary Cherone?",
    "Gary Francis Caine Cherone is an American rock singer and songwriter, known for his work as the lead vocalist of Extreme and for his short stint for Van Halen.",
    "Did he sing well?",
    "Yes, Gary Cherone is also known for his work as the lead vocalist of the Boston rock group Extreme."
]

Furthermore, how to convert the conversation history to a single input string for the model is in itself an interesting and relevant research direction. The most basic way is to just concatenate all the turns. Other approaches have also explored using a separate query rewriting model to convert conversation history to a single decontextualized question.

Ideally, the submitters should implement the logic of converting the conversation history to an input string. However, this will introduce breaking changes. So, here is what I propose:

The MTEB package contains a naive implementation of converting the conversation history to an input string, say by concatenating all turns using a specific template. Let's call this function conv_history_to_text.
If the submission model implements a conv_history_to_text function, the MTEB package uses that implementation to convert the conversation history to text before calling encode; otherwise it uses the default implementation.
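
To make the proposal concrete, here is a minimal sketch of the fallback logic. Only the name conv_history_to_text comes from the proposal above; the newline template, the helper history_to_model_input and the two toy model classes are assumptions for illustration, not existing MTEB code.

```python
def conv_history_to_text(history: list[str], sep: str = "\n") -> str:
    """Naive default: concatenate all turns, oldest first.

    The newline separator is an assumed template, not one MTEB prescribes.
    """
    return sep.join(turn.strip() for turn in history)


def history_to_model_input(model: object, history: list[str]) -> str:
    """Hypothetical helper: prefer the model's own conversion if it has one."""
    custom = getattr(model, "conv_history_to_text", None)
    if callable(custom):
        return custom(history)
    return conv_history_to_text(history)


class PlainModel:
    """Hypothetical submission that does not define a custom conversion."""


class LastTurnModel:
    """Hypothetical submission that keeps only the latest utterance."""

    def conv_history_to_text(self, history: list[str]) -> str:
        return history[-1]


history = ["What can you tell me about Gary Cherone?", "Did he sing well?"]
default_text = history_to_model_input(PlainModel(), history)
custom_text = history_to_model_input(LastTurnModel(), history)
```

With this shape, existing submissions keep working unchanged, and a model only opts in by defining the method.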

Does this sound reasonable? If so, I can get started on Conversation Retrieval task category.

KennethEnevoldsen · 2024-05-11T09:27:38Z

KennethEnevoldsen
May 11, 2024
Maintainer

Thanks for starting the discussion @vaibhavad.

I believe a simpler approach than implementing conv_history_to_text is to let the model take the full conversation, either through the current encode method (which I probably wouldn't do, as we want to keep it compatible with SentenceTransformers) or through a new encode_conversations method. This way we could also imagine different approaches (e.g. encoding each line independently and taking the mean).
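
As a rough illustration of the "encode each line independently and take the mean" idea, something like the following could work. The free function signature and ToyEncoder are assumptions made up for this sketch; a real model would return proper embeddings from its own encode method, and the sketch assumes every conversation has at least one turn.

```python
class ToyEncoder:
    """Hypothetical stand-in for an embedding model with an encode method."""

    def encode(self, sentences: list[str]) -> list[list[float]]:
        # Deterministic 4-dim "embedding" built from simple string statistics.
        return [
            [float(len(s)), float(s.count(" ")), float(s.count("?")), 1.0]
            for s in sentences
        ]


def encode_conversations(model, conversations: list[list[str]]) -> list[list[float]]:
    """Encode each turn on its own, then mean-pool the turns per conversation."""
    pooled = []
    for turns in conversations:
        vectors = model.encode(turns)  # one vector per turn
        dim = len(vectors[0])
        pooled.append(
            [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
        )
    return pooled


conversation_embeddings = encode_conversations(
    ToyEncoder(), [["ab?", "c d"], ["hello"]]
)
```

Mean pooling treats all turns equally; a model could just as well weight recent turns more heavily inside its own encode_conversations.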

Otherwise, I believe the rest of the proposal sounds reasonable.

6 replies

sivareddyg May 14, 2024
Collaborator

Hi everyone, keeping the implementation issues aside, it would be useful to have "conversational retrieval" as a separate task category. The reason is that more and more real-world applications need conversational retrieval rather than isolated question answering. Moreover, some tasks like FaithDial are not even question answering but knowledge-grounded conversations, which have multi-turn aspects. It becomes hard to summarize all of that into a single query. I also anticipate that, because of the chat abilities of LLMs, this is the category that will need more work in the future.

So it would be useful to distinguish this from the normal retrieval task and call it "conversational retrieval".

orionw May 14, 2024
Maintainer

Thanks @sivareddyg! Agree that it should be a separate category -- once it's added to MTEB you can make a tab similar to what was previously done for legal retrieval. We have plans to change the leaderboard to allow for selecting these options instead of tabs (as the tabs are kinda busy rn), so it can also migrate to that dropdown as a separate category when we switch over.

My previous comment is just in terms of avoiding code duplication since conversational retrieval follows the same structure as retrieval (e.g. it has a corpus, queries, qrels) and thus can share the same main retrieval class code as long as we handle the case when the queries are a different format (e.g. potentially multi-turn, history, etc.)

vaibhavad May 14, 2024
Collaborator Author

Thanks all for the discussion, I am implementing the Conversational Retrieval task category and I'll start a PR soon. I'll re-use retrieval code wherever possible.

orionw May 14, 2024
Maintainer

Awesome, thanks @vaibhavad!

Sorry if the above was vague, I think there are only a few lines of code that need to be changed to enable this:

Option 1

In the RetrievalEvaluator class you can add to line 412 something like:

# if it is a conversational retrieval dataset
first_query = next(iter(queries.values()))
if isinstance(first_query, list):  # or a dict or whatever it is, anything except a string
    # encode the conversational history here
    # whatever you think the name of the function should be
    queries = self.retriever.encode_conversation(corpus, queries)  # plus whatever else is needed

After these lines, the evaluation can proceed as if it was a standard retrieval dataset since queries will be a Dict[str, str]

Lastly, we can add the naive implementation of encode_conversation to DenseRetrievalExactSearch and DRESModel, like what was done for the custom cross-encoder functionality, except filled in for the naive case. This way the conversational retrieval datasets should work out of the box (e.g. from the command line).

If someone wishes to extend it and provide more custom ways of handling the history (like a query rewriter) they simply extend the DenseRetrievalExactSearch class and add a new encode_conversation function.
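
Here is a rough, self-contained sketch of how Option 1 and such an extension could fit together. NaiveRetriever is only a stand-in for DenseRetrievalExactSearch (it is not the mteb class), and RewritingRetriever, prepare_queries and the "last turn" rewriter are hypothetical names invented for illustration.

```python
from typing import Callable, Union

Query = Union[str, list[str]]


class NaiveRetriever:
    """Stand-in for DenseRetrievalExactSearch with the naive default."""

    def encode_conversation(
        self, corpus: dict, queries: dict[str, list[str]]
    ) -> dict[str, str]:
        # Naive case: join all turns of each conversation into one string.
        return {qid: " ".join(turns) for qid, turns in queries.items()}


class RewritingRetriever(NaiveRetriever):
    """Hypothetical extension that delegates to a query rewriter."""

    def __init__(self, rewrite: Callable[[list[str]], str]):
        self.rewrite = rewrite

    def encode_conversation(
        self, corpus: dict, queries: dict[str, list[str]]
    ) -> dict[str, str]:
        return {qid: self.rewrite(turns) for qid, turns in queries.items()}


def prepare_queries(retriever, corpus: dict, queries: dict[str, Query]) -> dict[str, str]:
    """Flatten conversational queries; leave plain string queries untouched."""
    first_query = next(iter(queries.values()), None)
    if isinstance(first_query, list):
        return retriever.encode_conversation(corpus, queries)
    return queries


conv_queries = {"q1": ["What can you tell me about Gary Cherone?", "Did he sing well?"]}
naive = prepare_queries(NaiveRetriever(), {}, conv_queries)
# The lambda below is a toy "rewriter" that simply keeps the last turn.
rewritten = prepare_queries(RewritingRetriever(lambda turns: turns[-1]), {}, conv_queries)
plain = prepare_queries(NaiveRetriever(), {}, {"q2": "a standard query"})
```

After prepare_queries, the rest of the evaluation only ever sees string queries, which is what lets the same code path serve both bi-encoders and cross-encoders.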

Option 2

This is what @KennethEnevoldsen was suggesting; correct me if this is wrong.

Change the call function to this:

    def __call__(
        self, corpus: dict[str, dict[str, str]], queries: dict[str, str]
    ) -> dict[str, dict[str, float]]:
        if not self.retriever:
            raise ValueError("Model/Technique has not been provided!")
        
        # conversational datasets store each query as a list of turns
        is_conversational_dataset = isinstance(next(iter(queries.values())), list)
        if is_conversational_dataset:
            return self.retriever.search_with_history(corpus, queries, self.top_k, self.score_function)
        elif self.is_cross_encoder:
            return self.retriever.search_cross_encoder(corpus, queries, self.top_k)
        else:
            return self.retriever.search(
                corpus, queries, self.top_k, self.score_function
            )

Then you can implement a naive function in DenseRetrievalExactSearch and DRESModel that performs this, copying most of it from the existing search function.

I think these are very similar. Option 2 requires duplicating most of the search functionality but leaves more flexibility: it allows for a custom search function (perhaps a mean of all the subqueries), whereas Option 1 works with both cross-encoders and bi-encoders.

Does this make sense or am I missing something about the way these conversational retrieval datasets work?

vaibhavad May 14, 2024
Collaborator Author

I started out with my own implementation before I saw Option 2. You can view it at #714. It modifies the search function of DenseRetrievalExactSearch and adds an encode_conversations method to DRESModel, which first looks for an encode_conversations function on the user model and otherwise defaults to concatenating the conversation history and passing it to encode_queries.

I think the only missing part is that if the user model is DRES compatible, then it might throw an error. Let me look into how to resolve that.
EDIT: Ah! I can just add the same encode_conversations to DenseRetrievalExactSearch

Feel free to continue this conversation on PR #714.
