Add support for FinMTEB benchmark #1379

Open · wants to merge 3 commits into main
Conversation

@alt-glitch commented Nov 4, 2024

Checklist

  • Run tests locally to make sure nothing is broken using `make test`.
  • Run the formatter to format the code using `make lint`.

Adding datasets checklist

Reason for dataset addition: the FinMTEB benchmark and its datasets.

  • Discussion: Add FinMTEB #1267

  • I have run the following models on the task (adding the results to the PR). These can be run using the `mteb -m {model_name} -t {task_name}` command (see the run sketch after this checklist).

    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
      • Ran only on FiQAClassification as of now.
    • intfloat/multilingual-e5-small
      • Ran only on FINAL as of now.
  • I have checked that the performance is neither trivial (both models achieve close to perfect scores) nor random (both models achieve close to random scores).

  • If the dataset is too big (e.g. >2048 examples), consider using `self.stratified_subsampling()` under `dataset_transform()` (see the task-file sketch after this checklist).

  • I have filled out the metadata object in the dataset file (find documentation on it here).

  • Run tests locally to make sure nothing is broken using `make test`.

  • Run the formatter to format the code using `make lint`.
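
For concreteness, here's a minimal sketch of what one of the new task files could look like, combining the metadata object and the subsampling hook from the checklist above. The HF dataset path, description, and reference URL are placeholders/assumptions, and the full set of `TaskMetadata` fields should follow the linked documentation:

```python
from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class FiQAClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="FiQAClassification",
        description="Placeholder description of the FiQA sentiment classification task.",
        reference="https://github.com/yixuantt/FinMTEB",  # assumed; paper or GitHub URL per the discussion below
        dataset={
            "path": "FinanceMTEB/FiQAClassification",  # placeholder HF path
            "revision": "...",  # pin a specific dataset revision
        },
        type="Classification",
        category="s2s",
        eval_splits=["test"],
        eval_langs=["eng-Latn"],
        main_score="accuracy",
        # plus the remaining TaskMetadata fields (date, domains, license,
        # bibtex_citation, ...) as described in the docs
    )

    def dataset_transform(self):
        # Keep evaluation tractable on large splits (>2048 examples).
        self.dataset = self.stratified_subsampling(
            self.dataset, seed=self.seed, splits=["test"]
        )
```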
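And a sketch of the sanity-check run mentioned above, using the Python API equivalent of the `mteb -m {model_name} -t {task_name}` command (the model and task names are the ones from this PR):

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)

# Run the model on one of the newly added tasks and write scores to disk.
tasks = mteb.get_tasks(tasks=["FiQAClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder=f"results/{model_name}")
```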

@alt-glitch (Author) commented Nov 4, 2024

Hey @Muennighoff @KennethEnevoldsen @isaac-chung!

Here's a WIP PR to close #1267.

I had a few questions/notes:

  1. Should I run all the tasks and include the results here?
  2. Should the relevant PRs to embeddings-benchmark/results and embeddings-benchmark/leaderboard be made after merging this PR?
  3. FiQA2018 is already in MTEB, so I have left that out from FinMTEB. Otherwise, there were no conflicting tasks.
  4. Some tasks don't have a reference URL.
  5. The summarization tasks are still pending. I have yet to look into the changes for summarization highlighted by @yixuantt in Add FinMTEB #1267.

I'll add the summarization changes and make the PRs to results and leaderboard once this is done.
Is there anything else I'm missing out on?

@isaac-chung (Collaborator) commented

Hi @alt-glitch, thanks for working on this!

  1. Yes, I'd suggest running the whole thing on a small model mentioned in the paper, like all-MiniLM-L12-v2, and only using the quickest settings as a sanity check, e.g. `n_experiments=1` for classification (see the sketch after this list).
  2. Afterwards, yes, for the leaderboard. I'll leave the results repo part to @KennethEnevoldsen.
  3. Sounds good.
  4. I think it's ok to use the paper's URL or its GitHub URL as reference. Otherwise, there are individual references for each dataset mentioned in the paper.
  5. Re: summarization task, we can add column names as class attributes to AbsTaskSummarization, the way we did in MIEB's AbsTaskImageClassification (see the sketch below).
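
For point 1, a sketch of what the quick sanity-check settings could look like via the Python API; overriding `n_experiments` on the task instances is an assumption based on `AbsTaskClassification` exposing it as an attribute:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Small model suggested above for the sanity check.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# In practice you'd select the FinMTEB tasks; Classification is shown here
# because it is the task type with the n_experiments knob.
tasks = mteb.get_tasks(task_types=["Classification"])
for task in tasks:
    task.n_experiments = 1  # quickest setting, sanity check only

mteb.MTEB(tasks=tasks).run(model, output_folder="results/sanity-check")
```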
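For point 5, a rough sketch of the class-attribute approach; the attribute and column names here are illustrative, not the actual `AbsTaskSummarization` internals:

```python
from mteb.abstasks.AbsTask import AbsTask


class AbsTaskSummarization(AbsTask):
    # Hypothetical: expose the expected dataset columns as class attributes
    # (mirroring MIEB's AbsTaskImageClassification) so subclasses can
    # override them instead of renaming dataset columns.
    text_column: str = "text"
    human_summaries_column: str = "human_summaries"
    machine_summaries_column: str = "machine_summaries"


class FinMTEBSummarization(AbsTaskSummarization):
    # A FinMTEB task would then only override what differs.
    human_summaries_column = "summaries"  # placeholder column name
```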

Let me know if anything is unclear.

@KennethEnevoldsen (Contributor) commented

Re. 2: PRs to embeddings-benchmark/results can be made after this PR. I don't believe a PR to embeddings-benchmark/leaderboard will be required once the new leaderboard is up and running, as long as the benchmark is added to benchmarks.py.
