Add support for FinMTEB benchmark #1379

Open · wants to merge 3 commits into main
Conversation

@alt-glitch commented Nov 4, 2024

Checklist

  • Run tests locally to make sure nothing is broken using `make test`.
  • Run the formatter to format the code using `make lint`.

Adding datasets checklist

Reason for dataset addition: the FinMTEB benchmark and its datasets.

  • Discussion: Add FinMTEB #1267

  • I have run the following models on the task (adding the results to the PR). These can be run using the `mteb -m {model_name} -t {task_name}` command (see the run sketch after this checklist).

    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
      • Ran only on FiQAClassification as of now.
    • intfloat/multilingual-e5-small
      • Ran only on FINAL as of now.
  • I have checked that the performance is neither trivial (both models achieve close to perfect scores) nor random (both models achieve close to random scores).

  • If the dataset is too big (e.g. >2048 examples), consider using `self.stratified_subsampling()` under `dataset_transform()` (see the task-file sketch after this checklist).

  • I have filled out the metadata object in the dataset file (find documentation on it here).

  • Run tests locally to make sure nothing is broken using `make test`.

  • Run the formatter to format the code using `make lint`.
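
For concreteness, here's a minimal sketch of what one of the new task files could look like, combining the metadata object and the subsampling hook from the checklist above. The HF dataset path, description, and reference URL are placeholders/assumptions, and the full set of `TaskMetadata` fields should follow the linked documentation:

```python
from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class FiQAClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="FiQAClassification",
        description="Placeholder description of the FiQA sentiment classification task.",
        reference="https://github.com/yixuantt/FinMTEB",  # assumed; paper or GitHub URL per the discussion below
        dataset={
            "path": "FinanceMTEB/FiQAClassification",  # placeholder HF path
            "revision": "...",  # pin a specific dataset revision
        },
        type="Classification",
        category="s2s",
        eval_splits=["test"],
        eval_langs=["eng-Latn"],
        main_score="accuracy",
        # plus the remaining TaskMetadata fields (date, domains, license,
        # bibtex_citation, ...) as described in the docs
    )

    def dataset_transform(self):
        # Keep evaluation tractable on large splits (>2048 examples).
        self.dataset = self.stratified_subsampling(
            self.dataset, seed=self.seed, splits=["test"]
        )
```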
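And a sketch of the sanity-check run mentioned above, using the Python API equivalent of the `mteb -m {model_name} -t {task_name}` command (the model and task names are the ones from this PR):

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)

# Run the model on one of the newly added tasks and write scores to disk.
tasks = mteb.get_tasks(tasks=["FiQAClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder=f"results/{model_name}")
```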

@alt-glitch (Author) commented Nov 4, 2024

Hey @Muennighoff @KennethEnevoldsen @isaac-chung!

Here's a WIP PR to close #1267.

I had a few questions/notes:

  1. Should I run all the tasks and include the results here?
  2. Should the relevant PRs to embeddings-benchmark/results and embeddings-benchmark/leaderboard be made after merging this PR?
  3. FiQA2018 is already in MTEB, so I have left that out from FinMTEB. Otherwise, there were no conflicting tasks.
  4. Some tasks don't have a reference URL.
  5. The summarization tasks are still pending. I have yet to look into the changes for summarization highlighted by @yixuantt in Add FinMTEB #1267.

I'll add the summarization changes and make the PRs to results and leaderboard once this is done.
Is there anything else I'm missing out on?

@isaac-chung (Collaborator) commented

Hi @alt-glitch, thanks for working on this!

  1. Yes, I'd suggest running the whole thing on a small model mentioned in the paper, like all-MiniLM-L12-v2, and only using the quickest settings as a sanity check, e.g. `n_experiments=1` for classification (see the sketch after this list).
  2. Afterwards, yes, for the leaderboard. I'll leave the results repo part to @KennethEnevoldsen.
  3. Sounds good.
  4. I think it's ok to use the paper's URL or its GitHub URL as reference. Otherwise, there are individual references for each dataset mentioned in the paper.
  5. Re: summarization task, we can add column names as class attributes to AbsTaskSummarization, the way we did in MIEB's AbsTaskImageClassification (see the sketch below).
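
For point 1, a sketch of what the quick sanity-check settings could look like via the Python API; overriding `n_experiments` on the task instances is an assumption based on `AbsTaskClassification` exposing it as an attribute:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Small model suggested above for the sanity check.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# In practice you'd select the FinMTEB tasks; Classification is shown here
# because it is the task type with the n_experiments knob.
tasks = mteb.get_tasks(task_types=["Classification"])
for task in tasks:
    task.n_experiments = 1  # quickest setting, sanity check only

mteb.MTEB(tasks=tasks).run(model, output_folder="results/sanity-check")
```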
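For point 5, a rough sketch of the class-attribute approach; the attribute and column names here are illustrative, not the actual `AbsTaskSummarization` internals:

```python
from mteb.abstasks.AbsTask import AbsTask


class AbsTaskSummarization(AbsTask):
    # Hypothetical: expose the expected dataset columns as class attributes
    # (mirroring MIEB's AbsTaskImageClassification) so subclasses can
    # override them instead of renaming dataset columns.
    text_column: str = "text"
    human_summaries_column: str = "human_summaries"
    machine_summaries_column: str = "machine_summaries"


class FinMTEBSummarization(AbsTaskSummarization):
    # A FinMTEB task would then only override what differs.
    human_summaries_column = "summaries"  # placeholder column name
```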

Let me know if anything is unclear.

@KennethEnevoldsen (Contributor) commented

Re. 2: PRs to embeddings-benchmark/results can be made after this PR. I don't believe a PR to embeddings-benchmark/leaderboard will be required once the new leaderboard is up and running, as long as the benchmark is added to benchmarks.py.
