Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename PawsX to PawsXPairClassification #2

Merged
merged 2 commits into from
Jul 9, 2024
Merged

Conversation

KennethEnevoldsen
Copy link
Contributor

Since some file system are case-insensitive I here fix the naming of the files here.

Required for embeddings-benchmark/mteb#1038

Copy link
Contributor

@orionw orionw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Out of curiosity, how did the names not end up the same as in MTEB? EDIT: read the other issue, I understand now :)

@KennethEnevoldsen KennethEnevoldsen merged commit 0c18f0a into main Jul 9, 2024
1 check passed
@KennethEnevoldsen KennethEnevoldsen deleted the rename_pawsx branch July 9, 2024 17:07
@Muennighoff
Copy link
Contributor

Nice! I think we also have to update https://github.com/embeddings-benchmark/leaderboard/blob/c83ff74f9f40a9cc85bda773cdfb639b0f4d82aa/config.yaml#L233 (but in a backward compatible way such that the old name is still okay too, e.g. like here for MLSumClustering where the name also changed: https://github.com/embeddings-benchmark/leaderboard/blob/c83ff74f9f40a9cc85bda773cdfb639b0f4d82aa/refresh.py#L213)

@KennethEnevoldsen
Copy link
Contributor Author

KennethEnevoldsen commented Jul 9, 2024

Like so: embeddings-benchmark/leaderboard@9c65fce ?

(sorry I realised this should have been a PR and not a push to main - maybe we should enable no pushes to main just so that I don't make that mistake again...)

@Muennighoff
Copy link
Contributor

@Muennighoff
Copy link
Contributor

Oh and I think it should be the other way round i.e.

    if ('PawsXPairClassification (fr)' in datasets):
        datasets.append('PawsX (fr)')

MLSUMClusteringP2P (fr) is the new name while MLSUMClusteringP2P is the legacy name

@KennethEnevoldsen
Copy link
Contributor Author

Added a fix here: embeddings-benchmark/leaderboard#5

@orionw
Copy link
Contributor

orionw commented Jul 10, 2024

I think we may need to update this on the leaderboard as well? The CI failed due to KeyError: 'PawsXPairClassification (fr)' probably from the French benchmark?

EDIT: were there other tasks as well we need to update for the leaderboard or just this one @KennethEnevoldsen?

@KennethEnevoldsen
Copy link
Contributor Author

Just this one

@KennethEnevoldsen
Copy link
Contributor Author

Hmm maybe we should add a test such that embeddings-benchmark/leaderboard#5 would have failed?

@orionw
Copy link
Contributor

orionw commented Jul 10, 2024

If we want we could even just add the same task that runs overnight, the refresh. It takes about 10 minutes though, is that too long? I do agree we should add some test.

@KennethEnevoldsen
Copy link
Contributor Author

10 minutes is ok, though it would be nice if it was faster (since it makes debugging harder). Anything in the process that can be skipped (or potentially we can downsample in some way for the test)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants