Add custom task (bac-fr) for evaluation of models in french #518
community_tasks/french_evals.py
Outdated
few_shots_split=None,
few_shots_select="random_sampling",
generation_size=1,
metric=[],  # To be defined
If we think the instructions constrain the answer enough, we could use an exact match. We could also look at @hynky1999's parser for math equations.
Is this a math benchmark?
It includes math questions
(but not math only, I'd say 1/3 to 1/2 are math questions?)
I would say 2/3 math questions!
Tested with current state of |
This PR proposes a specific task to run on bac-fr, a dataset composed of questions extracted from the French BAC exam.
The header of the dataset is like this:
We need to clearly define the metrics that will be used, as well as how the prompt will formulate its requests (@clefourrier).
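As a starting point for the metric discussion, the exact-match option raised in the review could be sketched roughly as below. This is only an illustration in plain Python, not lighteval's own metric API: the function names `normalize_answer` and `exact_match` are hypothetical, and the normalization choices (dropping accents, treating a French decimal comma as a dot, ignoring a trailing period) are assumptions that would need to be checked against the actual bac-fr answers.

```python
import re
import unicodedata


def normalize_answer(text: str) -> str:
    """Normalize an answer string before comparison (assumed rules, see lead-in)."""
    # Decompose accented characters, then drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Assumption: a comma between two digits is a French decimal separator.
    text = re.sub(r"(?<=\d),(?=\d)", ".", text)
    # Collapse runs of whitespace, then trim spaces and periods at both ends.
    text = re.sub(r"\s+", " ", text)
    return text.strip(" .")


def exact_match(prediction: str, gold: str) -> int:
    """Return 1 if the normalized prediction equals the normalized gold answer, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))
```

A strict comparison like this is only meaningful if the prompt constrains the output format tightly; for free-form math answers, a dedicated expression parser as mentioned in the review would likely be more robust.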