
Performance on MMLU #13

Open
allanj opened this issue Apr 8, 2023 · 6 comments

@allanj commented Apr 8, 2023

Have you evaluated the performance on MMLU compared to the original model?

@chiayewken (Collaborator) commented

Hi, we have noticed slightly lower MMLU scores for declare-lab/flan-alpaca-xl compared to google/flan-t5-xl. This may be due to the zero-shot format of the Alpaca training data, as opposed to the few-shot format used for MMLU evaluation; we are benchmarking multiple models here:

https://github.com/declare-lab/flan-eval
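
For reference, Alpaca-style instruction data is a single zero-shot prompt per example, with no in-context demonstrations. A minimal sketch of the standard Stanford Alpaca template (an assumption about the training format; not quoted from this repository's code):

```python
# Standard Stanford Alpaca zero-shot template (assumed here, not taken from
# flan-eval). Each training example is one instruction-response pair, with
# no in-context examples -- unlike the few-shot MMLU evaluation format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

prompt = ALPACA_TEMPLATE.format(instruction="Name the capital of France.")
print(prompt)
```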

@allanj (Author) commented Apr 9, 2023

Thanks a lot for your efforts.

allanj closed this as completed Apr 9, 2023
@allanj (Author) commented Apr 10, 2023

Are the models evaluated using CoT prompting?

allanj reopened this Apr 10, 2023
@chiayewken (Collaborator) commented

Hi, the evaluation uses direct prompting for MMLU, i.e. the model is asked to output the answer option directly, without chain-of-thought reasoning.
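
A hypothetical illustration of the difference (the exact prompt strings are assumptions, not taken from flan-eval's code):

```python
question = (
    "What is the capital of France?\n"
    "A. Berlin\nB. Madrid\nC. Paris\nD. Rome"
)

# Direct prompting: the model is expected to emit the option letter immediately.
direct_prompt = question + "\nAnswer:"

# CoT prompting: the model is asked to reason step by step before concluding
# with the option letter.
cot_prompt = question + "\nAnswer: Let's think step by step."
```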

@allanj (Author) commented Apr 12, 2023

By the way, it seems you are doing few-shot prompting, am I right?

https://github.com/declare-lab/flan-eval/blob/47910e01894420971665eade0c92e3046b8f1f3d/mmlu.py#L142-L151

@chiayewken (Collaborator) commented

Yes, we used 5-shot prompting for MMLU, following the Flan-T5 paper.
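
A minimal sketch of how a 5-shot MMLU prompt is typically assembled (the helper names here are hypothetical, and the header string follows the original MMLU evaluation convention rather than necessarily matching flan-eval's exact code):

```python
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one MMLU item; the answer is filled in for in-context shots."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_item, k=5):
    """Prepend k solved dev-set examples, then the unanswered test question."""
    header = (
        f"The following are multiple choice questions "
        f"(with answers) about {subject}.\n\n"
    )
    shots = "\n\n".join(
        format_example(ex["question"], ex["options"], ex["answer"])
        for ex in dev_examples[:k]
    )
    return header + shots + "\n\n" + format_example(
        test_item["question"], test_item["options"]
    )
```

Under this setup, the model's prediction after the final "Answer:" is compared against the gold option letter to score each test item.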
