
Performance on MMLU #13

Open
allanj opened this issue Apr 8, 2023 · 6 comments

@allanj commented Apr 8, 2023

Have you evaluated the performance on MMLU compared to the original model?

@chiayewken (Collaborator) commented

Hi, we have noticed slightly lower MMLU scores for declare-lab/flan-alpaca-xl compared to google/flan-t5-xl. This may be due to the zero-shot format of the Alpaca training data, as opposed to the few-shot format used for MMLU evaluation; we are benchmarking multiple models here:

https://github.com/declare-lab/flan-eval
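
For reference, Alpaca-style instruction data is a single zero-shot prompt per example, with no in-context demonstrations. A minimal sketch of the standard Stanford Alpaca template (an assumption about the training format; not quoted from this repository's code):

```python
# Standard Stanford Alpaca zero-shot template (assumed here, not taken from
# flan-eval). Each training example is one instruction-response pair, with
# no in-context examples -- unlike the few-shot MMLU evaluation format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

prompt = ALPACA_TEMPLATE.format(instruction="Name the capital of France.")
print(prompt)
```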

@allanj (Author) commented Apr 9, 2023

Thanks a lot for your efforts.

allanj closed this as completed Apr 9, 2023
@allanj (Author) commented Apr 10, 2023

Are the models evaluated using CoT prompting?

allanj reopened this Apr 10, 2023
@chiayewken (Collaborator) commented

Hi, the evaluation uses direct prompting for MMLU, i.e. the model is asked to output the answer option directly, without chain-of-thought reasoning.
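
A hypothetical illustration of the difference (the exact prompt strings are assumptions, not taken from flan-eval's code):

```python
question = (
    "What is the capital of France?\n"
    "A. Berlin\nB. Madrid\nC. Paris\nD. Rome"
)

# Direct prompting: the model is expected to emit the option letter immediately.
direct_prompt = question + "\nAnswer:"

# CoT prompting: the model is asked to reason step by step before concluding
# with the option letter.
cot_prompt = question + "\nAnswer: Let's think step by step."
```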

@allanj (Author) commented Apr 12, 2023

By the way, it seems you are doing few-shot prompting, am I right?

https://github.com/declare-lab/flan-eval/blob/47910e01894420971665eade0c92e3046b8f1f3d/mmlu.py#L142-L151

@chiayewken (Collaborator) commented

Yes, we used 5-shot prompting for MMLU, following the Flan-T5 paper.
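
A minimal sketch of how a 5-shot MMLU prompt is typically assembled (the helper names here are hypothetical, and the header string follows the original MMLU evaluation convention rather than necessarily matching flan-eval's exact code):

```python
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one MMLU item; the answer is filled in for in-context shots."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_item, k=5):
    """Prepend k solved dev-set examples, then the unanswered test question."""
    header = (
        f"The following are multiple choice questions "
        f"(with answers) about {subject}.\n\n"
    )
    shots = "\n\n".join(
        format_example(ex["question"], ex["options"], ex["answer"])
        for ex in dev_examples[:k]
    )
    return header + shots + "\n\n" + format_example(
        test_item["question"], test_item["options"]
    )
```

Under this setup, the model's prediction after the final "Answer:" is compared against the gold option letter to score each test item.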
