
Questions about the accuracy of eight commonsense reasoning datasets vs the Llama paper #70

Yonghao-Tan opened this issue Sep 15, 2024 · 2 comments

Comments

@Yonghao-Tan

Hi, thanks for the useful code! I have a question about the accuracy on the commonsense reasoning tasks. In the README, the reported accuracy for Llama (for example) is:

[screenshot: commonsense reasoning accuracy table from the README]

while the Llama 2 paper reports:

[screenshot: corresponding accuracy table from the Llama 2 paper]

Some tasks show lower accuracy after fine-tuning, e.g. 76.5 -> 68.9 for BoolQ. Could you kindly explain this? Thanks a lot!

@zjtco-yr

same question

@YananLi18

Given the MMLU performance referenced in the Llama2 paper, I believe the results in Table 20 reflect a 5-shot scenario, while LLM-Adapters' performance is primarily zero-shot.
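To illustrate why a 5-shot number and a zero-shot number are not directly comparable, here is a minimal sketch (not the LLM-Adapters or Llama 2 evaluation code) of how the prompt changes with the number of in-context examples; the `build_prompt` helper and the BoolQ-style examples are made up for illustration only:

```python
# Hypothetical sketch: zero-shot vs. few-shot prompt construction.
# Few-shot prompts prepend worked examples, which typically raises the
# reported accuracy relative to a zero-shot prompt on the same task.

FEW_SHOT_EXAMPLES = [
    ("Passage: The Amazon is the largest rainforest on Earth.\n"
     "Question: is the amazon the biggest rainforest", "yes"),
    ("Passage: Pluto was reclassified as a dwarf planet in 2006.\n"
     "Question: is pluto still considered a full planet", "no"),
]

def build_prompt(passage: str, question: str, num_shots: int = 0) -> str:
    """Build a yes/no prompt with `num_shots` in-context examples."""
    parts = []
    for ctx, answer in FEW_SHOT_EXAMPLES[:num_shots]:
        parts.append(f"{ctx}\nAnswer: {answer}\n")
    parts.append(f"Passage: {passage}\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)

if __name__ == "__main__":
    passage = "BoolQ is a question answering dataset of yes/no questions."
    question = "does boolq contain yes/no questions"
    print("--- zero-shot prompt ---")
    print(build_prompt(passage, question, num_shots=0))
    print("--- 2-shot prompt ---")
    print(build_prompt(passage, question, num_shots=2))
```

So if the README numbers come from zero-shot prompts and the paper's Table 20 numbers come from 5-shot prompts, a drop such as 76.5 -> 68.9 on BoolQ can reflect the evaluation setup rather than the fine-tuning itself.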
