
Prompt versions of non-instruction-tuned LLaMA models #89

Open
ikuyamada opened this issue Sep 4, 2023 · 1 comment

@ikuyamada

ikuyamada commented Sep 4, 2023

It appears that the leaderboard results for non-instruction-tuned LLaMA models (e.g., meta-llama/Llama-2-7b-hf) in the jp-stable branch are measured using prompt version 0.3. However, according to the documentation, this prompt version was designed for instruction-tuned models.

Should we consider using version 0.1 or 0.2 for these non-instruction-tuned models instead?

@mkshing

mkshing commented Oct 11, 2023

@ikuyamada yes, you're correct. The "base" models should be evaluated with prompt version 0.1 or 0.2. We have already noticed this mistake in https://github.com/Stability-AI/lm-evaluation-harness/blob/jp-stable/models/llama2/llama2-7b/harness.sh#L2 and will update it soon.
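
For context, in the jp-stable harness the prompt template version is encoded as the trailing suffix of each task name (e.g., `jcommonsenseqa-1.1-0.3` uses prompt version 0.3). A minimal sketch of the kind of fix implied here, assuming a harness.sh in the style of the scripts under `models/` (the specific task names and few-shot counts below are illustrative, not the actual contents of the linked file):

```bash
# Sketch of harness.sh for a non-instruction-tuned ("base") model.
# The trailing "-0.2" in each task name selects prompt template 0.2
# instead of the instruction-tuned template 0.3.
MODEL_ARGS="pretrained=meta-llama/Llama-2-7b-hf"
TASK="jcommonsenseqa-1.1-0.2,jnli-1.1-0.2,marc_ja-1.1-0.2,jsquad-1.1-0.2"
python main.py \
    --model hf-causal \
    --model_args "$MODEL_ARGS" \
    --tasks "$TASK" \
    --num_fewshot "3,3,3,2" \
    --output_path "results/llama2-7b.json"
```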
