[Help Wanted] the alignment with official accuracy in llama3.2-vision #493

Open
droidXrobot opened this issue Sep 29, 2024 · 8 comments
Labels: help wanted (Extra attention is needed)
@droidXrobot

No description provided.

@shan23chen

Does the repo support this model yet? Thanks!

@FangXinyu-0913 (Collaborator)

Hi @droidXrobot @shan23chen! This repo now supports Llama-3.2-11B/90B-Vision-Instruct; you can use it with the latest transformers version (>=4.45.0.dev0)!
However, the evaluation results obtained with the current repo do not match the official ones, and even after aligning the hyperparameters and the system prompt, accuracy is still noticeably lower (mainly on AI2D). Is anyone willing to look into this?

Ref:
https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/blob/main/generation_config.json
https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/eval_details.md
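For anyone trying to reproduce the alignment, here is a minimal sketch (not the VLMEvalKit code path) of how the sampling settings in the linked generation_config.json can be inspected and compared against whatever the harness passes to model.generate. It assumes transformers >= 4.45 and access to the gated meta-llama checkpoint; the greedy settings at the end are only an illustration.

```python
# A minimal sketch, assuming transformers >= 4.45 and access to the gated
# meta-llama repo; this is not the VLMEvalKit code path, only a way to read
# the official sampling settings referenced above for comparison.
from transformers import GenerationConfig

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Fetch generation_config.json from the Hub and print the fields that most
# often cause score drift (do_sample, temperature, top_p, max length, ...).
official_cfg = GenerationConfig.from_pretrained(model_id)
print(official_cfg)

# For debugging, it can also help to run with explicit greedy decoding and
# compare scores against a run that uses the official sampling defaults.
greedy_cfg = GenerationConfig(do_sample=False, max_new_tokens=128)
print(greedy_cfg)
```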

@FangXinyu-0913 FangXinyu-0913 added the help wanted Extra attention is needed label Oct 3, 2024
@FangXinyu-0913 FangXinyu-0913 changed the title Someone please add Llama 3.2 11b to the leaderboard [Help Wanted] the alignment with official accuracy in llama3.2-vision Oct 3, 2024
@luohao123

Actually, none of my benchmarks align with the results I got for the same model before...

This repo updates too quickly... many things might have caused the misalignment.

@kennymckormick (Member)

> Actually, none of my benchmarks align with the results I got for the same model before... This repo updates too quickly... many things might have caused the misalignment.

Could you please provide more information, such as the commit IDs of the previous and current code you used for evaluation, as well as the models and benchmarks you evaluated?

@luohao123

luohao123 commented Oct 9, 2024

As a user, I cannot diff every commit to see what changed; that is the maintainers' responsibility.

In the current situation, scores drop on every benchmark, to the point that the evaluation looks simply wrong. The changes I can observe are:

  1. The TSV files are newly generated;
  2. There is a new operation that did not exist before (shown in the screenshot below); I don't know what it is, and it is slow:
    [screenshot]
  3. The metrics are now lower on all benchmarks for the same model;
  4. I don't know what has changed inside the eval kit.

I even suspected my training codebase had gone wrong, which stalled me for about a week.

Afterwards, I realised the evaluation pipeline was broken: the old model cannot reproduce the metrics it got before.

Any suggestions?

@kennymckormick (Member)

> As a user, I cannot diff every commit to see what changed; that is the maintainers' responsibility. [...] The old model cannot reproduce the metrics it got before. Any suggestions?

At the very least, you need to provide some information so that we can help. Please tell me which model you are using and one or several of the benchmarks you are evaluating. If you cannot find the exact commit you started from, please try to remember when you first used this codebase.
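As a side note on tracking down regressions like this, below is a small hedged sketch of how one could record the exact VLMEvalKit commit next to each result file; the `current_commit` helper and the `./VLMEvalKit` path are hypothetical, not part of the toolkit.

```python
# A hypothetical helper (not part of VLMEvalKit) for logging which commit a
# run was produced with, so old and new scores can be tied to code versions.
import subprocess

def current_commit(repo_path: str = "./VLMEvalKit") -> str:
    """Return the short git commit hash of the checkout at repo_path."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Store this next to every result file; it makes bisecting score drops
    # between two commits much easier than guessing from dates.
    print("VLMEvalKit commit:", current_commit())
```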

@kennymckormick (Member)

> As a user, I cannot diff every commit to see what changed; that is the maintainers' responsibility. [...] The old model cannot reproduce the metrics it got before. Any suggestions?

Same here: #503 (comment)

Also, if you want to pursue this problem further, creating a new issue would be a better idea. Your problem is not related to this Llama-3.2 issue.

@terry-for-github

> Same here: #503 (comment)
>
> Also, if you want to pursue this problem further, creating a new issue would be a better idea. Your problem is not related to this Llama-3.2 issue.

@kennymckormick Same issue here too. #523 Could you please check this one? Thanks!
