
Llama-3-8B #75
Open
hsb1995 opened this issue Apr 22, 2024 · 5 comments

Comments

hsb1995 commented Apr 22, 2024


I have obtained the scale and shift factors for Llama-3-8B, but I ran into a mismatch during the compression step.

My scaling-factor code is unchanged, but a dimension error appears as soon as compression starts. The parameter settings are as follows (a quick check of the model's attention-head layout is sketched after the list):
--model …/Llama-3-8b/
--epochs 20
--output_dir …/llama-3-8b-w6a6/
--eval_ppl
--wbits 6
--abits 6
--lwc
--let
--net Llama-3-8b
--tasks arc_easy,arc_challenge,boolq,hellaswag,winogrande,piqa
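
(For reference, one quick way to confirm that the mismatch comes from the model architecture rather than these settings is to compare the query and key/value head counts in the checkpoint config. A minimal sketch, assuming a local Hugging Face checkpoint; the path is a placeholder:)

# Minimal sketch: check whether the checkpoint uses GQA (grouped-query attention).
# "/PATH/TO/Llama-3-8b" is a placeholder for the local model directory.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("/PATH/TO/Llama-3-8b")
head_dim = cfg.hidden_size // cfg.num_attention_heads
q_out = cfg.num_attention_heads * head_dim       # q_proj output dim (4096 for Llama-3-8B)
kv_out = cfg.num_key_value_heads * head_dim      # k_proj/v_proj output dim (1024 for Llama-3-8B)
print(q_out, kv_out)                             # the two differ under GQA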

hsb1995 commented Apr 22, 2024

With w=16 and a=16 I can run the uncompressed model and obtain results, but as soon as I set the compression to w=6, a=6 the problem appears.

ChenMnZ (Collaborator) commented Apr 24, 2024

@hsb1995
LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current ‘let’.
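
(For context, a rough illustration of the conflict, not the repository's actual code: the equivalent transformation between q_proj and k_proj applies a per-channel scale to one projection and its reciprocal to the other, which assumes both have the same output width. Under GQA they do not. The shapes below are Llama-3-8B's; the scale tensor is illustrative.)

import torch

# Llama-3-8B attention under GQA: 32 query heads but only 8 key/value heads.
hidden_size, n_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_heads                             # 128

q_weight = torch.randn(n_heads * head_dim, hidden_size)       # (4096, 4096)
k_weight = torch.randn(n_kv_heads * head_dim, hidden_size)    # (1024, 4096)

# One per-output-channel scale shared (reciprocally) between q_proj and k_proj:
s = torch.rand(n_heads * head_dim) + 0.5                      # length 4096
q_scaled = q_weight * s.unsqueeze(1)                          # fine: 4096 output channels
try:
    k_scaled = k_weight / s.unsqueeze(1)                      # (1024, 4096) vs (4096, 1)
except RuntimeError as e:
    print("shape mismatch:", e)                               # the kind of error seen above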

hsb1995 commented Apr 24, 2024

@hsb1995 LLaMA-3-8B uses GQA (Group Query Attention), which is not supported by current ‘let’.

Professor, thank you for your great work. I don't really understand how the GQA issue you mention should be handled.
Do I understand you correctly that I should keep the original "generate_act_scale_shift" script unchanged to obtain the "act_scales" and "act_shifts" files, and then run the weight quantization on top of that?
Parameter settings:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
Is the above workable? The only change is that I removed the let option.
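
(If it helps, the reason dropping let can sidestep the GQA problem is that learnable weight clipping operates on each weight matrix in isolation, so the grouped heads never need to share a scale. A rough per-output-channel fake-quantization sketch, assuming sigmoid-gated clipping; this is not OmniQuant's implementation and the parameter names are illustrative:)

import torch

def lwc_fake_quant(w: torch.Tensor, n_bits: int, clip_logit: torch.Tensor) -> torch.Tensor:
    # Per-output-channel asymmetric fake quantization with a learnable clipping factor.
    # Only the weight matrix itself is touched, so GQA head grouping never comes into play.
    shrink = torch.sigmoid(clip_logit)                        # (out_features, 1), values in (0, 1)
    w_max = w.amax(dim=1, keepdim=True) * shrink
    w_min = w.amin(dim=1, keepdim=True) * shrink
    scale = (w_max - w_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 2 ** n_bits - 1)
    return (q - zero) * scale                                 # dequantize back to float

k_proj = torch.randn(1024, 4096)                              # Llama-3-8B k_proj weight shape
clip = torch.zeros(k_proj.shape[0], 1, requires_grad=True)    # one learnable factor per output channel
w_q = lwc_fake_quant(k_proj, n_bits=6, clip_logit=clip)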

hsb1995 commented Apr 24, 2024

Hey, professor, I gave it a try. It turns out to be hard to get working; the current errors are below. What should I do about these?

[2024-04-24 17:14:17 root](omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in <module>
    main()
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 383, in main
    omniquant(
  File "/home/sam/Doctorproject/OmniQuant-main/quantize/omniquant.py", line 102, in omniquant
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")
ValueError: Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now
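
(The error is raised by a model-family check in quantize/omniquant.py. The snippet below is only a guess at the shape of that check, not the actual source, but it shows why the net/model name string matters: if the name derived from the model path, for instance one ending in a trailing slash, does not contain a recognized family token, the fallback branch fires.)

# Hypothetical sketch of the kind of dispatch that raises this error; the real
# condition in quantize/omniquant.py may differ.
def select_family(net_name: str) -> str:
    name = net_name.lower()
    if "llama" in name or "mixtral" in name:
        return "llama-style quantization path"
    elif "opt" in name:
        return "opt-style quantization path"
    elif "falcon" in name:
        return "falcon-style quantization path"
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")

# A trailing slash leaves the last path component empty, so nothing matches:
print(repr("/home/sam/.../Llama-3-8b/".split("/")[-1]))   # '' -> would hit the ValueError
print(select_family("Llama-3-8b"))                         # matches the llama branch

If that guess is right, it may be worth passing an explicit --net value, as in the first run above, or dropping the trailing slash from --model so that the name contains "llama".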

kimoji919 commented

@ChenMnZ Hello, I also ran into problems like this.
I tried the code in runing_falcon180b_on_single_a100_80g.ipynb with Llama-2-7B: I ran the quantization and saved the model with real quantization. However, loading the pre-computed quantized weights produces a warning, and the run then fails when executing model = model.cuda().
I also tried your weights on Hugging Face, but they do not seem to work either.
