
Llama-3-8B #75
Open
hsb1995 opened this issue Apr 22, 2024 · 5 comments

Comments

hsb1995 commented Apr 22, 2024


I have obtained the scale and shift factors for Llama-3-8B, but I ran into a mismatch during the compression step.

My scaling-factor code is unchanged, but a dimension error appears as soon as compression starts. The parameter settings are as follows (a quick check of the model's attention-head layout is sketched after the list):
--model …/Llama-3-8b/
--epochs 20
--output_dir …/llama-3-8b-w6a6/
--eval_ppl
--wbits 6
--abits 6
--lwc
--let
--net Llama-3-8b
--tasks arc_easy,arc_challenge,boolq,hellaswag,winogrande,piqa
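
(For reference, one quick way to confirm that the mismatch comes from the model architecture rather than these settings is to compare the query and key/value head counts in the checkpoint config. A minimal sketch, assuming a local Hugging Face checkpoint; the path is a placeholder:)

# Minimal sketch: check whether the checkpoint uses GQA (grouped-query attention).
# "/PATH/TO/Llama-3-8b" is a placeholder for the local model directory.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("/PATH/TO/Llama-3-8b")
head_dim = cfg.hidden_size // cfg.num_attention_heads
q_out = cfg.num_attention_heads * head_dim       # q_proj output dim (4096 for Llama-3-8B)
kv_out = cfg.num_key_value_heads * head_dim      # k_proj/v_proj output dim (1024 for Llama-3-8B)
print(q_out, kv_out)                             # the two differ under GQA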

hsb1995 commented Apr 22, 2024

With w=16 and a=16 I can run the uncompressed model and obtain results, but as soon as I set the compression to w=6, a=6 the problem appears.

ChenMnZ (Collaborator) commented Apr 24, 2024

@hsb1995
LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current ‘let’.
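
(For context, a rough illustration of the conflict, not the repository's actual code: the equivalent transformation between q_proj and k_proj applies a per-channel scale to one projection and its reciprocal to the other, which assumes both have the same output width. Under GQA they do not. The shapes below are Llama-3-8B's; the scale tensor is illustrative.)

import torch

# Llama-3-8B attention under GQA: 32 query heads but only 8 key/value heads.
hidden_size, n_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_heads                             # 128

q_weight = torch.randn(n_heads * head_dim, hidden_size)       # (4096, 4096)
k_weight = torch.randn(n_kv_heads * head_dim, hidden_size)    # (1024, 4096)

# One per-output-channel scale shared (reciprocally) between q_proj and k_proj:
s = torch.rand(n_heads * head_dim) + 0.5                      # length 4096
q_scaled = q_weight * s.unsqueeze(1)                          # fine: 4096 output channels
try:
    k_scaled = k_weight / s.unsqueeze(1)                      # (1024, 4096) vs (4096, 1)
except RuntimeError as e:
    print("shape mismatch:", e)                               # the kind of error seen above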

hsb1995 commented Apr 24, 2024

@hsb1995 LLaMA-3-8B uses GQA (Group Query Attention), which is not supported by current ‘let’.

Professor, thank you for your great work. I don't really understand how the GQA issue you mention should be handled.
Do I understand you correctly that I should keep the original "generate_act_scale_shift" script unchanged to obtain the "act_scales" and "act_shifts" files, and then run the weight quantization on top of that?
Parameter settings:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
Is the above workable? The only change is that I removed the let option.
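
(If it helps, the reason dropping let can sidestep the GQA problem is that learnable weight clipping operates on each weight matrix in isolation, so the grouped heads never need to share a scale. A rough per-output-channel fake-quantization sketch, assuming sigmoid-gated clipping; this is not OmniQuant's implementation and the parameter names are illustrative:)

import torch

def lwc_fake_quant(w: torch.Tensor, n_bits: int, clip_logit: torch.Tensor) -> torch.Tensor:
    # Per-output-channel asymmetric fake quantization with a learnable clipping factor.
    # Only the weight matrix itself is touched, so GQA head grouping never comes into play.
    shrink = torch.sigmoid(clip_logit)                        # (out_features, 1), values in (0, 1)
    w_max = w.amax(dim=1, keepdim=True) * shrink
    w_min = w.amin(dim=1, keepdim=True) * shrink
    scale = (w_max - w_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 2 ** n_bits - 1)
    return (q - zero) * scale                                 # dequantize back to float

k_proj = torch.randn(1024, 4096)                              # Llama-3-8B k_proj weight shape
clip = torch.zeros(k_proj.shape[0], 1, requires_grad=True)    # one learnable factor per output channel
w_q = lwc_fake_quant(k_proj, n_bits=6, clip_logit=clip)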

hsb1995 commented Apr 24, 2024

Hey, professor, I gave it a try. It turns out to be hard to get working; the current errors are below. What should I do about these?

[2024-04-24 17:14:17 root](omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in <module>
    main()
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 383, in main
    omniquant(
  File "/home/sam/Doctorproject/OmniQuant-main/quantize/omniquant.py", line 102, in omniquant
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")
ValueError: Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now
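
(The error is raised by a model-family check in quantize/omniquant.py. The snippet below is only a guess at the shape of that check, not the actual source, but it shows why the net/model name string matters: if the name derived from the model path, for instance one ending in a trailing slash, does not contain a recognized family token, the fallback branch fires.)

# Hypothetical sketch of the kind of dispatch that raises this error; the real
# condition in quantize/omniquant.py may differ.
def select_family(net_name: str) -> str:
    name = net_name.lower()
    if "llama" in name or "mixtral" in name:
        return "llama-style quantization path"
    elif "opt" in name:
        return "opt-style quantization path"
    elif "falcon" in name:
        return "falcon-style quantization path"
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")

# A trailing slash leaves the last path component empty, so nothing matches:
print(repr("/home/sam/.../Llama-3-8b/".split("/")[-1]))   # '' -> would hit the ValueError
print(select_family("Llama-3-8b"))                         # matches the llama branch

If that guess is right, it may be worth passing an explicit --net value, as in the first run above, or dropping the trailing slash from --model so that the name contains "llama".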

kimoji919 commented

@ChenMnZ Hello, I also ran into problems like this.
I tried the code in runing_falcon180b_on_single_a100_80g.ipynb with Llama-2-7B: I ran the quantization and saved the model with real quantization. However, loading the pre-computed quantized weights produces a warning, and the run then fails when executing model = model.cuda().
I also tried your weights on Hugging Face, but they do not seem to work either.
