Cannot compile with mlc-llm #21

Open
0x1997 opened this issue Oct 10, 2023 · 2 comments

0x1997 commented Oct 10, 2023

I quantized a custom fine-tuned Llama-2 70B model like this:

$ python main.py \
  --model /data/finetuned_llama2_70b  \
  --epochs 20 \
  --output_dir /data/finetuned_llama2_70b_output \
  --wbits 4 \
  --abits 16 \
  --group_size 128 \
  --lwc \
  --net Llama-2-70b

$ python main.py \
  --model /data/finetuned_llama2_70b \
  --epochs 0 \
  --output_dir /data/finetuned_llama2_70b_output2 \
  --save_dir /data/finetuned_llama2_70b_omniquant \
  --resume /data/finetuned_llama2_70b_output/omni_parameters.pth \
  --wbits 4 \
  --abits 16 \
  --group_size 128 \
  --lwc \
  --net Llama-2-70b
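
Before the build step it is worth confirming that --save_dir ended up with a complete Hugging Face style checkpoint (config.json, tokenizer files, and the weight files), since that directory is what gets passed to mlc-llm below. A minimal check, using the paths from the commands above:

from pathlib import Path

# --save_dir of the second OmniQuant run above; adjust if the layout differs.
save_dir = Path("/data/finetuned_llama2_70b_omniquant")
for p in sorted(save_dir.iterdir()):
    print(f"{p.name:45s} {p.stat().st_size / 2**30:7.2f} GiB")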

Then I added a new quantization scheme to mlc_llm/quantization/__init__.py like this:

"w4a16g128asym": QuantizationScheme(
    name="w4a16g128asym",
    linear_weight=GroupQuantizationSpec(
        dtype="float16",
        mode="int4",
        sym=False,
        storage_nbit=16,
        group_size=128,
        transpose=False,
    ),
    embedding_table=None,
    final_fc_weight=None,
)

When I tried to compile the model with mlc-llm,

$ python -m mlc_llm.build \
  --model /data/finetuned_llama2_70b_omniquant \
  --target cuda \
  --quantization w4a16g128asym \
  --artifact-path /data/finetuned_llama2_70b_omniquant_mlc \
  --use-cache 0

I got this error:

Start computing and quantizing weights... This may take a while.
Traceback (most recent call last):
  File "~/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "~/mlc-llm/mlc_llm/core.py", line 619, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mlc-llm/mlc_llm/utils.py", line 258, in convert_weights
    vm["transform_params"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "~/mambaforge/envs/mlc/lib/python3.11/site-packages/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 558, in get_item
    for torch_binname in [
                         ^
  File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 559, in <listcomp>
    self.torch_pname2binname[torch_pname] for torch_pname in torch_pnames
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'model.layers.0.self_attn.q_proj.weight'
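
The KeyError means the param manager could not find model.layers.0.self_attn.q_proj.weight in the checkpoint's weight index (torch_pname2binname). One way to narrow it down is to dump the parameter names actually present in the exported checkpoint and compare them with the missing key. A minimal sketch, assuming --save_dir holds a Hugging Face style checkpoint (either sharded with pytorch_model.bin.index.json or a single pytorch_model.bin):

import json
from pathlib import Path

import torch

# Directory produced by the second OmniQuant run above (--save_dir).
ckpt_dir = Path("/data/finetuned_llama2_70b_omniquant")

index_file = ckpt_dir / "pytorch_model.bin.index.json"
if index_file.exists():
    # Sharded checkpoint: the index maps every parameter name to its shard file.
    names = sorted(json.loads(index_file.read_text())["weight_map"])
else:
    # Single-file checkpoint: load it on CPU just to read the keys.
    names = sorted(torch.load(ckpt_dir / "pytorch_model.bin", map_location="cpu"))

print(len(names), "parameter names in the checkpoint")
print([n for n in names if "layers.0.self_attn" in n])

If q_proj.weight is missing or renamed in that listing, the export step is the first place to look, rather than the mlc-llm scheme entry.
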
@shifeiwen

Same error. Is there any progress on this issue so far? @0x1997

@shifeiwen

@ChenMnZ Do you have any progress or tips on this, so that I can successfully load the quantized weights in mlc?
