Added test llama-2-7b with GPTQ quant. scheme #141

Merged
merged 9 commits into huggingface:main on Mar 21, 2024

Conversation

@lopozz (Contributor) commented Feb 29, 2024

Test TheBloke/Llama-2-7B-GPTQ with the PyTorch backend and CUDA hardware. #95

@lopozz closed this Feb 29, 2024
@lopozz reopened this Feb 29, 2024
@IlyasMoutawwakil (Member) commented
Thanks a lot for the PR! I have a few suggestions:

  • let's use TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ to reduce download time, like in trt-llm (this is just the smallest llama I can think of)
  • let's use the compose API, i.e. inherit stuff from other configs instead of rewriting them, like here.

@lopozz (Contributor, Author) commented Mar 1, 2024

Hi, thanks for the feedback. Should I create a default file for llama, as was done for gpt or timm? Also, I am currently working on a machine with a single GPU, hence backend.device_ids: 0.

@IlyasMoutawwakil (Member) commented

No need, just reuse the ones already provided as much as possible and make the rest explicit:

defaults:
  - backend: pytorch # default backend
  # order of inheritance, last one overrides previous ones
  - _base_ # inherits from base config
  - _inference_ # inherits from inference config
  - _cuda_ # inherits from cuda config
  - _self_ # hydra 1.1 compatibility

experiment_name: cuda_inference_pytorch_gptq

backend:
  model: TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ
  quantization_config:
    exllama_config:
      version: 2

hydra:
  sweeper:
    params:
      backend.no_weights: true,false
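
(A note on the sweep above: under hydra's multirun mode, the sweeper entry expands backend.no_weights into two runs, one with a randomly initialized no-weights model and one with the real checkpoint. This reading assumes optimum-benchmark's usual no_weights semantics.)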

@lopozz (Contributor, Author) commented Mar 6, 2024

Hi @IlyasMoutawwakil, I just modified the file per your feedback. Let me know if there is anything else I can do.

@IlyasMoutawwakil (Member) commented Mar 8, 2024

@lopozz thanks, can you also add the GPTQ pip installation to the test workflow? You will have to use the index URLs from https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation
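
A minimal sketch of such a workflow step, assuming a CUDA 11.8 runner; the extra index URL below is the cu118 one and should be double-checked against the AutoGPTQ README linked above:

- name: Install GPTQ requirements
  run: |
    pip install optimum
    pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/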

@lopozz (Contributor, Author) commented Mar 13, 2024

Thanks for the support @IlyasMoutawwakil, this is my first pull request, but I think I solved the issue. Also, after seeing comment #144 (comment), I did not modify setup.py.

Main changes: pip installation of the optimum and auto-gptq packages.

Let me know if you have further feedback.

@IlyasMoutawwakil (Member) commented

@lopozz thanks a lot for working on this.
optimum and auto-gptq are still missing from setup.py's extras.
I would suggest not adding optimum as a standalone extra, but rather grouping both dependencies under auto-gptq:

    "bitsandbytes": ["bitsandbytes"],
    "auto-gptq": ["optimum", "auto-gptq"],

This will probably not work with ROCm 5.6 & 5.7 and CUDA 11.8 😅 since those systems require an extra index URL to download the right wheels, as explained in https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#installation.
It could be solved using dependency_links, as explained in https://stackoverflow.com/a/30064248, but tell me if this is too complicated for the scope of your PR.

@lopozz (Contributor, Author) commented Mar 19, 2024

@IlyasMoutawwakil
Update on the last commit:

  • added 3 new identifiers in setup.py, for cuda121, cuda118, and rocm, to install optimum and auto-gptq via the corresponding index URLs (a hypothetical sketch follows below).

I did not use dependency_links as you suggested because they have been deprecated since pip 19.0 (released 2019-01-22): https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html#specifying-dependencies-that-aren-t-in-pypi-via-dependency-links. Let me know if you agree; if not, I can modify it.
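
A hypothetical sketch of what those extras could look like; the extra names are illustrative, and the wheel URLs are placeholders (PEP 508 direct references) standing in for the real ones from the AutoGPTQ index pages:

    EXTRAS_REQUIRE = {
        "bitsandbytes": ["bitsandbytes"],
        # assumption: the default PyPI wheel matches CUDA 12.1; the other two
        # targets point at platform-specific wheels via PEP 508 direct references
        "auto-gptq-cuda121": ["optimum", "auto-gptq"],
        "auto-gptq-cuda118": ["optimum", "auto-gptq @ https://example.com/auto_gptq-cu118.whl"],
        "auto-gptq-rocm": ["optimum", "auto-gptq @ https://example.com/auto_gptq-rocm.whl"],
    }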

Resolved review threads (outdated): setup.py, Makefile, .github/workflows/test_cli_rocm_pytorch.yaml
@IlyasMoutawwakil (Member) commented

Thanks, I left some comments; don't forget to run styling.
There appears to be a SIGSEGV when testing GPTQ, probably something wrong with the cleanup code. I'll investigate the cause.

@IlyasMoutawwakil (Member) commented Mar 20, 2024

Okay, found why: in the code I use exllama_version but it's supposed to be just version, so g_idx is not created in the no-weights model, resulting in a SIGSEGV. I will fix this in another PR and merge it quickly.
Fixed in #165.
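
For context, a minimal sketch of the key naming involved, assuming transformers' GPTQConfig API (the buggy lookup itself lived in the benchmark's no-weights model creation):

    from transformers import GPTQConfig

    # the exllama settings dict is keyed by "version"; reading a nonexistent
    # "exllama_version" key leaves exllama unconfigured, which, per the comment
    # above, meant g_idx was never created in the no-weights model
    quantization_config = GPTQConfig(bits=4, exllama_config={"version": 2})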

@IlyasMoutawwakil (Member) commented Mar 21, 2024

Thanks for the addition @lopozz, great work on your first PR 🤗

@IlyasMoutawwakil merged commit 38b89e7 into huggingface:main on Mar 21, 2024. 21 of 22 checks passed.