Fix recompilations in compile due to enabled_flags #761

afierka-intel · 2025-01-30T13:06:38Z

Update vllm-hpu-extension with the fix for the recompilations named in the title: HabanaAI/vllm-hpu-extension#88
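The title names enabled_flags as the source of the recompilations. The actual fix lives in HabanaAI/vllm-hpu-extension#88 and is not reproduced on this page. As a hedged illustration only, one common pattern for flag checks in a hot path is to compute the flag set once and return the same immutable object on every call, so that repeated checks do not rebuild it. The environment variable HYPOTHETICAL_FLAGS, the flag names, the call counter and attention_dtype are invented for this sketch and are not the extension's API.

```python
import functools
import os

# Hypothetical call counter, used only to show that the body runs once.
CALLS = {"count": 0}


@functools.cache
def enabled_flags() -> frozenset:
    """Hypothetical flag query: parse the flag set once, then reuse it.

    Assumption: flags come from a comma-separated environment variable.
    The real vllm-hpu-extension logic is not shown in this PR.
    """
    CALLS["count"] += 1
    raw = os.environ.get("HYPOTHETICAL_FLAGS", "fp32_softmax,fused_rope")
    return frozenset(f.strip() for f in raw.split(",") if f.strip())


def attention_dtype() -> str:
    # Repeated calls return the same cached frozenset and do not
    # re-read the environment.
    return "float32" if "fp32_softmax" in enabled_flags() else "bfloat16"
```

Because the cached value is a frozenset, callers cannot mutate it between calls. Whether the upstream fix uses caching, a module-level constant, or something else is not stated here.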

mgawarkiewicz · 2025-01-31T08:59:51Z

requirements-hpu.txt

@@ -8,4 +8,4 @@ pandas
 tabulate
 setuptools>=61
 setuptools-scm>=8
-vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@d4f37bb
+vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@dev/afierka/fix-compile-recompilations


A proper commit SHA is needed here instead of a dev branch.
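To illustrate the reviewer's point, here is a small check (not part of this repository) that tells a git requirement pinned to a commit SHA apart from one pointing at a branch. The helper name is_pinned_to_sha and the rule "7 to 40 lowercase hex characters after the final @" are assumptions for this sketch.

```python
import re

# Assumption: a pinned ref is 7-40 lowercase hex characters at the very end
# of the requirement line, directly after an "@".
_SHA_RE = re.compile(r"@([0-9a-f]{7,40})$")


def is_pinned_to_sha(requirement: str) -> bool:
    """Hypothetical helper: True if a git+ requirement ends in a commit SHA."""
    line = requirement.strip()
    if "git+" not in line:
        return False
    return _SHA_RE.search(line) is not None


old = "vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@d4f37bb"
new = ("vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git"
       "@dev/afierka/fix-compile-recompilations")
```

With the two lines from the diff, the old pin passes and the dev-branch line fails. A branch whose name happens to be all hex would still pass, so this is a heuristic, not a guarantee.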

This PR enables loading AWQ-quantized models and running weight-only quantized inference on HPU. Currently it works only for BF16 inference, because the kernel torch.ops.hpu.convert_from_uint4 does not support FP16. Tested on TheBloke/Llama-2-70B-Chat-AWQ, where it worked.
Co-authored-by: Michał Kuligowski
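Weight-only quantization keeps weights as packed 4-bit integers and expands them to a floating-point type (BF16 on HPU, per the note above) before the matmul. The plain-Python sketch below shows only the unpack and dequantize arithmetic, w = (q - zero) * scale. Assumptions: eight 4-bit values are packed into one 32-bit word in sequential order, lowest nibble first. Real AWQ checkpoints may use a different (interleaved) nibble order, and on HPU this step is done by torch.ops.hpu.convert_from_uint4, whose exact semantics are not shown here. The function names are hypothetical.

```python
def pack_uint4(values: list[int]) -> int:
    """Hypothetical helper: pack eight values in [0, 15] into one 32-bit word."""
    if len(values) != 8 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("expected eight values in [0, 15]")
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)  # nibble i occupies bits 4*i .. 4*i+3
    return word


def unpack_uint4(word: int) -> list[int]:
    """Split a 32-bit word into eight unsigned 4-bit values, low nibble first."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(word: int, scale: float, zero: int) -> list[float]:
    """Weight-only dequantization of one packed word: w = (q - zero) * scale."""
    return [(q - zero) * scale for q in unpack_uint4(word)]
```

In a real kernel the scale and zero point are stored per group of input channels; a single pair per word is used here only to keep the arithmetic visible.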

Dummy parameter initialization (load_format='dummy') does not work on HPU because torch.Generator is not supported there. This PR fixes the issue by bypassing the generator.
Co-authored-by: Michał Kuligowski
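The bypass idea can be sketched without a device: when a dedicated generator object is available, draw from it; otherwise seed the global RNG and draw from that instead, which stays deterministic for a given seed. This is a hedged, pure-Python analogy, not the PR's code. The flag supports_generator stands in for the device check, the real path operates on torch tensors rather than lists, and the defaults low=-1e-3, high=1e-3, seed=1234 are assumptions for the sketch.

```python
import random


def init_dummy_weights(sizes: list[int], low: float = -1e-3, high: float = 1e-3,
                       seed: int = 1234,
                       supports_generator: bool = True) -> list[list[float]]:
    """Hypothetical dummy init: uniform values in [low, high] per parameter."""
    if supports_generator:
        rng = random.Random(seed)  # dedicated generator object
        draw = rng.uniform
    else:
        random.seed(seed)  # bypass: seed the global RNG instead
        draw = random.uniform
    return [[draw(low, high) for _ in range(n)] for n in sizes]
```

Seeding the global RNG has the side effect of resetting it for any other code that uses it, which is the usual trade-off when no per-call generator is available.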

Fix t.compile recompilations caused by flags

ad61491

afierka-intel force-pushed the dev/afierka/fix-compile-recompilations branch from f10c910 to ad61491 on January 30, 2025 20:21

afierka-intel added 2 commits January 30, 2025 22:35

Tiny refactor

2422e9d

Much prettier fix

7b7de00

afierka-intel marked this pull request as ready for review January 30, 2025 21:24

afierka-intel requested review from kzawora-intel, madamczykhabana, michalkuligowski, mgawarkiewicz and vivekgoe as code owners January 30, 2025 21:24

mgawarkiewicz requested changes Jan 31, 2025

maktukmak and others added 4 commits January 31, 2025 15:55

Update requirements-hpu.txt (#756)

ea33e29

Generator bypass for dummy init (#747)

f991c0a

Fix t.compile recompilations caused by flags

6f5576b

afierka-intel closed this Jan 31, 2025

afierka-intel deleted the dev/afierka/fix-compile-recompilations branch January 31, 2025 16:41
