1.19.0 fast-forward merge #542

Merged: 26 commits into v1.19.0 on Nov 26, 2024
Conversation

kzawora-intel

No description provided.

mfylcek and others added 22 commits November 15, 2024 14:56
Random sampler warmup
This change makes it possible to skip empty steps in the multi-step scenario. We are currently wasting host time launching n-2 empty steps; this PR removes them. The gain will become visible after device-time optimizations, as we are currently limited by HPU calculations inside multi-step (see the sketch below).
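
A minimal sketch of the skip, assuming a hypothetical `Step` object with `has_work()` and `launch()` methods; these names are illustrative, not the actual vllm-fork multi-step classes:

```python
# Sketch: skip empty multi-step iterations instead of launching them.
# Step, has_work(), and launch() are illustrative stand-ins.
from dataclasses import dataclass, field

@dataclass
class Step:
    requests: list = field(default_factory=list)  # work queued for this step

    def has_work(self) -> bool:
        return bool(self.requests)

    def launch(self) -> None:
        print(f"launching step with {len(self.requests)} requests")

def execute_multistep(steps: list[Step]) -> None:
    for step in steps:
        # Previously every step was launched, so the n-2 empty trailing
        # steps still cost host-side launch time; skipping them removes
        # that overhead without changing the device work.
        if not step.has_work():
            continue
        step.launch()
```
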
Limit decode bucket size to num_hpu_blocks
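
The intent of the bucket cap above, as a sketch under assumptions (the helper name and candidate list are illustrative, not the actual bucketing code): a decode bucket sized beyond the available HPU KV-cache blocks can never be hit, so such candidates are dropped.

```python
# Illustrative sketch: drop decode-bucket candidates that exceed the
# number of available HPU KV-cache blocks (num_hpu_blocks), since no
# real batch can ever occupy more blocks than exist.
def limit_decode_buckets(candidates: list[int], num_hpu_blocks: int) -> list[int]:
    return sorted(b for b in candidates if b <= num_hpu_blocks)

# Example: with 1024 HPU blocks, the 2048 and 4096 buckets are dropped.
assert limit_decode_buckets([128, 256, 512, 1024, 2048, 4096], 1024) == \
    [128, 256, 512, 1024]
```
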
Fixes an issue with multi-LoRA during `profile_run`.
We are seeing a 10% performance regression in Llama-based models due to
vllm-project#10239. The mark_step()
function needs to be configured differently per model to achieve the
best performance: for some models, calling mark_step() after every decoder
step is optimal, while for others it is better to call it every
n-th step. We add a counter so that the hook is only registered for every
n-th step, configurable via VLLM_CONFIG_HIDDEN_LAYERS (see the sketch below).
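
A minimal sketch of the counter idea, assuming a simplified forward hook; the class name and the `mark_step()` stub are illustrative, and only VLLM_CONFIG_HIDDEN_LAYERS comes from the commit:

```python
import os

def mark_step() -> None:
    # Stand-in for the HPU graph-break call (htorch.core.mark_step() in
    # habana_frameworks.torch); a no-op here to keep the sketch runnable.
    pass

class EveryNthLayerMarkStep:
    """Counter-based hook: call mark_step() only every n-th decoder layer."""

    def __init__(self) -> None:
        # n = 1 reproduces the old behavior (mark_step() on every layer).
        self.interval = int(os.environ.get("VLLM_CONFIG_HIDDEN_LAYERS", "1"))
        self.counter = 0

    def __call__(self, module, inputs, output):
        # Matches the torch.nn.Module forward-hook signature.
        self.counter += 1
        if self.counter % self.interval == 0:
            mark_step()
        return output
```

Registered on each decoder layer with `layer.register_forward_hook(...)`, this inserts a graph break every n layers instead of after every layer.
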
@michalkuligowski merged commit 79e37ad into v1.19.0 on Nov 26, 2024
16 checks passed