Conversation
Co-authored-by: Alexey Kondratiev <[email protected]>
Allow dummy load format for fp8; torch.uniform_ doesn't support FP8 at the moment. Co-authored-by: Mor Zusman <[email protected]>
Signed-off-by: kerthcet <[email protected]>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893) The 2nd PR for vllm-project#4532. This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>
…project#4985) Co-authored-by: Elisei Smirnov <[email protected]>
@@ -147,7 +147,7 @@ def __init__(
        self,
        model_name: str,
        dtype: str = "half",
-       access_token: Optional[str] = None,
+       **kwargs,
- access token not needed (we set HF_TOKEN in automation)
- kwargs enables us to pass whatever we want for the hf runner (see the sketch below)
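A minimal sketch of what the **kwargs pass-through could look like; the HfRunner name, the from_pretrained call site, and the trust_remote_code example are assumptions for illustration, not the actual test harness code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class HfRunner:
    """Hypothetical test wrapper around a HF model (illustration only)."""

    def __init__(self, model_name: str, dtype: str = "half", **kwargs) -> None:
        # Auth comes from the HF_TOKEN env var set by automation,
        # so no explicit access_token parameter is needed.
        torch_dtype = torch.float16 if dtype == "half" else "auto"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # Extra options (e.g. trust_remote_code=True) flow straight
        # through to from_pretrained via **kwargs.
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch_dtype, **kwargs)


# Example: pass arbitrary loading options without widening the signature.
runner = HfRunner("facebook/opt-125m", trust_remote_code=True)
```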
@@ -472,21 +472,21 @@ def _decode_token_by_position_index(

    def generate_greedy_logprobs_nm_use_tokens(
        self,
        prompts: List[str],
        input_ids_lst: List[torch.Tensor],
- previously, we were passing a prompt formatted with a chat template (which appends a bos token)
- then, we tokenized it here, which appended another bos token

This change permanently prevents that double bos token by forcing the test to fully tokenize everything up front (see the sketch below).
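A short sketch of the double-BOS issue being avoided; the model name and prompt are placeholders, not the test's actual inputs:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Applying the chat template already inserts the BOS token into the text.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}], tokenize=False)

# Re-tokenizing that string with default settings prepends a second BOS.
double_bos_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Tokenizing once up front (and passing input_ids into the test) avoids it;
# add_special_tokens=False stops the tokenizer from adding another BOS.
input_ids = tokenizer(prompt, add_special_tokens=False,
                      return_tensors="pt").input_ids
```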
…/nm-vllm into remote-push-refactor
@@ -37,7 +37,7 @@ jobs:
    test_label_solo: gcp-k8s-l4-solo
    test_label_multi: ignore
    test_timeout: 480
-   test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
+   test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt
For remote-push, maybe we don't need to run all 4 Python versions, and only 3.8/3.11 is good enough?
yeah, let's get to that after this is merged
I'd prefer not having a file as input. Instead, it'd be more flexible to just have a string listing each env var with its setting, something like:
TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE
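A rough sketch of how such a string could be parsed and exported on the CI side; the function and variable names are made up for illustration, not the actual workflow code:

```python
import os
from typing import Dict


def parse_test_env_vars(spec: str) -> Dict[str, str]:
    """Parse a spec like 'TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE' into a dict."""
    result: Dict[str, str] = {}
    for item in spec.split(","):
        if not item.strip():
            continue
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return result


# The CI job could export each entry before invoking pytest.
for name, value in parse_test_env_vars(
        "TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE").items():
    os.environ[name] = value
```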
thanks! we can adjust calling parameters and style later.
SUMMARY:
* updated model test structure to focus on core models
* refactored tests to use environment variables (currently at "test group" level, so each folder has an env variable); all tests are off by default and must be explicitly enabled
* refactored the build-test workflow to use a list of env variables rather than a skip-test list

WHY:
* this enables us to be more sane about what is and is not on, as opposed to a long list of files
* this enables us to actually track what is run and what is not run (via testmo, which tracks skipped tests)
* this enables us to have more fine-grained control over what is run vs. not run (we can add more env vars at the sub-group level to turn off more tests)

---------

Signed-off-by: kerthcet <[email protected]> Signed-off-by: Muralidhar Andoorveedu <[email protected]> Signed-off-by: pandyamarut <[email protected]> Co-authored-by: Alexander Matveev <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Wenwei Zhang <[email protected]> Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]> Co-authored-by: Alexey Kondratiev <[email protected]> Co-authored-by: Mor Zusman <[email protected]> Co-authored-by: Mor Zusman <[email protected]> Co-authored-by: Aurick Qiao <[email protected]> Co-authored-by: Kuntai Du <[email protected]> Co-authored-by: Antoni Baum <[email protected]> Co-authored-by: HUANG Fei <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: Simon Mo <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kante Yin <[email protected]> Co-authored-by: sasha0552 <[email protected]> Co-authored-by: SangBin Cho <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: Cody Yu <[email protected]> Co-authored-by: raywanb <[email protected]> Co-authored-by: Nick Hill <[email protected]> Co-authored-by: Philipp Moritz <[email protected]> Co-authored-by: Letian Li <[email protected]> Co-authored-by: Murali Andoorveedu <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Elisei Smirnov <[email protected]> Co-authored-by: Elisei Smirnov <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: leiwen83 <[email protected]> Co-authored-by: Lei Wen <[email protected]> Co-authored-by: Eric Xihui Lin <[email protected]> Co-authored-by: beagleski <[email protected]> Co-authored-by: bapatra <[email protected]> Co-authored-by: Barun Patra <[email protected]> Co-authored-by: Lily Liu <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Zhuohan Li <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: Michał Moskal <[email protected]> Co-authored-by: Ruth Evans <[email protected]> Co-authored-by: Divakar Verma <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Junichi Sato <[email protected]> Co-authored-by: Marut Pandya <[email protected]> Co-authored-by: afeldman-nm <[email protected]> Co-authored-by: Ronen Schaffer <[email protected]> Co-authored-by: Itay Etelis <[email protected]> Co-authored-by: omkar kakarparthi <[email protected]> Co-authored-by: Alexei V. Ivanov <[email protected]> Co-authored-by: Breno Faria <[email protected]> Co-authored-by: Breno Faria <[email protected]> Co-authored-by: Hyunsung Lee <[email protected]> Co-authored-by: Chansung Park <[email protected]> Co-authored-by: SnowDist <[email protected]> Co-authored-by: functionxu123 <[email protected]> Co-authored-by: xuhao <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]>
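As a hedged illustration of the env-var gating described above (TEST_ACCURACY and the marker name are assumptions, not necessarily the repo's exact variable names or helpers):

```python
import os

import pytest

# Tests in a group are off by default and run only when the group's
# env var is explicitly set to ENABLE.
requires_accuracy_group = pytest.mark.skipif(
    os.getenv("TEST_ACCURACY", "DISABLE") != "ENABLE",
    reason="TEST_ACCURACY is not set to ENABLE",
)


@requires_accuracy_group
def test_accuracy_smoke():
    assert True  # placeholder for a real accuracy check
```

Because testmo records skipped tests, groups that are turned off still show up as skipped rather than silently absent.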