This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Remote push refactor #297

Merged: 159 commits on Jun 14, 2024
Commits
e69d23b
[Kernel] Add marlin_24 unit tests (#4901)
alexm-neuralmagic May 19, 2024
81ec16b
[Kernel] Add flash-attn back (#4907)
WoosukKwon May 20, 2024
5500975
[Model] LLaVA model refactor (#4910)
DarkLight1337 May 20, 2024
b913d04
Remove marlin warning (#4918)
alexm-neuralmagic May 20, 2024
683a30b
[Misc]: allow user to specify port in distributed setting (#4914)
ZwwWayne May 20, 2024
c8794c3
[Build/CI] Enabling AMD Entrypoints Test (#4834)
Alexei-V-Ivanov-AMD May 20, 2024
5b6a7b5
[Bugfix] Fix dummy weight for fp8 (#4916)
mzusman May 20, 2024
a5e66c7
[Core] Sharded State Loader download from HF (#4889)
aurickq May 20, 2024
8a78ed8
[Doc]Add documentation to benchmarking script when running TGI (#4920)
KuntaiDu May 20, 2024
6b46dcf
[Core] Fix scheduler considering "no LoRA" as "LoRA" (#4897)
Yard1 May 21, 2024
907d48a
[Model] add rope_scaling support for qwen2 (#4930)
hzhwcmhf May 21, 2024
11d6f7e
[Model] Add Phi-2 LoRA support (#4886)
Isotr0py May 21, 2024
5d98989
[Docs] Add acknowledgment for sponsors (#4925)
simon-mo May 21, 2024
58a235b
[CI/Build] Codespell ignore `build/` directory (#4945)
mgoin May 21, 2024
253d8fb
[Bugfix] Fix flag name for `max_seq_len_to_capture` (#4935)
kerthcet May 21, 2024
f744125
[Bugfix][Kernel] Add head size check for attention backend selection …
Isotr0py May 21, 2024
c1672a9
[Frontend] Dynamic RoPE scaling (#4638)
sasha0552 May 22, 2024
4b6c961
[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#…
mgoin May 22, 2024
4b74974
[misc] remove comments that were supposed to be removed (#4977)
rkooo567 May 22, 2024
39c15ee
[Kernel] Fixup for CUTLASS kernels in CUDA graphs (#4954)
tlrmchlsmth May 22, 2024
2835fc6
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)
comaniac May 22, 2024
3db99a6
[Model] LoRA gptbigcode implementation (#3949)
raywanb May 22, 2024
39a0a40
[Core] Eliminate parallel worker per-step task scheduling overhead (#…
njhill May 22, 2024
847ca88
[Minor] Fix small typo in llama.py: QKVParallelLinear -> Quantization…
pcmoritz May 22, 2024
c60384c
[Misc] Take user preference in attention selector (#4960)
comaniac May 22, 2024
dae5aaf
Marlin 24 prefill performance improvement (about 25% better on averag…
alexm-neuralmagic May 23, 2024
05a4f64
[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is n…
LetianLee May 23, 2024
bf4c411
[Core][1/N] Support send/recv in PyNCCL Groups (#4988)
andoorve May 23, 2024
c623663
[Kernel] Initial Activation Quantization Support (#4525)
dsikka May 23, 2024
a9ca32d
[Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985)
kezouke May 23, 2024
0eb33b1
[Doc] add ccache guide in doc (#5012)
youkaichao May 23, 2024
acf362c
[Kernel] Initial Activation Quantization Support (#4525)
robertgshaw2-neuralmagic May 24, 2024
1226d5d
[Core][Bugfix]: fix prefix caching for blockv2 (#4764)
leiwen83 May 24, 2024
29a2098
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3…
linxihui May 25, 2024
3fe7e52
[Misc] add logging level env var (#5045)
youkaichao May 25, 2024
8768b3f
[Dynamic Spec Decoding] Minor fix for disabling speculative decoding …
LiuXiaoxuanPKU May 25, 2024
e7e376f
[Misc] Make Serving Benchmark More User-friendly (#5044)
ywang96 May 25, 2024
67ce9ea
[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846)
zhuohan123 May 27, 2024
2c59c91
[Core] Allow AQLM on Pascal (#5058)
sasha0552 May 27, 2024
9fb7b82
[Model] Add support for falcon-11B (#5069)
Isotr0py May 27, 2024
954c332
[Core] Sliding window for block manager v2 (#4545)
mmoskal May 28, 2024
9929fb2
[BugFix] Fix Embedding Models with TP>1 (#5075)
robertgshaw2-neuralmagic May 28, 2024
b22d985
[Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X (#4951)
divakar-amd May 28, 2024
54c17a9
[Docs] Add Dropbox as sponsors (#5089)
simon-mo May 28, 2024
8c9aab4
[Core] Consolidate prompt arguments to LLM engines (#4328)
DarkLight1337 May 28, 2024
705789d
[Bugfix] Remove the last EOS token unless explicitly specified (#5077)
jsato8094 May 29, 2024
95c2a3d
[Misc] add gpu_memory_utilization arg (#5079)
pandyamarut May 29, 2024
9175890
[Core][Optimization] remove vllm-nccl (#5091)
youkaichao May 29, 2024
420c4ff
[Bugfix] Fix arguments passed to `Sequence` in stop checker test (#5092)
DarkLight1337 May 29, 2024
5bde5ba
[Core][Distributed] improve p2p access check (#4992)
youkaichao May 29, 2024
b86aa89
[Core] Cross-attention KV caching and memory-management (towards even…
afeldman-nm May 29, 2024
f63e8dd
[Doc]Replace deprecated flag in readme (#4526)
ronensc May 29, 2024
62a4fcb
[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterato…
DarkLight1337 May 29, 2024
f900bcc
[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff` …
DarkLight1337 May 29, 2024
6824b2f
[Core] Avoid the need to pass `None` values to `Sequence.inputs` (#5099)
DarkLight1337 May 29, 2024
623275f
[Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031)
Etelis May 29, 2024
15dcd3e
[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846)
youkaichao May 29, 2024
5763c73
[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#…
alexm-neuralmagic May 30, 2024
3a8332c
[CI/Build] Docker cleanup functionality for amd servers (#5112)
okakarpa May 30, 2024
11a5a26
[BUGFIX] [FRONTEND] Correct chat logprobs (#5029)
br3no May 30, 2024
2827c68
[Bugfix] Automatically Detect SparseML models (#5119)
robertgshaw2-neuralmagic May 30, 2024
4ae80dd
[CI/Build] increase wheel size limit to 200 MB (#5130)
youkaichao May 30, 2024
886ead6
[Misc] remove duplicate definition of `seq_lens_tensor` in model_runn…
ita9naiwa May 30, 2024
758b903
[Doc] Use intersphinx and update entrypoints docs (#5125)
DarkLight1337 May 30, 2024
a190463
add doc about serving option on dstack (#3074)
deep-diver May 30, 2024
51cf757
Bump version to v0.4.3 (#5046)
simon-mo May 30, 2024
c72d890
[Build] Disable sm_90a in cu11 (#5141)
simon-mo May 30, 2024
cf0711b
[Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120)
robertgshaw2-neuralmagic May 31, 2024
dcaf819
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::orde…
alexm-neuralmagic May 31, 2024
7da3c3f
Fix cutlass sm_90a vesrion in CMakeList
simon-mo May 31, 2024
2c66f17
[Model] Support MAP-NEO model (#5081)
xingweiqu May 31, 2024
5388c64
Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using th…
simon-mo May 31, 2024
5e9f300
[Misc]: optimize eager mode host time (#4196)
FuncSherl May 31, 2024
f329e2e
[Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039)
comaniac May 31, 2024
951e3d2
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171)
njhill Jun 1, 2024
d349dbd
[Build] Guard against older CUDA versions when building CUTLASS 3.x k…
tlrmchlsmth Jun 1, 2024
031fd4e
format
robertgshaw2-neuralmagic Jun 8, 2024
9ed5f76
skip blockspase attention
robertgshaw2-neuralmagic Jun 9, 2024
ec71544
fix falcon
robertgshaw2-neuralmagic Jun 9, 2024
7381340
skip sliding window chunked prefill
robertgshaw2-neuralmagic Jun 9, 2024
c23ca05
skip prefix prefill
robertgshaw2-neuralmagic Jun 9, 2024
85512eb
skip tensorizer
robertgshaw2-neuralmagic Jun 9, 2024
0cea2c2
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input…
mgoin Jun 8, 2024
31147df
format
robertgshaw2-neuralmagic Jun 9, 2024
b2afd77
added lm eval test group
robertgshaw2-neuralmagic Jun 9, 2024
85d54e8
added env variable entrypoint
robertgshaw2-neuralmagic Jun 9, 2024
49fdf7d
format
robertgshaw2-neuralmagic Jun 9, 2024
e6ac051
format
robertgshaw2-neuralmagic Jun 9, 2024
5f83af8
format
robertgshaw2-neuralmagic Jun 9, 2024
61e8d8a
skip kernels env variable
robertgshaw2-neuralmagic Jun 9, 2024
fa58955
skipping lora env variable
robertgshaw2-neuralmagic Jun 9, 2024
2256610
fix issue with internal method
robertgshaw2-neuralmagic Jun 9, 2024
01973f5
formatting
robertgshaw2-neuralmagic Jun 9, 2024
ac25d3a
spec decode env variable
robertgshaw2-neuralmagic Jun 9, 2024
4fbff35
stash model changes
robertgshaw2-neuralmagic Jun 9, 2024
977edff
fixed basic server correctness
robertgshaw2-neuralmagic Jun 9, 2024
0266f28
format
robertgshaw2-neuralmagic Jun 9, 2024
51dff17
tensorizer, cleanup comment
robertgshaw2-neuralmagic Jun 9, 2024
775f6d4
cleanup README
robertgshaw2-neuralmagic Jun 9, 2024
88e3a55
newline nits
robertgshaw2-neuralmagic Jun 9, 2024
a1a659d
disabled more kernel tests that use triton
robertgshaw2-neuralmagic Jun 9, 2024
c50784c
updated cutlass skipping. We need cuda 12.4 in automation
robertgshaw2-neuralmagic Jun 9, 2024
99fa9f8
trigger kernel tests in automation
robertgshaw2-neuralmagic Jun 9, 2024
cdc9f49
clean up magic_wand test so that we only load the model once
robertgshaw2-neuralmagic Jun 9, 2024
b08194a
format
robertgshaw2-neuralmagic Jun 9, 2024
ccda2e7
format
robertgshaw2-neuralmagic Jun 9, 2024
51a7685
core, correctness
robertgshaw2-neuralmagic Jun 9, 2024
c42b18f
distributed
robertgshaw2-neuralmagic Jun 9, 2024
765aff0
format
robertgshaw2-neuralmagic Jun 9, 2024
e18bd8a
format
robertgshaw2-neuralmagic Jun 9, 2024
495488b
added tokenization group
robertgshaw2-neuralmagic Jun 9, 2024
d64bda5
worker
robertgshaw2-neuralmagic Jun 9, 2024
c6c6994
added models core
robertgshaw2-neuralmagic Jun 9, 2024
c9a2d02
added remote push
robertgshaw2-neuralmagic Jun 9, 2024
9b452a7
added action
robertgshaw2-neuralmagic Jun 9, 2024
bbe2906
updated remote push workflow
robertgshaw2-neuralmagic Jun 9, 2024
b2bb2bc
make sure action was saved
robertgshaw2-neuralmagic Jun 9, 2024
e629449
added action to build to just the action works
robertgshaw2-neuralmagic Jun 9, 2024
668e172
updated to tab these in
robertgshaw2-neuralmagic Jun 9, 2024
95d6fd7
undo indent
robertgshaw2-neuralmagic Jun 9, 2024
a64fdaa
cleanup action
robertgshaw2-neuralmagic Jun 9, 2024
9e6a4e9
removed example
robertgshaw2-neuralmagic Jun 9, 2024
352493e
added env var configs for all groups
robertgshaw2-neuralmagic Jun 9, 2024
8897dd1
updated other workflows
robertgshaw2-neuralmagic Jun 9, 2024
e1a1a59
switched for whitelist to blacklist
robertgshaw2-neuralmagic Jun 9, 2024
2ec6643
cleanup spurious setup.py change
robertgshaw2-neuralmagic Jun 9, 2024
0bb099c
readded the missing images
robertgshaw2-neuralmagic Jun 9, 2024
198f364
multilora inference
robertgshaw2-neuralmagic Jun 9, 2024
ec0e89a
offline inference with prefix
robertgshaw2-neuralmagic Jun 9, 2024
e6f1cbd
backend request func
robertgshaw2-neuralmagic Jun 9, 2024
ca8d74a
benchmark serving
robertgshaw2-neuralmagic Jun 9, 2024
5335ad9
prod monitoring readme
robertgshaw2-neuralmagic Jun 9, 2024
611cfed
format
robertgshaw2-neuralmagic Jun 9, 2024
73132a5
fix benchmark issue - internal method changed
robertgshaw2-neuralmagic Jun 9, 2024
7f5c715
removed skip for remote push edits
robertgshaw2-neuralmagic Jun 9, 2024
437912e
update internal method in benchmark throughput too
robertgshaw2-neuralmagic Jun 10, 2024
828d9d1
Merge branch 'upstream-sync-2024-06-08' into remote-push-refactor
robertgshaw2-neuralmagic Jun 10, 2024
c754d5a
skip sharded state loader - hanging in automation
robertgshaw2-neuralmagic Jun 10, 2024
2bf55cd
skip entrypoints tests in remote-push - too long
robertgshaw2-neuralmagic Jun 10, 2024
2657891
cleanup TEST_ALL_MODELS comment
robertgshaw2-neuralmagic Jun 10, 2024
389bdcd
skip samplers during remote push
robertgshaw2-neuralmagic Jun 10, 2024
5dd3f5d
cleanup newline nit
robertgshaw2-neuralmagic Jun 11, 2024
a475844
switch to enable / disable
robertgshaw2-neuralmagic Jun 11, 2024
397cfe2
readded
robertgshaw2-neuralmagic Jun 11, 2024
8c6d1f3
convert workflows to use new files
robertgshaw2-neuralmagic Jun 11, 2024
e093e61
updated each comment
robertgshaw2-neuralmagic Jun 11, 2024
e95ad95
updated missed core files
robertgshaw2-neuralmagic Jun 11, 2024
fe0be9e
updated test core
robertgshaw2-neuralmagic Jun 11, 2024
4fabe98
format
robertgshaw2-neuralmagic Jun 11, 2024
ae39285
Merge branch 'main' into remote-push-refactor
robertgshaw2-neuralmagic Jun 11, 2024
14dedf1
fix bad merge llm_generate
robertgshaw2-neuralmagic Jun 11, 2024
4b078bd
fix bad merge oot_registration
robertgshaw2-neuralmagic Jun 11, 2024
05c5702
duplicate mark
robertgshaw2-neuralmagic Jun 11, 2024
e8166df
Merge branch 'main' into remote-push-refactor
robertgshaw2-neuralmagic Jun 11, 2024
08c8e55
Merge branch 'remote-push-refactor' of https://github.com/neuralmagic…
robertgshaw2-neuralmagic Jun 11, 2024
62f6283
yapf on models core
robertgshaw2-neuralmagic Jun 11, 2024
9b2d02f
Replace '0' with 'ENABLE'
dbarbuzzi Jun 13, 2024
4b691b9
Merge branch 'main' into remote-push-refactor
dbarbuzzi Jun 13, 2024
ef38251
Small fixes from conflict resolution
dbarbuzzi Jun 13, 2024
15 changes: 15 additions & 0 deletions .github/actions/nm-set-env-test-skip/action.yml
@@ -0,0 +1,15 @@
name: set test skip env vars
description: 'sets env variables for test skipping. See tests/utils_skip.py'
inputs:
test_skip_env_vars:
description: 'file with list of env vars controlling which tests to run.'
required: true

runs:
using: composite
steps:
- run: |
cat "${ENV_VAR_FILE}" >> $GITHUB_ENV
env:
ENV_VAR_FILE: ${{ inputs.test_skip_env_vars }}
shell: bash
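
For context, the new composite action only appends the chosen env-var file to `$GITHUB_ENV`, so every later step in the same job sees the `TEST_*` variables defined in that file. A rough Python equivalent of the one-line `cat` step above, purely illustrative (the function name is hypothetical):

```python
import os

def append_env_file_to_github_env(env_var_file: str) -> None:
    """Illustrative equivalent of `cat "$ENV_VAR_FILE" >> $GITHUB_ENV`.

    GITHUB_ENV names a file the Actions runner re-reads between steps;
    anything appended to it (one KEY=VALUE per line) becomes an
    environment variable for all subsequent steps in the job.
    """
    github_env_path = os.environ["GITHUB_ENV"]  # set by the GitHub Actions runner
    with open(env_var_file) as src, open(github_env_path, "a") as dst:
        dst.write(src.read().rstrip("\n") + "\n")
```
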
8 changes: 4 additions & 4 deletions .github/workflows/nm-build-test.yml
@@ -45,8 +45,8 @@ on:
description: "git commit hash or branch name"
type: string
required: true
test_skip_list:
description: 'file containing tests to skip'
test_skip_env_vars:
description: 'file with list of env vars controlling which tests to run'
type: string
required: true
# benchmark related parameters
@@ -91,7 +91,7 @@ jobs:
gitref: ${{ github.ref }}
python: ${{ inputs.python }}
whl: ${{ needs.BUILD.outputs.whl }}
test_skip_list: ${{ inputs.test_skip_list }}
test_skip_env_vars: ${{ inputs.test_skip_env_vars }}
secrets: inherit

# TODO: re-enable
@@ -105,7 +105,7 @@ jobs:
# gitref: ${{ github.ref }}
# python: ${{ inputs.python }}
# whl: ${{ needs.BUILD.outputs.whl }}
# test_skip_list: ${{ inputs.test_skip_list }}
# test_skip_env_vars: ${{ inputs.test_skip_env_vars }}
# secrets: inherit

UPLOAD:
9 changes: 5 additions & 4 deletions .github/workflows/nm-nightly.yml
@@ -1,4 +1,4 @@
name: nm Nightly
name: nm nightly
run-name: ${{ github.actor }} triggered nightly on ${{ github.ref }}
on:
schedule:
@@ -45,7 +45,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-nightly.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
@@ -63,7 +63,7 @@ jobs:
test_label_solo: aws-avx2-32G-a10g-24G
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-nightly.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
@@ -81,7 +81,8 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-nightly.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt


benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
8 changes: 4 additions & 4 deletions .github/workflows/nm-release.yml
@@ -23,7 +23,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 720
test_skip_list: neuralmagic/tests/skip-for-release.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_nightly_configs_list.txt
@@ -41,7 +41,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 720
test_skip_list: neuralmagic/tests/skip-for-release.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_nightly_configs_list.txt
@@ -59,7 +59,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 720
test_skip_list: neuralmagic/tests/skip-for-release.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_nightly_configs_list.txt
@@ -77,7 +77,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 720
test_skip_list: neuralmagic/tests/skip-for-release.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_nightly_configs_list.txt
8 changes: 4 additions & 4 deletions .github/workflows/nm-remote-push.yml
@@ -21,7 +21,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
@@ -37,7 +37,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt
Review comment (Member): For remote-push, maybe we don't need to run all 4 Python versions, and only 3.8/3.11 is good enough?

Reply (Member): yeah, let's get to that after this is merged


benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
@@ -53,7 +53,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
@@ -69,7 +69,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt

benchmark_label: gcp-k8s-l4-solo
benchmark_config_list_file: ./.github/data/nm_benchmark_remote_push_configs_list.txt
15 changes: 10 additions & 5 deletions .github/workflows/nm-test.yml
@@ -23,8 +23,8 @@ on:
description: "whl to test (variable appears late binding so unusable outside 'download artifact')"
type: string
required: true
test_skip_list:
description: 'file containing tests to skip'
test_skip_env_vars:
description: 'file containing tests env vars for test skipping'
type: string
required: true

@@ -51,8 +51,8 @@ on:
description: "whl to test (variable appears late binding so unusable outside 'download artifact')"
type: string
required: true
test_skip_list:
description: 'file containing tests to skip'
test_skip_env_vars:
description: 'file containing tests env vars for test skipping'
type: string
required: true

@@ -131,12 +131,17 @@ jobs:
- name: run buildkite script
run: |
cd tests && sudo bash ../.buildkite/download-images.sh

- name: setenv test skip
id: setenv_test_skip
uses: ./.github/actions/nm-set-env-test-skip
with:
test_skip_env_vars: ${{ inputs.test_skip_env_vars }}

- name: run tests
id: test
uses: ./.github/actions/nm-test-whl/
with:
test_skip_list: ${{ inputs.test_skip_list }}
test_directory: tests
test_results: test-results

2 changes: 1 addition & 1 deletion .github/workflows/nm-weekly.yml
@@ -27,7 +27,7 @@ jobs:
test_label_solo: aws-avx2-32G-a10g-24G
test_label_multi: aws-avx2-192G-4-a10g-96G
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-weekly.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/full.txt

benchmark_label: aws-avx2-32G-a10g-24G
benchmark_config_list_file: ./.github/data/nm_benchmark_weekly_configs_list.txt
19 changes: 19 additions & 0 deletions neuralmagic/tests/test_skip_env_vars/full.txt
@@ -0,0 +1,19 @@
TEST_ACCURACY=DISABLE
TEST_ASYNC_ENGINE=ENABLE
TEST_BASIC_CORRECTNESS=ENABLE
TEST_CORE=ENABLE
TEST_DISTRIBUTED=DISABLE
TEST_ENGINE=ENABLE
TEST_ENTRYPOINTS=ENABLE
TEST_KERNELS=ENABLE
TEST_LORA=ENABLE
TEST_METRICS=ENABLE
TEST_MODELS=ENABLE
TEST_MODELS_CORE=ENABLE
TEST_PREFIX_CACHING=ENABLE
TEST_QUANTIZATION=ENABLE
TEST_SAMPLERS=ENABLE
TEST_SPEC_DECODE=DISABLE
TEST_TENSORIZER_LOADER=ENABLE
TEST_TOKENIZATION=ENABLE
TEST_WORKER=ENABLE
19 changes: 19 additions & 0 deletions neuralmagic/tests/test_skip_env_vars/smoke.txt
@@ -0,0 +1,19 @@
TEST_ACCURACY=DISABLE
TEST_ASYNC_ENGINE=ENABLE
TEST_BASIC_CORRECTNESS=DISABLE
TEST_CORE=ENABLE
TEST_DISTRIBUTED=DISABLE
TEST_ENGINE=ENABLE
TEST_ENTRYPOINTS=DISABLE
TEST_KERNELS=DISABLE
TEST_LORA=DISABLE
TEST_METRICS=ENABLE
TEST_MODELS=DISABLE
TEST_MODELS_CORE=ENABLE
TEST_PREFIX_CACHING=ENABLE
TEST_QUANTIZATION=ENABLE
TEST_SAMPLERS=DISABLE
TEST_SPEC_DECODE=DISABLE
TEST_TENSORIZER_LOADER=DISABLE
TEST_TOKENIZATION=ENABLE
TEST_WORKER=ENABLE
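
The test modules changed below gate themselves at import time with `should_skip_test_group` from `tests/nm_utils/utils_skip.py`, a helper that is not part of the diff shown here. A minimal sketch of what it presumably does, assuming it simply reads the matching env var and treats an unset variable as ENABLE (hypothetical; the real implementation may differ):

```python
import os

def should_skip_test_group(group_name: str) -> bool:
    """Return True when the group's env var (e.g. TEST_ACCURACY) is set to DISABLE."""
    return os.environ.get(group_name, "ENABLE").upper() == "DISABLE"
```

With the smoke.txt settings above, for example, a module that calls `should_skip_test_group(group_name="TEST_LORA")` would be skipped at collection time, while the TEST_CORE modules would still run.
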
2 changes: 2 additions & 0 deletions requirements-dev.txt
@@ -31,6 +31,8 @@ peft
requests==2.31
ray
sentence-transformers # required for embedding
optimum # required for hf gptq baselines
auto-gptq # required for hf gptq baselines

# Benchmarking
aiohttp
5 changes: 5 additions & 0 deletions tests/accuracy/test_lm_eval_correctness.py
@@ -8,6 +8,11 @@
import yaml

from tests.nm_utils.server import ServerContext
from tests.nm_utils.utils_skip import should_skip_test_group

if should_skip_test_group(group_name="TEST_ACCURACY"):
pytest.skip("TEST_ACCURACY=DISABLE, skipping accuracy test group",
allow_module_level=True)

if TYPE_CHECKING:
import lm_eval as lm_eval_t
6 changes: 6 additions & 0 deletions tests/async_engine/test_api_server.py
@@ -7,6 +7,12 @@
import pytest
import requests

from tests.nm_utils.utils_skip import should_skip_test_group

if should_skip_test_group(group_name="TEST_ASYNC_ENGINE"):
pytest.skip("TEST_ASYNC_ENGINE=DISABLE, skipping async engine test group",
allow_module_level=True)


def _query_server(prompt: str, max_tokens: int = 5) -> dict:
response = requests.post("http://localhost:8000/generate",
5 changes: 5 additions & 0 deletions tests/async_engine/test_async_llm_engine.py
@@ -3,8 +3,13 @@

import pytest

from tests.nm_utils.utils_skip import should_skip_test_group
from vllm.engine.async_llm_engine import AsyncLLMEngine

if should_skip_test_group(group_name="TEST_ASYNC_ENGINE"):
pytest.skip("TEST_ASYNC_ENGINE=DISABLE, skipping async engine test group",
allow_module_level=True)


@dataclass
class RequestOutput:
5 changes: 5 additions & 0 deletions tests/async_engine/test_chat_template.py
@@ -4,10 +4,15 @@

import pytest

from tests.nm_utils.utils_skip import should_skip_test_group
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
from vllm.transformers_utils.tokenizer import get_tokenizer

if should_skip_test_group(group_name="TEST_ASYNC_ENGINE"):
pytest.skip("TEST_ASYNC_ENGINE=DISABLE, skipping async engine test group",
allow_module_level=True)

chatml_jinja_path = pathlib.Path(os.path.dirname(os.path.abspath(
__file__))).parent.parent / "examples/template_chatml.jinja"
assert chatml_jinja_path.exists()
5 changes: 5 additions & 0 deletions tests/async_engine/test_openapi_server_ray.py
@@ -4,8 +4,13 @@
# and debugging.
import ray

from tests.nm_utils.utils_skip import should_skip_test_group
from tests.utils import ServerRunner

if should_skip_test_group(group_name="TEST_ASYNC_ENGINE"):
pytest.skip("TEST_ASYNC_ENGINE=DISABLE, skipping async engine test group",
allow_module_level=True)

# any model with a chat template should work here
MODEL_NAME = "facebook/opt-125m"

5 changes: 5 additions & 0 deletions tests/async_engine/test_request_tracker.py
@@ -1,8 +1,13 @@
import pytest

from tests.nm_utils.utils_skip import should_skip_test_group
from vllm.engine.async_llm_engine import RequestTracker
from vllm.outputs import RequestOutput

if should_skip_test_group(group_name="TEST_ASYNC_ENGINE"):
pytest.skip("TEST_ASYNC_ENGINE=DISABLE, skipping async engine test group",
allow_module_level=True)


@pytest.mark.asyncio
async def test_request_tracker():
6 changes: 6 additions & 0 deletions tests/basic_correctness/test_basic_correctness.py
@@ -7,8 +7,14 @@

import pytest

from tests.nm_utils.utils_skip import should_skip_test_group
from vllm import LLM

if should_skip_test_group(group_name="TEST_BASIC_CORRECTNESS"):
pytest.skip(
"TEST_BASIC_CORRECTNESS=DISABLE, skipping basic correctness test group",
allow_module_level=True)

MODELS = [
"facebook/opt-125m",
"meta-llama/Llama-2-7b-hf",