This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
forked from vllm-project/vllm
Remote push refactor #297
Merged
159 commits
e69d23b [Kernel] Add marlin_24 unit tests (#4901) (alexm-neuralmagic)
81ec16b [Kernel] Add flash-attn back (#4907) (WoosukKwon)
5500975 [Model] LLaVA model refactor (#4910) (DarkLight1337)
b913d04 Remove marlin warning (#4918) (alexm-neuralmagic)
683a30b [Misc]: allow user to specify port in distributed setting (#4914) (ZwwWayne)
c8794c3 [Build/CI] Enabling AMD Entrypoints Test (#4834) (Alexei-V-Ivanov-AMD)
5b6a7b5 [Bugfix] Fix dummy weight for fp8 (#4916) (mzusman)
a5e66c7 [Core] Sharded State Loader download from HF (#4889) (aurickq)
8a78ed8 [Doc]Add documentation to benchmarking script when running TGI (#4920) (KuntaiDu)
6b46dcf [Core] Fix scheduler considering "no LoRA" as "LoRA" (#4897) (Yard1)
907d48a [Model] add rope_scaling support for qwen2 (#4930) (hzhwcmhf)
11d6f7e [Model] Add Phi-2 LoRA support (#4886) (Isotr0py)
5d98989 [Docs] Add acknowledgment for sponsors (#4925) (simon-mo)
58a235b [CI/Build] Codespell ignore `build/` directory (#4945) (mgoin)
253d8fb [Bugfix] Fix flag name for `max_seq_len_to_capture` (#4935) (kerthcet)
f744125 [Bugfix][Kernel] Add head size check for attention backend selection … (Isotr0py)
c1672a9 [Frontend] Dynamic RoPE scaling (#4638) (sasha0552)
4b6c961 [CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#… (mgoin)
4b74974 [misc] remove comments that were supposed to be removed (#4977) (rkooo567)
39c15ee [Kernel] Fixup for CUTLASS kernels in CUDA graphs (#4954) (tlrmchlsmth)
2835fc6 [Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893) (comaniac)
3db99a6 [Model] LoRA gptbigcode implementation (#3949) (raywanb)
39a0a40 [Core] Eliminate parallel worker per-step task scheduling overhead (#… (njhill)
847ca88 [Minor] Fix small typo in llama.py: QKVParallelLinear -> Quantization… (pcmoritz)
c60384c [Misc] Take user preference in attention selector (#4960) (comaniac)
dae5aaf Marlin 24 prefill performance improvement (about 25% better on averag… (alexm-neuralmagic)
05a4f64 [Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is n… (LetianLee)
bf4c411 [Core][1/N] Support send/recv in PyNCCL Groups (#4988) (andoorve)
c623663 [Kernel] Initial Activation Quantization Support (#4525) (dsikka)
a9ca32d [Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985) (kezouke)
0eb33b1 [Doc] add ccache guide in doc (#5012) (youkaichao)
acf362c [Kernel] Initial Activation Quantization Support (#4525) (robertgshaw2-neuralmagic)
1226d5d [Core][Bugfix]: fix prefix caching for blockv2 (#4764) (leiwen83)
29a2098 [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3… (linxihui)
3fe7e52 [Misc] add logging level env var (#5045) (youkaichao)
8768b3f [Dynamic Spec Decoding] Minor fix for disabling speculative decoding … (LiuXiaoxuanPKU)
e7e376f [Misc] Make Serving Benchmark More User-friendly (#5044) (ywang96)
67ce9ea [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) (zhuohan123)
2c59c91 [Core] Allow AQLM on Pascal (#5058) (sasha0552)
9fb7b82 [Model] Add support for falcon-11B (#5069) (Isotr0py)
954c332 [Core] Sliding window for block manager v2 (#4545) (mmoskal)
9929fb2 [BugFix] Fix Embedding Models with TP>1 (#5075) (robertgshaw2-neuralmagic)
b22d985 [Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X (#4951) (divakar-amd)
54c17a9 [Docs] Add Dropbox as sponsors (#5089) (simon-mo)
8c9aab4 [Core] Consolidate prompt arguments to LLM engines (#4328) (DarkLight1337)
705789d [Bugfix] Remove the last EOS token unless explicitly specified (#5077) (jsato8094)
95c2a3d [Misc] add gpu_memory_utilization arg (#5079) (pandyamarut)
9175890 [Core][Optimization] remove vllm-nccl (#5091) (youkaichao)
420c4ff [Bugfix] Fix arguments passed to `Sequence` in stop checker test (#5092) (DarkLight1337)
5bde5ba [Core][Distributed] improve p2p access check (#4992) (youkaichao)
b86aa89 [Core] Cross-attention KV caching and memory-management (towards even… (afeldman-nm)
f63e8dd [Doc]Replace deprecated flag in readme (#4526) (ronensc)
62a4fcb [Bugfix][CI/Build] Fix test and improve code for `merge_async_iterato… (DarkLight1337)
f900bcc [Bugfix][CI/Build] Fix codespell failing to skip files in `git diff` … (DarkLight1337)
6824b2f [Core] Avoid the need to pass `None` values to `Sequence.inputs` (#5099) (DarkLight1337)
623275f [Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031) (Etelis)
15dcd3e [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) (youkaichao)
5763c73 [Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#… (alexm-neuralmagic)
3a8332c [CI/Build] Docker cleanup functionality for amd servers (#5112) (okakarpa)
11a5a26 [BUGFIX] [FRONTEND] Correct chat logprobs (#5029) (br3no)
2827c68 [Bugfix] Automatically Detect SparseML models (#5119) (robertgshaw2-neuralmagic)
4ae80dd [CI/Build] increase wheel size limit to 200 MB (#5130) (youkaichao)
886ead6 [Misc] remove duplicate definition of `seq_lens_tensor` in model_runn… (ita9naiwa)
758b903 [Doc] Use intersphinx and update entrypoints docs (#5125) (DarkLight1337)
a190463 add doc about serving option on dstack (#3074) (deep-diver)
51cf757 Bump version to v0.4.3 (#5046) (simon-mo)
c72d890 [Build] Disable sm_90a in cu11 (#5141) (simon-mo)
cf0711b [Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120) (robertgshaw2-neuralmagic)
dcaf819 [Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::orde… (alexm-neuralmagic)
7da3c3f Fix cutlass sm_90a vesrion in CMakeList (simon-mo)
2c66f17 [Model] Support MAP-NEO model (#5081) (xingweiqu)
5388c64 Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using th… (simon-mo)
5e9f300 [Misc]: optimize eager mode host time (#4196) (FuncSherl)
f329e2e [Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039) (comaniac)
951e3d2 [Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) (njhill)
d349dbd [Build] Guard against older CUDA versions when building CUTLASS 3.x k… (tlrmchlsmth)
031fd4e format (robertgshaw2-neuralmagic)
9ed5f76 skip blockspase attention (robertgshaw2-neuralmagic)
ec71544 fix falcon (robertgshaw2-neuralmagic)
7381340 skip sliding window chunked prefill (robertgshaw2-neuralmagic)
c23ca05 skip prefix prefill (robertgshaw2-neuralmagic)
85512eb skip tensorizer (robertgshaw2-neuralmagic)
0cea2c2 [Misc][Breaking] Change FP8 checkpoint format from act_scale -> input… (mgoin)
31147df format (robertgshaw2-neuralmagic)
b2afd77 added lm eval test group (robertgshaw2-neuralmagic)
85d54e8 added env variable entrypoint (robertgshaw2-neuralmagic)
49fdf7d format (robertgshaw2-neuralmagic)
e6ac051 format (robertgshaw2-neuralmagic)
5f83af8 format (robertgshaw2-neuralmagic)
61e8d8a skip kernels env variable (robertgshaw2-neuralmagic)
fa58955 skipping lora env variable (robertgshaw2-neuralmagic)
2256610 fix issue with internal method (robertgshaw2-neuralmagic)
01973f5 formatting (robertgshaw2-neuralmagic)
ac25d3a spec decode env variable (robertgshaw2-neuralmagic)
4fbff35 stash model changes (robertgshaw2-neuralmagic)
977edff fixed basic server correctness (robertgshaw2-neuralmagic)
0266f28 format (robertgshaw2-neuralmagic)
51dff17 tensorizer, cleanup comment (robertgshaw2-neuralmagic)
775f6d4 cleanup README (robertgshaw2-neuralmagic)
88e3a55 newline nits (robertgshaw2-neuralmagic)
a1a659d disabled more kernel tests that use triton (robertgshaw2-neuralmagic)
c50784c updated cutlass skipping. We need cuda 12.4 in automation (robertgshaw2-neuralmagic)
99fa9f8 trigger kernel tests in automation (robertgshaw2-neuralmagic)
cdc9f49 clean up magic_wand test so that we only load the model once (robertgshaw2-neuralmagic)
b08194a format (robertgshaw2-neuralmagic)
ccda2e7 format (robertgshaw2-neuralmagic)
51a7685 core, correctness (robertgshaw2-neuralmagic)
c42b18f distributed (robertgshaw2-neuralmagic)
765aff0 format (robertgshaw2-neuralmagic)
e18bd8a format (robertgshaw2-neuralmagic)
495488b added tokenization group (robertgshaw2-neuralmagic)
d64bda5 worker (robertgshaw2-neuralmagic)
c6c6994 added models core (robertgshaw2-neuralmagic)
c9a2d02 added remote push (robertgshaw2-neuralmagic)
9b452a7 added action (robertgshaw2-neuralmagic)
bbe2906 updated remote push workflow (robertgshaw2-neuralmagic)
b2bb2bc make sure action was saved (robertgshaw2-neuralmagic)
e629449 added action to build to just the action works (robertgshaw2-neuralmagic)
668e172 updated to tab these in (robertgshaw2-neuralmagic)
95d6fd7 undo indent (robertgshaw2-neuralmagic)
a64fdaa cleanup action (robertgshaw2-neuralmagic)
9e6a4e9 removed example (robertgshaw2-neuralmagic)
352493e added env var configs for all groups (robertgshaw2-neuralmagic)
8897dd1 updated other workflows (robertgshaw2-neuralmagic)
e1a1a59 switched for whitelist to blacklist (robertgshaw2-neuralmagic)
2ec6643 cleanup spurious setup.py change (robertgshaw2-neuralmagic)
0bb099c readded the missing images (robertgshaw2-neuralmagic)
198f364 multilora inference (robertgshaw2-neuralmagic)
ec0e89a offline inference with prefix (robertgshaw2-neuralmagic)
e6f1cbd backend request func (robertgshaw2-neuralmagic)
ca8d74a benchmark serving (robertgshaw2-neuralmagic)
5335ad9 prod monitoring readme (robertgshaw2-neuralmagic)
611cfed format (robertgshaw2-neuralmagic)
73132a5 fix benchmark issue - internal method changed (robertgshaw2-neuralmagic)
7f5c715 removed skip for remote push edits (robertgshaw2-neuralmagic)
437912e update internal method in benchmark throughput too (robertgshaw2-neuralmagic)
828d9d1 Merge branch 'upstream-sync-2024-06-08' into remote-push-refactor (robertgshaw2-neuralmagic)
c754d5a skip sharded state loader - hanging in automation (robertgshaw2-neuralmagic)
2bf55cd skip entrypoints tests in remote-push - too long (robertgshaw2-neuralmagic)
2657891 cleanup TEST_ALL_MODELS comment (robertgshaw2-neuralmagic)
389bdcd skip samplers during remote push (robertgshaw2-neuralmagic)
5dd3f5d cleanup newline nit (robertgshaw2-neuralmagic)
a475844 switch to enable / disable (robertgshaw2-neuralmagic)
397cfe2 readded (robertgshaw2-neuralmagic)
8c6d1f3 convert workflows to use new files (robertgshaw2-neuralmagic)
e093e61 updated each comment (robertgshaw2-neuralmagic)
e95ad95 updated missed core files (robertgshaw2-neuralmagic)
fe0be9e updated test core (robertgshaw2-neuralmagic)
4fabe98 format (robertgshaw2-neuralmagic)
ae39285 Merge branch 'main' into remote-push-refactor (robertgshaw2-neuralmagic)
14dedf1 fix bad merge llm_generate (robertgshaw2-neuralmagic)
4b078bd fix bad merge oot_registration (robertgshaw2-neuralmagic)
05c5702 duplicate mark (robertgshaw2-neuralmagic)
e8166df Merge branch 'main' into remote-push-refactor (robertgshaw2-neuralmagic)
08c8e55 Merge branch 'remote-push-refactor' of https://github.com/neuralmagic… (robertgshaw2-neuralmagic)
62f6283 yapf on models core (robertgshaw2-neuralmagic)
9b2d02f Replace '0' with 'ENABLE' (dbarbuzzi)
4b691b9 Merge branch 'main' into remote-push-refactor (dbarbuzzi)
ef38251 Small fixes from conflict resolution (dbarbuzzi)
New file (`@@ -0,0 +1,15 @@`):

```yaml
name: set test skip env vars
description: 'sets env variables for test skipping. See tests/utils_skip.py'
inputs:
  test_skip_env_vars:
    description: 'file with list of env vars controlling which tests to run.'
    required: true

runs:
  using: composite
  steps:
  - run: |
      cat "${ENV_VAR_FILE}" >> $GITHUB_ENV
    env:
      ENV_VAR_FILE: ${{ inputs.test_skip_env_vars }}
    shell: bash
```
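The composite action's single step appends the chosen env-var file to `$GITHUB_ENV`; the Actions runner then exposes each `NAME=value` line in that file as an environment variable to the later steps of the same job. The sketch below imitates both halves locally, so a config file can be sanity-checked without a CI run. `append_env_file` and `load_github_env` are hypothetical names, and the parser is a simplification that handles only single-line `NAME=value` entries, not the runner's multiline `NAME<<DELIMITER` form.

```python
from pathlib import Path
from typing import Dict


def append_env_file(src: Path, github_env: Path) -> None:
    """Mimic `cat "${ENV_VAR_FILE}" >> $GITHUB_ENV`: append src verbatim."""
    with src.open("r") as f_in, github_env.open("a") as f_out:
        f_out.write(f_in.read())


def load_github_env(github_env: Path) -> Dict[str, str]:
    """Collect NAME=value lines roughly as a later step would see them.

    Simplified model (assumption): single-line entries only, the value is
    everything after the first '=', and a repeated name keeps its last value.
    """
    env: Dict[str, str] = {}
    for raw in github_env.read_text().splitlines():
        if not raw or "=" not in raw:
            continue  # skip blank or malformed lines in this sketch
        name, value = raw.split("=", 1)
        env[name] = value
    return env
```

Because `cat` copies the file byte for byte, each env-var file should end with a newline; otherwise a later `>> $GITHUB_ENV` write in the same job would be glued onto its last line.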
New file (`@@ -0,0 +1,19 @@`):

```
TEST_ACCURACY=DISABLE
TEST_ASYNC_ENGINE=ENABLE
TEST_BASIC_CORRECTNESS=ENABLE
TEST_CORE=ENABLE
TEST_DISTRIBUTED=DISABLE
TEST_ENGINE=ENABLE
TEST_ENTRYPOINTS=ENABLE
TEST_KERNELS=ENABLE
TEST_LORA=ENABLE
TEST_METRICS=ENABLE
TEST_MODELS=ENABLE
TEST_MODELS_CORE=ENABLE
TEST_PREFIX_CACHING=ENABLE
TEST_QUANTIZATION=ENABLE
TEST_SAMPLERS=ENABLE
TEST_SPEC_DECODE=DISABLE
TEST_TENSORIZER_LOADER=ENABLE
TEST_TOKENIZATION=ENABLE
TEST_WORKER=ENABLE
```
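The action's description points to `tests/utils_skip.py`, whose contents are not shown in this view. As an illustration only, a test module could turn one of these `TEST_<GROUP>=ENABLE|DISABLE` lines into a run/skip decision roughly as follows. `group_enabled` and `SkipConfigError` are hypothetical names, and treating an unset variable as enabled is an assumption of this sketch, not documented behavior. Rejecting any other value (such as `0`; one commit in this PR replaced '0' with 'ENABLE') makes a typo in a config file fail loudly instead of silently skipping a group.

```python
import os
from typing import Mapping, Optional


class SkipConfigError(ValueError):
    """Hypothetical error for a TEST_* value other than ENABLE or DISABLE."""


def group_enabled(group: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if test group `group` (e.g. "KERNELS") should run.

    Hypothetical helper, not the contents of tests/utils_skip.py.
    Assumption: an unset TEST_<GROUP> variable counts as ENABLE.
    """
    source = os.environ if env is None else env
    name = f"TEST_{group.upper()}"
    value = source.get(name, "ENABLE")
    if value == "ENABLE":
        return True
    if value == "DISABLE":
        return False
    # Fail loudly on typos or legacy values such as "0".
    raise SkipConfigError(f"{name}={value!r}: expected ENABLE or DISABLE")
```

In a pytest module, such a helper could back a module-level marker, for example `pytestmark = pytest.mark.skipif(not group_enabled("KERNELS"), reason="TEST_KERNELS=DISABLE")`.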
New file (`@@ -0,0 +1,19 @@`):

```
TEST_ACCURACY=DISABLE
TEST_ASYNC_ENGINE=ENABLE
TEST_BASIC_CORRECTNESS=DISABLE
TEST_CORE=ENABLE
TEST_DISTRIBUTED=DISABLE
TEST_ENGINE=ENABLE
TEST_ENTRYPOINTS=DISABLE
TEST_KERNELS=DISABLE
TEST_LORA=DISABLE
TEST_METRICS=ENABLE
TEST_MODELS=DISABLE
TEST_MODELS_CORE=ENABLE
TEST_PREFIX_CACHING=ENABLE
TEST_QUANTIZATION=ENABLE
TEST_SAMPLERS=DISABLE
TEST_SPEC_DECODE=DISABLE
TEST_TENSORIZER_LOADER=DISABLE
TEST_TOKENIZATION=ENABLE
TEST_WORKER=ENABLE
```
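Side by side, the two env-var files differ only in which test groups they turn off. A small comparison makes that difference explicit, which could help when reviewing later edits to either file. This is an illustrative sketch; `parse_env_lines` and `newly_disabled` are hypothetical helpers, not code from this PR.

```python
from typing import Dict, List


def parse_env_lines(text: str) -> Dict[str, str]:
    """Parse NAME=value lines; blank lines and lines without '=' are ignored."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            result[name] = value
    return result


def newly_disabled(base: Dict[str, str], other: Dict[str, str]) -> List[str]:
    """Names set to ENABLE in `base` but DISABLE in `other`, sorted."""
    return sorted(
        name
        for name, value in other.items()
        if value == "DISABLE" and base.get(name) == "ENABLE"
    )
```

With the first file as `base` and the second as `other`, this returns seven names: `TEST_BASIC_CORRECTNESS`, `TEST_ENTRYPOINTS`, `TEST_KERNELS`, `TEST_LORA`, `TEST_MODELS`, `TEST_SAMPLERS` and `TEST_TENSORIZER_LOADER`. `TEST_ACCURACY`, `TEST_DISTRIBUTED` and `TEST_SPEC_DECODE` are disabled in both.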
For remote-push, maybe we don't need to run all 4 Python versions; would only 3.8 and 3.11 be good enough?
yeah, let's get to that after this is merged