Actions: NVIDIA/TensorRT-LLM

auto-assign

140 workflow runs

Support for LLaMa3.3
auto-assign #43: Issue #2567 labeled by nv-guomingz · December 18, 2024 02:04 · 2s

Which version of InternVL does TensorRT-llm 1.5 support ?
auto-assign #42: Issue #2578 labeled by nv-guomingz · December 18, 2024 01:57 · 2s

OOM when building engine for meta-llama/Llama-3.1-405B-FP8 on 8 x A100 80G
auto-assign #41: Issue #2586 labeled by nv-guomingz · December 18, 2024 01:33 · 3s

llava-onevision convert bug
auto-assign #40: Issue #2585 labeled by nv-guomingz · December 18, 2024 01:32 · 49s

llava-onevision convert bug
auto-assign #39: Issue #2585 labeled by liyi-xia · December 17, 2024 10:20 · 2s

trtllm-build ignores --model_cls_file and --model_cls_name
auto-assign #38: Issue #2430 labeled by nv-guomingz · December 17, 2024 10:04 · 42s

[feature request] lm_head quantization
auto-assign #37: Issue #2550 labeled by nv-guomingz · December 17, 2024 09:55 · 41s

Performance issue with long context
auto-assign #36: Issue #2548 labeled by nv-guomingz · December 17, 2024 09:46 · 50s

LayerInfo doesn't support fp8 and int4_awq dtype?
auto-assign #35: Issue #2547 labeled by nv-guomingz · December 17, 2024 09:45 · 41s

int8 slower than bf16 on A100
auto-assign #34: Issue #2553 labeled by nv-guomingz · December 17, 2024 02:13 · 40s

int8 slower than bf16 on A100
auto-assign #33: Issue #2553 labeled by nv-guomingz · December 17, 2024 02:11 · 39s

trtllm-bench missing support of moe_ep_size / moe_tp_size.
auto-assign #32: Issue #2577 labeled by juewAtAmazon · December 16, 2024 01:59 · 2s

Testing Actions
auto-assign #31: Issue #2572 labeled by kevinch-nv · December 12, 2024 21:34 · 2s

Testing Actions
auto-assign #30: Issue #2572 labeled by kevinch-nv · December 12, 2024 21:34 · 38s

TRT-LLM fails on GH200 node
auto-assign #29: Issue #2571 labeled by ttim · December 12, 2024 21:26 · 2s

What does "weights_scaling_factor_2" mean in safetensor results of awq_w4a8
auto-assign #28: Issue #2561 labeled by nv-guomingz · December 12, 2024 00:14 · 40s

What does "weights_scaling_factor_2" mean in safetensor results of awq_w4a8
auto-assign #27: Issue #2561 labeled by nv-guomingz · December 12, 2024 00:08 · 42s

What does "weights_scaling_factor_2" mean in safetensor results of awq_w4a8
auto-assign #26: Issue #2561 labeled by nv-guomingz · December 12, 2024 00:07 · 50s

Build fails on w8a8 with kv_cache_dtype FP8
auto-assign #25: Issue #2559 labeled by darraghdog · December 10, 2024 17:05 · 3s

Can TensorRT-LLM Handle High Levels of Concurrent Requests?
auto-assign #24: Issue #2514 labeled by hello-11 · December 10, 2024 08:52 · 3s

Can TensorRT-LLM Handle High Levels of Concurrent Requests?
auto-assign #23: Issue #2514 labeled by hello-11 · December 10, 2024 08:52 · 43s

Medusa performance degrades with batch size larger than 1
auto-assign #22: Issue #2482 labeled by hello-11 · December 10, 2024 06:57 · 3s

Can't build whisper engines with past two releases
auto-assign #21: Issue #2508 labeled by hello-11 · December 10, 2024 06:49 · 3s

qserve is slower then awq int4 for llama2-7b on H100
auto-assign #20: Issue #2509 labeled by hello-11 · December 10, 2024 06:48 · 2s

int8 slower than bf16 on A100
auto-assign #19: Issue #2553 labeled by nv-guomingz · December 10, 2024 06:32 · 45s