```shell
--port 80 \
--model-id microsoft/Phi-3-mini-128k-instruct \
--cuda-memory-fraction 0.8 \
--sharded false \
--max-waiting-tokens 20 \
--max-input-length 4096 \
--max-total-tokens 8192 \
--hostname 0.0.0.0 \
--max-concurrent-requests 512 \
--max-best-of 1 \
--max-batch-prefill-tokens $BATCH_TOKEN \
--max-active-adapters 10 \
--adapter-source local \
--adapter-cycle-time-s 2 \
--json-output \
--disable-custom-kernels \
--dtype float16
```
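To exercise the server, a request against the launcher's standard `/generate` endpoint is enough; a minimal sketch follows (the prompt and generation parameters here are placeholders, not the exact ~1000-token prompt from the report):

```python
# Minimal sketch of a request against the LoRAX server launched above.
# The prompt and max_new_tokens value are placeholders / assumptions.
import requests

LORAX_URL = "http://127.0.0.1:80/generate"  # matches --port 80 above

payload = {
    "inputs": "<~1000-token prompt goes here>",
    # Default decoding is greedy, which keeps outputs comparable across servers.
    "parameters": {"max_new_tokens": 256},
}

resp = requests.post(LORAX_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["generated_text"])
```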
### Expected behavior
When running LoRAX with the model microsoft/Phi-3-mini-128k-instruct, I encountered unexpected behavior with the following configurations:
Configuration A:
- max-input-length = 4096
- max-total-tokens = 8192
- Prompt Length: Approximately 1000 tokens
In this configuration, the generated response differs significantly from what vLLM produces for the same prompt (see the vLLM sketch after Configuration B).
Configuration B:
- max-input-length = 4090
- max-total-tokens = 4096
This configuration works well and produces expected results.
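For reference, the vLLM side of the comparison can be reproduced with a short script along these lines (a sketch: the sampling parameters are assumptions, and `trust_remote_code` may not be required on newer transformers/vLLM versions):

```python
# Rough vLLM baseline for the same prompt (sketch; sampling parameters
# are assumptions, not necessarily the exact settings used originally).
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    dtype="float16",
    trust_remote_code=True,  # early Phi-3 revisions shipped custom code
)

params = SamplingParams(temperature=0.0, max_tokens=256)  # greedy decoding
outputs = llm.generate(["<same ~1000-token prompt>"], params)
print(outputs[0].outputs[0].text)
```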
Additionally, I tested the model microsoft/Phi-3-mini-4k-instruct, and it also functioned correctly.
It seems there may be an issue with handling long contexts when using microsoft/Phi-3-mini-128k-instruct.
Could you please investigate this issue? I found a related discussion here: [Hugging Face Discussion](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/85). Thank you!
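Since the 4k checkpoint works and only the 128k one misbehaves, the difference most likely lies in the long-context (rope-scaling) path: the 128k variant carries a `rope_scaling` entry in its config that the 4k variant does not. A quick way to see the relevant settings, using only the public transformers config API (nothing LoRAX-specific):

```python
# Sketch: print the context/rope settings of the two Phi-3 checkpoints
# to highlight what differs between the working and failing models.
from transformers import AutoConfig

for name in (
    "microsoft/Phi-3-mini-4k-instruct",
    "microsoft/Phi-3-mini-128k-instruct",
):
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    print(name)
    print("  max_position_embeddings:", cfg.max_position_embeddings)
    print("  rope_scaling:", getattr(cfg, "rope_scaling", None))
```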
### System Info

ghcr.io/predibase/lorax:f1ef0ee