Fail to run server with prefix-caching option #599

Open
2 of 4 tasks
prd-tuong-nguyen opened this issue Sep 11, 2024 · 1 comment

Comments

@prd-tuong-nguyen

System Info

  • ghcr.io/predibase/lorax:a8ca5cb
  • Ubuntu 20.04
  • GPU A10G

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

docker run --gpus 1 -v ./data:/data -p 8005:80 ghcr.io/predibase/lorax:a8ca5cb \
  --prefix-caching true \
  --port 80 \
  --model-id Open-Orca/Mistral-7B-OpenOrca \
  --cuda-memory-fraction 0.8 \
  --sharded false \
  --max-waiting-tokens 20 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --hostname 0.0.0.0 \
  --max-concurrent-requests 512 \
  --max-best-of 1 \
  --max-batch-prefill-tokens 4096 \
  --max-active-adapters 10 \
  --adapter-source local \
  --adapter-cycle-time-s 2 \
  --json-output \
  --disable-custom-kernels \
  --dtype float16

Expected behavior

The server starts successfully and prefix caching works.
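
A rough way to check this once the server is up, as a sketch only: send two requests that share a long prompt prefix to the `/generate` endpoint and compare their total request times. It assumes the container from the reproduction is reachable on host port 8005; `BASE_URL`, `SHARED_PREFIX`, `RUN_CHECK`, and the helper function names are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the container above started and is reachable on host port 8005.
BASE_URL="${BASE_URL:-http://localhost:8005}"

# Hypothetical shared prefix. Prompts must not contain double quotes or
# backslashes, because build_payload does no JSON escaping.
SHARED_PREFIX="You are a helpful assistant. Answer briefly."

# Build a /generate request body for one prompt ($1) and token limit ($2).
build_payload() {
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "$2"
}

# Send one request and print the total request time in seconds.
timed_generate() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    -X POST "$BASE_URL/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" 16)"
}

# Run only when requested, so the functions can be sourced without a live server.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  timed_generate "$SHARED_PREFIX What is LoRA?"
  timed_generate "$SHARED_PREFIX What is a KV cache?"
fi
```

Run it with `RUN_CHECK=1`. A lower time on the second request is only a hint that the shared prefix was reused, not proof, since batching and warm-up also affect latency.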

@prd-tuong-nguyen
Author

@tgaddair Hi bro, any update on this?
