Reproduction
docker run --gpus 1 -v ./data:/data -p 8005:80 ghcr.io/predibase/lorax:a8ca5cb \
  --prefix-caching true \
  --port 80 \
  --model-id Open-Orca/Mistral-7B-OpenOrca \
  --cuda-memory-fraction 0.8 \
  --sharded false \
  --max-waiting-tokens 20 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --hostname 0.0.0.0 \
  --max-concurrent-requests 512 \
  --max-best-of 1 \
  --max-batch-prefill-tokens 4096 \
  --max-active-adapters 10 \
  --adapter-source local \
  --adapter-cycle-time-s 2 \
  --json-output \
  --disable-custom-kernels \
  --dtype float16
Expected behavior
The server starts successfully and prefix caching works well.
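One way to check the expected behavior once the container is up is to send two prompts that share a long prefix. The sketch below is only an illustration: it assumes the server exposes the `/generate` REST endpoint taking `inputs` and `parameters` and returning `generated_text`, and that it is reachable on host port 8005 as published by the command above. The helper names `build_generate_request`, `generate`, and `send_two_prompts_with_shared_prefix` are hypothetical.

```python
import json
import urllib.request

# Assumption: container port 80 is published on host port 8005 (-p 8005:80).
BASE_URL = "http://localhost:8005"


def build_generate_request(prompt, max_new_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]


def send_two_prompts_with_shared_prefix():
    """Issue two requests whose prompts share a long common prefix."""
    shared_prefix = "You are a helpful assistant. " * 20
    for question in ("What is LoRA?", "What is a KV cache?"):
        print(generate(shared_prefix + question))
```

Calling `send_two_prompts_with_shared_prefix()` against the running container should return two completions if the server started correctly; whether the second request actually reuses the cached prefix would have to be confirmed from the server logs or latency.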
@tgaddair Hi bro, any update on this?