
v0.4.41

Released by @github-actions on 18 Dec at 18:18

GPTQ Support

The vLLM backend now supports GPTQ-quantized models, using the upstream GPTQ support in vLLM 0.2.6:

openllm start TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --backend vllm --quantise gptq

Installation

pip install openllm==0.4.41

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.41

Usage

To list all available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.41 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in CHANGELOG.md.

What's Changed

  • docs: add notes about dtypes usage. by @aarnphm in #786
  • chore(deps): bump taiki-e/install-action from 2.22.0 to 2.22.5 by @dependabot in #790
  • chore(deps): bump github/codeql-action from 2.22.9 to 3.22.11 by @dependabot in #794
  • chore(deps): bump sigstore/cosign-installer from 3.2.0 to 3.3.0 by @dependabot in #793
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.0.0 by @dependabot in #791
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0 by @dependabot in #792
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #796
  • fix(cli): avoid runtime __origin__ check for older Python by @aarnphm in #798
  • feat(vllm): support GPTQ with 0.2.6 by @aarnphm in #797
  • fix(ci): lock to v3 iteration of actions/artifacts workflow by @aarnphm in #799

Full Changelog: v0.4.40...v0.4.41