We are following semantic versioning with a strict backward-compatibility policy.

You can find our backwards-compatibility policy here.

Changes for the upcoming release can be found in the `changelog.d` directory in our repository.
- `--quantize` now takes `int8` and `int4` instead of `8bit` and `4bit`, to be consistent with bitsandbytes concepts.

  The `openllm` CLI now caches all available model commands, allowing for faster startup time.

  Fixes `openllm start model-id --debug` to filter out debug message logs from `bentoml.Server`.

  `--model-id` from `openllm start` now supports choices for easier selection.

  Updated the `ModelConfig` implementation with `__getitem__` and auto-generated values (see the sketch after this entry).

  Cleaned up the CLI and improved loading time; `openllm start` should be 'blazingly fast'. #28
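
  A rough sketch of the `__getitem__`-with-auto-generated-values idea, assuming a simple config class; the names and default values here are illustrative, not OpenLLM's actual implementation:

  ```python
  from typing import Any


  class ModelConfigSketch:
      """Config whose __getitem__ falls back to auto-generated defaults."""

      def __init__(self, **overrides: Any) -> None:
          self._overrides = overrides

      def __getitem__(self, key: str) -> Any:
          # Explicitly provided values win; otherwise fall back to a
          # generated default (hypothetical values, for illustration only).
          if key in self._overrides:
              return self._overrides[key]
          return {"requirements": [], "timeout": 3600}[key]


  config = ModelConfigSketch(timeout=300)
  print(config["timeout"])       # 300 -> explicit override
  print(config["requirements"])  # []  -> auto-generated default
  ```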
- Added support for quantization during serving time: `openllm start` now supports `--quantize int8` and `--quantize int4`. `GPTQ` quantization support is on the roadmap and currently being worked on.

  `openllm start` now also supports `--bettertransformer` to use `BetterTransformer` for serving.

  Refactored `openllm.LLMConfig` to be usable with `__getitem__`: `openllm.DollyV2Config()['requirements']`. The access order is `__openllm_*__ > self.<key> > __openllm_generation_class__ > __openllm_extras__` (see the first sketch after this entry).

  Added a `towncrier` workflow to easily generate changelog entries.

  Added `use_pipeline` and `bettertransformer` flags to `ModelSettings`.

  `LLMConfig` now supports the `__dataclass_transform__` protocol to help with type-checking (see the second sketch after this entry).

  `openllm download-models` is now `openllm download`. #27
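
  A rough sketch of the access order described above, assuming a class that resolves keys against several internal namespaces in priority order; this illustrates the precedence rule only, not OpenLLM's actual code:

  ```python
  from typing import Any


  class LLMConfigSketch:
      def __init__(self) -> None:
          # Hypothetical stand-ins for the namespaces in the access order.
          self.__openllm_requirements__ = ["torch"]  # matched by __openllm_*__
          self.temperature = 0.9                     # matched by self.<key>
          self.__openllm_generation_class__ = {"max_new_tokens": 256}
          self.__openllm_extras__ = {"custom_field": "value"}

      def __getitem__(self, key: str) -> Any:
          dunder = f"__openllm_{key}__"
          if hasattr(self, dunder):                     # 1. __openllm_*__
              return getattr(self, dunder)
          if hasattr(self, key):                        # 2. self.<key>
              return getattr(self, key)
          if key in self.__openllm_generation_class__:  # 3. generation class
              return self.__openllm_generation_class__[key]
          return self.__openllm_extras__[key]           # 4. extras


  cfg = LLMConfigSketch()
  print(cfg["requirements"])    # ['torch'] via __openllm_*__
  print(cfg["temperature"])     # 0.9 via self.<key>
  print(cfg["max_new_tokens"])  # 256 via the generation class
  print(cfg["custom_field"])    # 'value' via __openllm_extras__
  ```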
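
  And a minimal sketch of what the `__dataclass_transform__` protocol (PEP 681) enables: a decorator marked with `dataclass_transform` tells type checkers to treat the decorated class like a dataclass, so keyword arguments derived from its annotations type-check. This uses `typing_extensions` (the protocol lives in `typing` on Python 3.11+) and is a general illustration, not OpenLLM's implementation:

  ```python
  from typing import Any

  from typing_extensions import dataclass_transform


  @dataclass_transform()
  def config_class(cls: type) -> type:
      """Synthesize an __init__ from class annotations, dataclass-style."""

      def __init__(self: Any, **kwargs: Any) -> None:
          for field in cls.__annotations__:
              setattr(self, field, kwargs.get(field, getattr(cls, field, None)))

      cls.__init__ = __init__  # type checkers assume this via the decorator
      return cls


  @config_class
  class GenerationConfig:
      max_new_tokens: int = 256
      temperature: float = 0.9


  # A type checker now knows these keyword arguments and their types.
  cfg = GenerationConfig(temperature=0.7)
  print(cfg.temperature, cfg.max_new_tokens)  # 0.7 256
  ```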