This repository has been archived by the owner on Oct 2, 2024. It is now read-only.

Releases: mlcommons/modelgauge

v0.6.3

13 Sep 00:19
44d1a0d
Pre-release

What's Changed

Full Changelog: v0.6.2...v0.6.3

v0.6.2

05 Sep 20:50
2e6e135
Pre-release

What's Changed

Full Changelog: v0.6.1...v0.6.2

v0.6.1

05 Sep 15:05
8aa7209
Pre-release

What's Changed

  • Fix bug where bad raw annotations are cached forever
  • Remove safetest base class
  • Minor improvements for pipeline debugging
  • Adding 'system' role to openai_client _ROLE_MAP by @shachihk-intel
  • Better Together API errors
  • Keep track of items that can't be processed
  • Update dependencies and add notebook linter
  • Remove deprecated Together models, and update tests to match
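
The first item above concerns a common caching pitfall: storing a raw response before checking that it can be parsed, so a bad response is replayed on every later run. The sketch below shows one way to avoid that, by caching only after parsing succeeds. It is an illustration under assumptions, not ModelGauge's implementation; `annotate_with_cache`, `fetch`, `parse`, and the dict-based cache are hypothetical names.

```python
from typing import Callable, Dict


def annotate_with_cache(
    prompt: str,
    fetch: Callable[[str], str],
    parse: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    """Return a parsed annotation, caching the raw response only if it parses."""
    if prompt in cache:
        # Cached raw responses were validated before being stored.
        return parse(cache[prompt])
    raw = fetch(prompt)
    # parse() is assumed to raise on a malformed response; if it does,
    # the line below never runs and nothing bad is written to the cache.
    parsed = parse(raw)
    cache[prompt] = raw
    return parsed
```

With this ordering, a failed call leaves the cache untouched, so the next attempt fetches a fresh response instead of replaying the bad one.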

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Aug 21:12
7e3a13c
Pre-release

What's Changed

  • Together and HuggingFace SUTs can now return log probs in their responses when requested.
  • New CLI option --plugin-dir loads local plugins at runtime.
  • Increase reliability of downloading test data.
  • Prepare modelgauge infra files for safety evaluator testing (new "System" chat role, minor llama_guard_annotator refactor).
  • Documentation updates, including initial API reference.
  • Introduce Pipeline and related classes to serve as the base for a composable set of objects that handle common bulk processing tasks like running prompts, getting annotations, and any other slow I/O-bound workloads.
  • SafeTests use files from dev deployment of modellab.
  • New run-csv-items command quickly runs batches of prompts and/or responses in a CSV file through some SUTs and/or annotators.
  • Add new v1.0 SafeTest class and placeholder test safe-dfm-1.0. Version 0.5 tests (e.g. safe-cae) are not affected.
  • Move Together plugin files + SafeTest into core modelgauge library.
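
As a rough illustration of the Pipeline idea described above, the following sketch chains stages and runs each stage's work on a thread pool, which suits slow I/O-bound calls. It is not ModelGauge's actual Pipeline API; `run_stage`, `run_pipeline`, `fake_sut`, and `fake_annotator` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence


def run_stage(
    items: Iterable[Any], worker: Callable[[Any], Any], max_workers: int = 4
) -> List[Any]:
    """Apply one worker to every item concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the inputs were given.
        return list(pool.map(worker, items))


def run_pipeline(
    items: Iterable[Any],
    stages: Sequence[Callable[[Any], Any]],
    max_workers: int = 4,
) -> List[Any]:
    """Feed the output of each stage into the next one."""
    current = list(items)
    for stage in stages:
        current = run_stage(current, stage, max_workers)
    return current


# Hypothetical stand-ins for a SUT call and an annotator call.
def fake_sut(prompt: str) -> str:
    return prompt.upper()


def fake_annotator(response: str) -> dict:
    return {"response": response, "is_safe": "ATTACK" not in response}


results = run_pipeline(["hello", "attack plan"], [fake_sut, fake_annotator])
```

Threads are used here because the work is assumed to be dominated by waiting on network calls; a CPU-bound stage would need a different executor.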

Full Changelog: v0.5.1...v0.6.0

v0.5.1

26 Apr 21:10
79283fd
Pre-release

What's Changed

  • Updated docs
  • SafeTest compatible with Python 3.11+
  • Add new Llama Guard 2 to LlamaGuardAnnotator
    • Can configure LlamaGuardAnnotator with optional llama_guard_version parameter. Defaults to Llama Guard 2.
    • Minor changes to prompt/category formatting for Llama Guard 1. This may affect results.
  • SafeTest can also be configured to use Llama Guard 1 or 2 as its annotator. Defaults to version 2.

Full Changelog: v0.5.0...v0.5.1

v0.5.0

15 Apr 22:35
2e81a6c
Pre-release

What's Changed

  • Renamed to ModelGauge and started pushing to PyPI!
  • A whole bunch of cleanups and preparation for the more public release.
  • Caching now supports dicts.
  • Unit tests to ensure you can install from PyPI and run in a notebook.
  • Expand range of supported Python versions to 3.10 and up.
  • Remove benign hazard from SafeTest.
  • Start setting up ReadTheDocs.
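
One practical difficulty with caching dict-valued requests is that two dicts with the same contents can serialize differently depending on key order. The sketch below shows one common way to derive an order-independent cache key. It assumes JSON-serializable values and is not taken from ModelGauge's caching code; `cache_key` is a hypothetical helper.

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Build a deterministic key for a JSON-serializable dict."""
    # sort_keys makes the serialized form independent of insertion order.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


same = cache_key({"model": "m", "temperature": 0}) == cache_key(
    {"temperature": 0, "model": "m"}
)
```

Here `same` is `True` because both dicts serialize to the same canonical string before hashing.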

Full Changelog: v0.3.3...v0.5.0

v0.3.3

09 Apr 23:00
4088c92
Pre-release

What's Changed

  • Change SafeTest to data_april04 release.
    • More prompts
    • Removed safe-ben

Full Changelog: v0.3.2...v0.3.3

v0.3.2

09 Apr 21:50
Pre-release

What's Changed

  • max_test_items returns a relatively stable set of prompts
  • Loading bar for plugins
  • Have list command report prettier values for secrets
  • Time out requests stuck on TogetherAI
  • Updated docs
  • Move simple_test_runner out of plugins and into core library
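
To make the first item above concrete, here is one way a "relatively stable" subset can be chosen: rank prompts by a hash of their text and keep the lowest-ranked ones. The selection then does not depend on input order, and a larger limit extends the smaller selection instead of replacing it. This is a sketch of the general technique only; how `max_test_items` is actually implemented is not stated in these notes, and `stable_sample` is a hypothetical name.

```python
import hashlib
from typing import List, Sequence


def stable_sample(prompts: Sequence[str], max_test_items: int) -> List[str]:
    """Pick up to max_test_items prompts, independent of input order."""

    def rank(prompt: str) -> str:
        # A content hash gives each prompt a fixed, pseudo-random position.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    return sorted(prompts, key=rank)[:max_test_items]
```

Because the ranking depends only on each prompt's text, adding or removing unrelated prompts changes the selection only slightly, which helps previously cached results stay useful.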

Full Changelog: v0.3.1...v0.3.2

v0.3.1

03 Apr 17:13
daf4e5c
Pre-release

What's Changed

  • Fix bad version specification for together dependency, which was causing 0.3.0 to not actually install.
  • Add Deepseek model that is now available on Together.
  • Stabilize the order of TestItems in SafeTest to better utilize caching.

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Apr 22:03
089b5d4
Pre-release

What's Changed

  • Reorganized the run_data folder and made several improvements to caching. This breaks backward compatibility. Old files should simply be ignored, but if you run into issues, it is probably best to delete your run_data folder.
  • Updated SafeTest to 02apr2024.
  • We now have all SUTs in the requested set, minus Deepseek.
  • Simplified the command line to be newhelm once installed or poetry run newhelm when using the local repo.
  • Annotations are now recorded per completion instead of per TestItem.
  • HuggingFace sets pad token to default, which should remove warning messages.
  • Added some enforcement of SUTCapabilities to help them be accurate.
  • Remove all "Base" prefixes except BaseTest.

Full Changelog: v0.2.6...v0.3.0