Skip to content

v0.5.2

Compare
Choose a tag to compare
@yifanmai yifanmai released this 17 Jun 21:54
· 288 commits to main since this release
bceee3f

Scenarios

Models

  • Added google-cloud-aiplatform~=1.48 dependency requirement for Vertex AI client (#2628)
  • Fixed bug with Vertex AI client error handling (#2614)
  • Fixed bug with for Arctic tokenizer (#2615)
  • Added Qwen1.5 110B Chat (#2621)
  • Added TogetherCompletionClient (#2629)
  • Fixed bugs with Yi Chat and Llama 3 Chat on Together (#2636)
  • Added Optimum Intel (#2609, #2674)
  • Added GPT-4o model (#2649, #2656)
  • Added SEA-LION 7B and SEA-LION 7B Instruct (#2647)
  • Added more Gemini 1.5 Flash and Pro versions (#2653, #2664, #2718, #2718)
  • Added Gemini 1.0 Pro 002 (#2664)
  • Added Command R and Command R+ models (#2548)
  • Fixed GPT4V Evaluator Out of Option Range Issue (#2677)
  • Added OLMo 1.5 (#2671)
  • Added RekaClient (#2675)
  • Added PaliGemma (#2683)
  • Added Mistral 7B Instruct v0.1, v0.2 and v0.3 (#2665)
  • Switched most Together chat models to use the chat client (#2703, #2701, #2705)
  • Added MedLM model (#2696, #2709)
  • Added Typhoon v1.5 models (#2659)
  • Changed HuggingFaceClient to truncate end of text token (#2643)
  • Added Qwen2 Instruct (72B) (#2722)
  • Added Yi Large (#2723, #1731)
  • Added Sailor models (#2658)
  • Added BioMistral and Meditron (#2728)

Frontend

Framework

  • Removed adapter_spec from schema_*.yaml files (#2611)
  • Added support for annotators / LLM-as-judge (#2622, #2700)
  • Updated documentation (#2626, #2529, #2521)

Evaluation Results

  • MMLU v1.2.0
    • Added results for DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (0409 preview), Mistral Small (2402), Mistral Large (2402), Arctic Instruct
  • MMLU v1.3.0
    • Added results for Gemini 1.5 Flash (0514 preview), GPT-4o (2024-05-13), Palmyra X V3 (72B)
  • MMLU v1.4.0
    • Added results for Yi Large (Preview), OLMo 1.7 (7B), Command R, Command R Plus, Gemini 1.5 Flash (001), Gemini 1.5 Pro (001), Mistral Instruct v0.3 (7B), GPT-4 Turbo (2024-04-09), Qwen1.5 Chat (110B), Qwen2 Instruct (72B)
  • Image2Struct v1.0.0
    • Initial release with Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Gemini 1.0 Pro Vision, Gemini 1.5 Pro (0409 preview),IDEFICS 2 (8B), IDEFICS-instruct (9B), IDEFICS-instruct (80B), LLaVA 1.5 (13B), LLaVA 1.6 (13B), GPT-4o (2024-05-13), GPT-4V (1106 preview), Qwen-VL Chat
  • AIR-Bench v1.0.0
    • Initial release with Claude 3 Haiku (20240307), Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Cohere Command R, Cohere Command R Plus, DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (001, default safety), Gemini 1.5 Flash (001, default safety), Llama 3 Instruct (8B), Llama 3 Instruct (70B), Yi Chat (34B), Mistral Instruct v0.3 (7B), Mixtral Instruct (8x7B), Mixtral Instruct (8x22B), GPT-3.5 Turbo (0613), GPT-3.5 Turbo (1106), GPT-3.5 Turbo (0125), GPT-4 Turbo (2024-04-09), GPT-4o (2024-05-13), Qwen1.5 Chat (72B)

Contributors

Thank you to the following contributors for your work on this HELM release!