v5.6.0
Highlights
This release is mainly a bunch of bug fixes:
- Pulling in fixes for breaks in upstream dependencies (e.g., Pydantic 2.10, aviary 0.10.1)
- Makes `GradablePaperQAEnvironment`'s evaluations robust to an empty answer or multiple answers
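The sketch below illustrates one way a multiple-choice grader can be robust to an empty answer or to multiple answers. Everything in it is a hypothetical assumption and not PaperQA's API or grading rules: the `Grade` enum, the `grade_mc_answer` function, the comma-separated answer format, and the policy of treating an empty answer as unsure and several distinct letters as incorrect.

```python
from enum import Enum


class Grade(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    UNSURE = "unsure"


def grade_mc_answer(answer: str, correct_letter: str, unsure_letter: str) -> Grade:
    """Grade a multiple-choice answer given as comma-separated option letters.

    Hypothetical sketch, not PaperQA's implementation: assumes answers look
    like "B" or "A, C" and applies an assumed policy for the edge cases.
    """
    # Normalize tokens and drop blanks, so "", "  ", or "B," don't break grading.
    letters = {tok.strip().upper() for tok in answer.split(",") if tok.strip()}
    if not letters:
        # Assumed policy: an empty answer counts as an abstention.
        return Grade.UNSURE
    if len(letters) > 1:
        # Assumed policy: selecting several distinct options earns no credit.
        return Grade.INCORRECT
    (letter,) = letters
    if letter == correct_letter.upper():
        return Grade.CORRECT
    if letter == unsure_letter.upper():
        return Grade.UNSURE
    return Grade.INCORRECT
```

With this policy, `grade_mc_answer("", "B", "E")` returns `Grade.UNSURE` and does not raise. Repeated letters such as `"A, A"` collapse to a single choice because the letters are collected into a set.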
Due to the introduction of `Complete.NO_ANSWER_PHRASE` in #726, we were asked to treat this as a minor version bump, since it will affect system performance.
What's Changed
- Fixed settings `session` into `EnvironmentState`, and suppressing PyMuPDF-derived `DeprecationWarning` by @jamesbraza in #713
- Adding assertion `gather_evidence` doesn't populate `session.answer` by @jamesbraza in #716
- Lock file maintenance by @renovate in #715
- Fixes `gather_with_concurrency` typing by @maykcaldas in #714
- Latest tooling dependencies by @jamesbraza in #719
- Lock file maintenance by @renovate in #718
- Fixed `EVAL_PROMPT_TEMPLATE` to handle empty string or multiple match answers by @jamesbraza in #724
- Address missing `GenerateAnswer` in trajectories, no answers after `Complete` tools, and better history by @mskarlin in #726
- Pulling in latest `aviary` for `concurrency` rename by @jamesbraza in #728
- Pulling in latest `aviary` for dependencies fix, and retrying flaky `test_propagate_options` more by @jamesbraza in #729
- Pulling in latest `ldp` for `Callback.before_rollout` by @jamesbraza in #734
- Documenting why we don't handle evaluation failures in `GradablePaperQAEnvironment.step` by @jamesbraza in #738
- Created `LitQAEvaluation.calculate_accuracy_precision` utility by @jamesbraza in #733
- Refreshed test cassettes, fixed flaky test `test_search`, and fixed test type ignores by @jamesbraza in #739
- Unpins pydantic >2.10.2 requirement, removes `TYPE_CHECKING` by @nadolskit in #725
- Lock file maintenance by @renovate in #737
- Alternative maybe is text by @loesinghaus in #717
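The `LitQAEvaluation.calculate_accuracy_precision` utility from #733 reports two metrics. The standalone sketch below uses what is assumed here to be the usual LitQA convention. Accuracy is correct answers over all questions. Precision is correct answers over the questions actually answered, so "unsure" counts as an abstention. The function name matches the release note, but its signature, its string-grade input, and its zero-division handling are hypothetical and do not describe the library's actual method.

```python
from collections import Counter
from collections.abc import Iterable


def calculate_accuracy_precision(grades: Iterable[str]) -> tuple[float, float]:
    """Hypothetical standalone sketch, not the library method.

    Each grade is assumed to be one of "correct", "incorrect", or "unsure".
    """
    counts = Counter(grades)
    total = sum(counts.values())
    correct = counts["correct"]
    # Assumed convention: "unsure" is an abstention, so it is excluded from precision.
    answered = correct + counts["incorrect"]
    # Return 0.0 rather than dividing by zero when there is nothing to score.
    accuracy = correct / total if total else 0.0
    precision = correct / answered if answered else 0.0
    return accuracy, precision
```

Because "unsure" responses are treated as abstentions, precision and accuracy can diverge. A system that declines hard questions can raise its precision without raising its accuracy.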
New Contributors
- @maykcaldas made their first contribution in #714
- @loesinghaus made their first contribution in #717
Full Changelog: v5.5.0...v5.6.0