Releases: Future-House/paper-qa
v5.3.2
What's Changed
- Printing the `text` in a failed `llm_parse_json` by @jamesbraza in #629
- Change S2 client logic to use arxiv doi if it's defined by @mskarlin in #632
- Increased retry count for `ClientConnectorDNSError` errors by @jamesbraza in #639
- Make string similarity case insensitive by default by @mskarlin in #640
- Pulling in latest `fhaviary`, `mypy`, `ruff` by @jamesbraza in #647
- Add an after model validator ensuring temp=1 for o1 models by @dakoner in #649
- Fixing crash due to `None` author by @jamesbraza in #650
- Fixing flaky test `test_minimal_fields_filtering` by @jamesbraza in #651
- Fixing flaky tests `test_code` and `test_minimal_fields_filtering` by @jamesbraza in #652
- Lock file maintenance by @renovate in #648
Full Changelog: v5.3.1...v5.3.2
v5.3.1
What's Changed
- Exposed `LDPRolloutCallback` and `on_agent_action_callback` for `fake` agent by @jamesbraza in #612
- Fixed `NumpyVectorStore.__eq__`'s `NotImplemented` case by @jamesbraza in #613
- Implemented `__deepcopy__` on all `Environment`s by @jamesbraza in #614
- Made embedding default by @whitead in #615
- Lock file maintenance by @renovate in #617
- Pulled in latest PyMuPDF for `set_messages` by @jamesbraza in #618
- Fixed crash due to DOI being a `list` by @jamesbraza in #619
- Added configuration to adjust how contexts are displayed by @whitead in #620
- Fixing CI by regenerating `test_pdf_reader_match_doc_details` cassette by @jamesbraza in #625
- Retrying flaky test `test_propagate_options` by @jamesbraza in #626
- Regression protection in `embedding_model_factory` by @jamesbraza in #622
- Added `writer.wait_merging_threads` call by @jamesbraza in #628
- Caching opened `tantivy.Index`es in the package by @jamesbraza in #627
Full Changelog: v5.3.0...v5.3.1
v5.3.0
What's Changed
- Add callback support in settings and tools by @nadolskit in #590
- Moved `validate_sources` to include all `sources` not found by @jamesbraza in #595
- Cleaning up citations by @jamesbraza in #598
- Validating sources with case insensitive DOI matching by @jamesbraza in #600
- Added sentence transformers embedding model by @whitead in #604
- Cleaned and clarified deferred embeddings code by @jamesbraza in #597
- Fixed `NumpyVectorStore.clear` from not clearing `text_hashes` by @jamesbraza in #608
- Moved Pydantic `model_config` to top of each class by @jamesbraza in #607
- Testing we can use `~` as `paper_directory` by @jamesbraza in #610
- Increased testing of `NumpyVectorStore` and `defer_embeddings` by @jamesbraza in #606
- Expose env class to run_agent functions by @mskarlin in #611
Full Changelog: v5.2.1...v5.3.0
v5.2.1
What's Changed
- Fixing `Settings` propagation and always configuring LiteLLM retrying by @jamesbraza in #575
- Fixing the ability to resume index builds by @jamesbraza in #577
- Fixed `GLOBAL_RATE_LIMITER_TIMEOUT` env var typo by @jamesbraza in #578
- Lock file maintenance by @renovate in #583
- Unsilencing manifest read failure when no CSV header is present by @jamesbraza in #587
- Added `typeguard` to confirm type hints by @jamesbraza in #585
- Renovate once/month, removing stale LiteLLM disables by @jamesbraza in #589
- Fixed a few minor `SearchIndex` bugs and documented `SearchIndex` by @jamesbraza in #588
- Confirming LitQA sources are in a manifest/index by @jamesbraza in #579
- Limiting parsed page size to 1.28 million chars by @jamesbraza in #592
- Stripping DOI URL prefix for `sources` by @jamesbraza in #593
- Fix sequential client query clobbering by @mskarlin in #594
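
The DOI handling touched by #593 (stripping a DOI URL prefix) and #600 (case insensitive matching) can be illustrated with a minimal sketch. This is not the code from those pull requests: the function names `normalize_doi` and `same_doi` and the list of prefixes are assumptions made for illustration only.

```python
# Hypothetical helpers; names and prefix list are illustrative assumptions,
# not paper-qa's actual implementation.
DOI_URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalize_doi(raw: str) -> str:
    """Strip a leading DOI resolver URL, if any, and lowercase the result."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_URL_PREFIXES:
        if lowered.startswith(prefix):
            # Drop the resolver prefix, keeping only the bare DOI.
            doi = doi[len(prefix):]
            break
    return doi.lower()


def same_doi(a: str, b: str) -> bool:
    """Compare two DOIs, ignoring case and any resolver URL prefix."""
    return normalize_doi(a) == normalize_doi(b)
```

With this sketch, `same_doi("https://doi.org/10.1000/ABC", "10.1000/abc")` evaluates to `True`, since both inputs reduce to the same bare, lowercased DOI.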
Full Changelog: v5.2.0...v5.2.1
v5.2.0: OpenAlex
Highlights
Added a new metadata provider OpenAlex for scholarly work, researchers, institutions, journals, and research topics.
- Responses can include open access information and raw pdf locations.
- Doesn't require authentication, but does prioritize requests with an email in the `mailto` URL parameter, exposed as an environment variable `OPENALEX_MAILTO`
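
As a rough illustration of the `mailto` behavior described above, the sketch below builds an OpenAlex request URL and appends `mailto` only when `OPENALEX_MAILTO` is set. The helper name `build_openalex_url`, the choice of the `works` endpoint, and the `search` parameter are assumptions for this example; it does not reproduce paper-qa's client code and makes no network call.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode

# Assumed endpoint for illustration; the release notes do not name one.
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_openalex_url(
    params: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Build a query URL, adding `mailto` when OPENALEX_MAILTO is defined."""
    env = os.environ if env is None else env
    query = dict(params)
    mailto = env.get("OPENALEX_MAILTO")
    if mailto:
        # No authentication is involved; the email only requests prioritization.
        query["mailto"] = mailto
    return f"{OPENALEX_WORKS_URL}?{urlencode(query)}"
```

Passing `env` explicitly keeps the helper easy to test; in normal use it would fall back to the process environment.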
Implemented an opt-in bypass around the `litellm.Router` for LLM completions (see #563)
What's Changed
- Fixed pickle-ability of `LiteLLMModel` by @jamesbraza in #560
- Refactoring `LiteLLMModel` before removing `Router` by @jamesbraza in #561
- Adds openalex client as a default client by @nadolskit in #555
- Moving to `setup-uv` and `hynek/build-and-inspect-python-package` in CI by @jamesbraza in #564
- Ability to bypass usage of `litellm.Router` by @jamesbraza in #563
- Propagating `hynek/build-and-inspect-python-package`'s output location to `pypa/gh-action-pypi-publish` by @jamesbraza in #565
- Downloading `Package` artifact for `pypa/gh-action-pypi-publish` by @jamesbraza in #566
Full Changelog: v5.1.1...v5.2.0
v5.1.1
What's Changed
- Lock file maintenance by @renovate in #545
- Validating for broken index by @jamesbraza in #544
- Added example how to use ollama hosted models by @grg-ffb in #536
- Making parsing resistant to failed inference of citations by @whitead in #551
- Exposed log verbosity configuration function by @jamesbraza in #552
- Cleaned up log verbosity code by @jamesbraza in #554
Full Changelog: v5.1.0...v5.1.1
v5.1.0: rate limits, refactored settings
Highlights
In-housed rate limits management
- Centers on a moving window algorithm with either a Redis or in-memory state
- Supports dynamically defined rates for different models or providers.
- New bundled configurations for different OpenAI rate limit tiers
- Accomplished using new third party dependencies `coredis` and `limits`
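
For readers unfamiliar with the moving window approach mentioned above, here is a minimal in-memory sketch using only the standard library. It is not the implementation shipped in this release, which relies on `limits` and `coredis`; the class name `MovingWindowLimiter` and its method are hypothetical.

```python
import time
from collections import deque
from typing import Deque, Optional


class MovingWindowLimiter:
    """Allow at most `limit` hits within any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of accepted hits

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record a hit and return True if it fits in the window, else False."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Evict timestamps that have slid out of the trailing window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True
```

Unlike a fixed window counter, this keeps one timestamp per accepted hit, so a burst straddling a window boundary cannot exceed the limit. A Redis-backed variant would hold the same timestamps in shared state so that multiple processes observe one limit.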
Refactored `Settings` to allow for increased flexibility
- Indexing
  - Indexes can use relative paths, enabling sharing across machines
  - Paper search no longer rebuilds the index on every invocation
  - Index parameters are now grouped in `IndexSettings`
    - This release begins a deprecation cycle for the original hyperparameters
  - Index builds now have a `rich.Progress` bar
- Parsing
  - Chunking and embedding can now be deferred to inference time
- Agents
  - Agents now have a `max_timesteps` parameter to upper-bound trajectory length
  - Default agent is now a simple tool calling agent (`ToolSelector`), instead of a deterministic sequence of tool calls ("fake" agent)
Several bug fixes centered on retry-able errors:
- Flaky Semantic Scholar and Crossref SSL errors and connection reset errors
- LLM completions and text embeddings
What's Changed
- Cleaning up #489's implementation by @jamesbraza in #503
- chore(deps): lock file maintenance by @renovate in #504
- chore(deps): lock file maintenance by @renovate in #506
- chore(deps): update all non-major dependencies by @renovate in #505
- Filtering two more `DeprecationWarning`s by @jamesbraza in #509
- Refactor to create `settings.agent.index` grouping by @jamesbraza in #510
- Removed extra `save_index` calls, and added missing `changed` by @jamesbraza in #513
- Not rebuilding `SearchIndex` every `paper_search` by @jamesbraza in #512
- Updated citation to arxiv preprint by @whitead in #514
- Aviary agent `max_timesteps` and fixed `test_gather_evidence_rejects_empty_docs` by @jamesbraza in #515
- Moved `reset_log_levels` to `usefixtures` by @jamesbraza in #517
- Decomposing `Answer.could_not_answer` by @jamesbraza in #516
- Fixing `IndexSettings.use_absolute_paper_directory` leading to relative index file paths by @jamesbraza in #518
- Moving `run_ldp_agent` to center on `RolloutManager` by @jamesbraza in #519
- Retrying on known Semantic Scholar flaky SSL error in `get_s2_doc_details_from_doi` by @jamesbraza in #522
- Converted `PyMuPDF` message to warning logs by @jamesbraza in #523
- `rich.Progress` bar for monitoring index builds by @jamesbraza in #521
- Better descriptions and log messages by @jamesbraza in #524
- Made it possible to skip chunking by @whitead in #526
- Retrying on `aiohttp.ClientConnectionResetError` by @jamesbraza in #529
- Add rate limits for LLMs and Embedding Models by @mskarlin in #520
- Disallowing confusing `None` from `IndexSettings.index_directory`, and `IndexSettings.get_named_index_directory` by @jamesbraza in #531
- Add router_kwargs in separate control flow step by @mskarlin in #532
- Propagating `AgentSettings.agent_type` default for synchrony by @jamesbraza in #533
- Adding retrying of `aembedding` if it fails by @jamesbraza in #535
- Add limits+coredis to mypy by @mskarlin in #537
- Lock file maintenance by @renovate in #534
- Controlling for `pymupdf` version in `test_pdf_reader_match_doc_details` VCR by @jamesbraza in #538
- Lock file maintenance by @renovate in #539
- Fixed yet another `api.semanticscholar.org:443 ssl:default` error via retrying by @jamesbraza in #540
Full Changelog: v5.0.10...v5.1.0
v5.0.10
What's Changed
- Discovered Renovate `:automergeMinor` and preventing `openai` version bumps by @jamesbraza in #493
- Fixing `LitQATaskDataset` deserialization from config by @jamesbraza in #494
- chore(deps): update all non-major dependencies by @renovate in #498
- Broken reader ut by @nadolskit in #497
- Fixing `LitQATaskDataset.compute_trajectory_metrics` crash with bad status extraction by @jamesbraza in #500
- For autogenerated `Router` kwargs, specifying `timeout` of 60-sec by @jamesbraza in #501
Full Changelog: v5.0.9...v5.0.10
v5.0.9
What's Changed
- Fixing `tests/tests/cassettes` issue by using absolute path by @jamesbraza in #482
- Retrying on known Crossref flaky SSL error in `doi_to_bibtex` by @jamesbraza in #479
- Cleaning up and testing `get_directory_index` by @jamesbraza in #483
- Modernizing Renovate config by @jamesbraza in #487
- Allowing `parse_text` to be given a `str` path by @jamesbraza in #491
- Refactor to expose `agents.RichHandler` by @jamesbraza in #489
Full Changelog: v5.0.8...v5.0.9
v5.0.8
What's Changed
- Documenting and cleaning up manifest file logic by @jamesbraza in #448
- Latest dependencies for `pylint` 3.3 by @jamesbraza in #463
- Down-pinning `openai` 1.47 since it breaks CI by @jamesbraza in #466
- Lock file maintenance by @renovate in #462
- chore: add .gitattributes for cassettes file by @devstein in #468
- Documenting Python 3.11+ in README by @jamesbraza in #467
- Fixing flaky `test_tool_failure` by @jamesbraza in #465
- Documenting manifest CSV pathing a bit more by @jamesbraza in #469
- Handling S2 `KeyError` crash during indexing by @jamesbraza in #472
- Fixing `pymupdf.mupdf.FzErrorFormat` crash by recasting as an `ImpossibleParsingError` by @jamesbraza in #474
- Updating `test_tool_failure` cassette by @jamesbraza in #476
- Simplifying the indexing of `action` tokens by @jamesbraza in #477
- Truncating failing `test_evaluation` via `max_rollout_steps` by @jamesbraza in #475
Full Changelog: v5.0.7...v5.0.8