Releases: deepset-ai/haystack
v1.14.0
⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
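To make the idea of "training" an IVF index more concrete, here is a minimal, dependency-free sketch of an inverted file index. It is not the OpenSearch or Haystack implementation: the function names (train_centroids, build_inverted_lists, search), the toy k-means seeding, and the nprobe parameter are assumptions chosen for illustration, and Product Quantization is left out entirely.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids: List[Vector], v: Vector) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2(centroids[i], v))


def train_centroids(vectors: List[Vector], k: int, iterations: int = 10) -> List[Vector]:
    """Toy k-means: seed with the first k vectors, then refine the cell centers."""
    centroids: List[Vector] = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        buckets: Dict[int, List[Vector]] = {i: [] for i in range(k)}
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for i, members in buckets.items():
            if members:  # keep the old centroid if a cell ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def build_inverted_lists(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Map every cell to the ids of the vectors assigned to it."""
    lists: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists


def search(query: Vector, vectors: List[Vector], centroids: List[Vector],
           lists: Dict[int, List[int]], nprobe: int = 1, top_k: int = 1) -> List[int]:
    """Scan only the nprobe cells closest to the query instead of all vectors."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(centroids[i], query))[:nprobe]
    candidates = [idx for cell in probe for idx in lists[cell]]
    return sorted(candidates, key=lambda idx: l2(vectors[idx], query))[:top_k]


vectors = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
centroids = train_centroids(vectors, k=2)
lists = build_inverted_lists(vectors, centroids)
print(search([4.8, 5.2], vectors, centroids, lists, nprobe=1, top_k=1))
```

The training step is what needs a representative sample of embeddings up front. A larger nprobe trades speed for recall, because more cells are scanned per query.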
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
- build: cache nltk models into the docker image by @mayankjobanputra in #4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
- feat: Add page range support to PDF converters. by @danielbichuetti in #3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
- feat: add Shaper by @ZanSara in #3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
- fix: make the crawler more robust on Windows by @anakin87 in #4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
- refactor: replace mutable default arguments by @julian-risch in #4070
- feat: Support multiple RayPipelines by @zoltan-fedor in #4078
- Remove double batching in retrieve_batch by @sjrl in #4014
- style: Update black by @silvanocerza in #4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
- Docs: Fix code block formatting by @agnieszka-m in #4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
- feat: Add OpenAIError to retry mechanism by @sjrl in #4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in #3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
- fix: Add inner query for mysql compatibility by @julian-risch in #4068
- feat: add support for custom headers by @hsm207 in #4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
- refactor: complete the document stores test refactoring by @masci in #4125
- feat: include testing facilities into haystack package by @masci in #4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in #3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
- Docs: Update docstrings by @agnieszka-m in #4119
- docs: Update Annotation Tool README.md by @bogdankostic in #4123
- feat: Add model_kwargs option to PromptNode by @sjrl in #4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
- feat: Implement run_batch for PromptNode by @sjrl in #4072
Other Changes
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
- fix: change model in distillation test by @ZanSara in #3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
- Docs: Update ImageToText docstrings by @agnieszka-m in #3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
- ci: Add Docker images testing by @silvanocerza in #3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
- ci: Fix docker image testing on release by @silvanocerza in #3976
- Fix: Fix quotation marks by @agnieszka-m in #3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
- Missing import for TransformersImageToText by @ZanSara in #3984
- test: CI on py3.8 by @ZanSara in #3926
- Simplifies and fix docker images tests on release by @silvanocerza in #3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
- fix: extend schema for prompt node results by @tstadel in #3891
- proposal: TableCell by @sjrl in #3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
- ci: Automate release on PyPi by @silvanocerza in https://github.co...
v1.14.0rc2
What's Changed
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
v1.14.0rc1
⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
- build: cache nltk models into the docker image by @mayankjobanputra in #4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
- feat: Add page range support to PDF converters. by @danielbichuetti in #3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
- feat: add Shaper by @ZanSara in #3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
- fix: make the crawler more robust on Windows by @anakin87 in #4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
- refactor: replace mutable default arguments by @julian-risch in #4070
- feat: Support multiple RayPipelines by @zoltan-fedor in #4078
- Remove double batching in retrieve_batch by @sjrl in #4014
- style: Update black by @silvanocerza in #4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
- Docs: Fix code block formatting by @agnieszka-m in #4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
- feat: Add OpenAIError to retry mechanism by @sjrl in #4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in #3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
- fix: Add inner query for mysql compatibility by @julian-risch in #4068
- feat: add support for custom headers by @hsm207 in #4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
- refactor: complete the document stores test refactoring by @masci in #4125
- feat: include testing facilities into haystack package by @masci in #4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in #3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
- Docs: Update docstrings by @agnieszka-m in #4119
- docs: Update Annotation Tool README.md by @bogdankostic in #4123
- feat: Add model_kwargs option to PromptNode by @sjrl in #4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
- Prompt node/run batch by @sjrl in #4072
Other Changes
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
- fix: change model in distillation test by @ZanSara in #3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
- Docs: Update ImageToText docstrings by @agnieszka-m in #3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
- ci: Add Docker images testing by @silvanocerza in #3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
- ci: Fix docker image testing on release by @silvanocerza in #3976
- Fix: Fix quotation marks by @agnieszka-m in #3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
- Missing import for TransformersImageToText by @ZanSara in #3984
- test: CI on py3.8 by @ZanSara in #3926
- Simplifies and fix docker images tests on release by @silvanocerza in #3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
- fix: extend schema for prompt node results by @tstadel in #3891
- proposal: TableCell by @sjrl in #3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
- ci: Automate release on PyPi by @silvanocerza in #4015
- ci: Fix PyPi release workflow by @silvanocerza in #4029
- ci: Bump act10ns/slack from v1 to v2 by @silvanocerza in #4031
- ci: latest version of pylint is failing, ignore new errors by @masci in https://github.com/deep...
v1.13.2
What's Changed
Pipelines
- fix: fix torchaudio version by @mayankjobanputra in #4102
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
Documentation
- Add shaper api by @agnieszka-m in #4082
- Update imgtotext api by @agnieszka-m in #4074
Full Changelog: v1.13.1...v1.13.2
v1.13.1
What's Changed
- fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
- Update pyproject.toml (#4035)
- feat: add Shaper (#3880)
- fix: extend schema for prompt node results (#3891)
- fix: removing code block in MarkdownConverter (#3960)
- feat: add frontmatter to meta in MarkdownConverter (#3953)
Full Changelog: v1.13.0...v1.13.1
v1.13.0
⭐ Highlights
Stop words for PromptNode
Stop words are now implemented at the PromptNode level, so they work for all models. Pass a list of strings as the stop_words parameter of PromptNode, and LLM text generation stops as soon as any of the stop words is encountered. Stop words are not included in the response.
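The behavior described above can be pictured as a post-processing step on the generated text. The sketch below is not PromptNode's actual code: the helper name truncate_at_stop_words is hypothetical, and it only illustrates cutting the response before the earliest stop word so that the stop word itself is excluded.

```python
from typing import List


def truncate_at_stop_words(text: str, stop_words: List[str]) -> str:
    """Return text cut just before the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        if not word:
            continue  # an empty stop word would match at position 0
        pos = text.find(word)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


generated = "Berlin is the capital of Germany. Observation: the user asked"
print(truncate_at_stop_words(generated, ["Observation:", "Question:"]))
```

If none of the stop words occurs, the text is returned unchanged.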
A dedicated Github repository for Haystack demo(s)
The source code for Haystack's Explore the World demo has been moved to a dedicated repository: https://github.com/deepset-ai/haystack-demos. Use this repository to check out the code, run it locally, fork, customize, and contribute!
New nodes: ImageToText and CsvTextConverter
This release sees two new nodes, both contributed by community members!
The first one is ImageToText (courtesy of our well-known @anakin87): an image captioning node that can generate descriptions of image files and create Haystack documents from them.
The second one is CsvTextConverter, from @Benvii: a small utility node that can load a CSV of FAQ question-answer pairs and correctly send them to your DocumentStore, making it super handy for FAQ matching pipelines.
Check out the docs to know more about them and try them out!
Faster tokenization for GPT models with tiktoken
Haystack now supports faster tokenization with OpenAI's tiktoken library, which can dramatically improve tokenization speed for GPT models. For unsupported architectures (Py < 3.8, arm64 and MacOS) fallbacks are in place and regular HuggingFace tokenizers are used. Thanks to @danielbichuetti for yet another amazing contribution!
What's Changed
Breaking Changes
- Migrating to use native Pytorch AMP by @sjrl in #2827
- bug: consistent batch_size parameter names in distillation by @julian-risch in #3811
- refactor: Move invocation_context from meta to own pipeline variable by @vblagoje in #3888
Pipeline
- feat: Update cohere embedding models by @vblagoje in #3704
- feat: add index parameter to TfidfRetriever by @anakin87 in #3666
- feat: Use torch.inference_mode() for TableQA by @sjrl in #3731
- feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in #3721
- refactor: improve monkey patch for SklearnQueryClassifier by @anakin87 in #3732
- refactor: remove unused code in TfidfRetriever by @anakin87 in #3733
- refactor: Remove duplicate code in TableReader by @sjrl in #3708
- fix: Make InferenceProcessor thread safe by @bogdankostic in #3709
- chore: adding template for prompt node by @TuanaCelik in #3738
- fix: Fixed local reader model loading by @mayankjobanputra in #3663
- fix: Fix predict_batch in TransformersReader for single nested Document list by @bogdankostic in #3748
- feat: change PipelineConfigError to DocumentStoreError with more details by @julian-risch in #3783
- bug: skip empty documents in reader by @julian-risch in #3773
- fix: linefeeds in custom_query by @tstadel in #3813
- fix: Convert table cells to strings for compatibility with TableReader by @sjrl in #3762
- fix: Ensure eval mode for TableReader model for predictions by @sjrl in #3743
- fix: gracefully handle FileExistsError during Preprocessor resource download by @wochinge in #3816
- fix: make the crawler runnable and testable on Windows by @anakin87 in #3830
- fix: ignore non-serializable params when hashing pipeline objects by @masci in #3842
- feat: preprocessor raises warning when doc length exceeds threshold by @ZanSara in #3837
- fix: remove string validation in YAML by @ZanSara in #3854
- feat: Use truncate option for Cohere.embed by @sjrl in #3865
- feat: ImageToText (caption generator) by @anakin87 in #3859
- fix: Remove double super class init from ParsrConverter init by @silvanocerza in #3896
- feat: store id_hash_keys in Document objects to make documents clonable by @ZanSara in #3697
- feat: adding the ability to use Ray Serve async functionality by @zoltan-fedor in #3769
- feat: support cl100k_base tokenization and increase performance for GPT2 by @danielbichuetti in #3897
- fix: Fix number of concurrent requests in RequestLimiter by @bogdankostic in #3705
- feat: Run commands inside docker container as a non root user by @vblagoje in #3702
- fix: Removed overlooked torch scatter references by @sjrl in #3719
- build: upgrade torch and let transformers pick the version by @julian-risch in #3727
- feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in #3667
- refactor: remove deprecated parameters from Summarizer by @anakin87 in #3740
- refactor: Using with open() to read files by @sjrl in #3787
- feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda by @vblagoje in #3701
- fix: pin protobuf version by @masci in #3789
- fix(docker): Use IMAGE_NAME in api image by @FabianHertwig in #3786
- bug: Fix launch_milvus() by cd'ing to milvus_dir by @t0r0id in #3795
- refactor: Change PromptNode registered templates from per class to per instance by @vblagoje in #3810
- bug: The PromptNode handles all parameters as lists without checking if they are in fact lists by @zoltan-fedor in #3820
- feat: update the docker image for haystack-api service by @bilgeyucel in #3835
- refactor: Simplify PromptTemplate substitution in PromptNode by @vblagoje in #3876
- feat: PromptNode - implement stop words by @vblagoje in #3884
- feat: Add retry with exponential back-off to PromptNode's OpenAI models by @vblagoje in #3886
- chore: Add timeouts to external requests calls by @silvanocerza in #3895
- feat: Add CsvTextConverter by @Benvii in #3587
- refactor: Improve stop_words handling, add unit test cases by @vblagoje in #3918
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict #3872
Models
- fix: adjust max token size for openai ADA-v2 embeddings by @LeoGitGuy in #3793
- feat: make new sklearn models default in QueryClassifier by @julian-risch in #3777
DocumentStores
- Fixing broken BM25 support with Weaviate - fixes #3720 by @zoltan-fedor in #3723
- feat: make score_script first class citizen via knn_engine param by @tstadel in #3284
- bug: skip validating empty embeddings by @julian-risch in #3774
- fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field by @tstadel in #3662
- fix: upgrade launch_es() to the version used in CI by @ZanSara in #3858
- Adding condition to pinecone object. by @AI-Ahmed in #3768
- fix: Allowing InMemStore and FAISSDocStore for indexing using single worker by @mayankjobanputra in #3868
- fix: authenticate with aws4auth if set in OpenSearchDocumentStore by @FabianHertwig in #3741
- Fixing the query_batch method of the deepsetcloud document store - … by @zoltan-fedor in #3724
- feat: add HA support for Weaviate by @zoltan-fedor in #3764
UI / Demo
...
v1.12.2
What's Changed
- Fixing the query_batch method of the deepsetcloud document store by @zoltan-fedor in #3724
- build: upgrade torch and let transformers pick the version by @julian-risch in #3727
- fix: Removed overlooked torch scatter references by @sjrl in #3719
Full Changelog: v1.12.1...v1.12.2
v1.12.2rc1
What's Changed
- Fixing the query_batch method of the deepsetcloud document store by @zoltan-fedor in #3724
- build: upgrade torch and let transformers pick the version by @julian-risch in #3727
- fix: Removed overlooked torch scatter references by @sjrl in #3719
Full Changelog: v1.12.1...v1.12.2rc1
v1.12.1
⭐ Highlights
Large Language Models with PromptNode
Introducing PromptNode, a new feature that brings the power of large language models (LLMs) to various NLP tasks. PromptNode is an easy-to-use, customizable node you can run on its own or in a pipeline. We've designed the API to be user-friendly and suitable for everyday experimentation, but also fully compatible with production-grade Haystack deployments.
By setting a prompt template for a PromptNode you define what task you want it to do. This way, you can have multiple PromptNodes in your pipeline, each performing a different task. But that's not all. You can also inject the output of one PromptNode into the input of another one.
Out of the box, we support both Google T5 Flan and OpenAI GPT-3 models, and you can even mix and match these models in your pipelines.
from haystack.nodes.prompt import PromptNode
# Initialize the node:
prompt_node = PromptNode("google/flan-t5-base") # try also 'text-davinci-003' if you have an OpenAI key
prompt_node("What is the capital of Germany?")
PromptNode can do a lot more than simply query LLMs: it can manage prompt templates, run batches, share models among instances, be chained together in pipelines, and more. Check its documentation for details!
Support for BM25Retriever in InMemoryDocumentStore
InMemoryDocumentStore has always been the go-to document store for small prototypes. The addition of BM25 support makes it officially one of the document stores to support all Retrievers available to Haystack, just like FAISS and Elasticsearch-like stores, but without the external dependencies. Don't use it in your million-documents-throughput deployments to production, though. It's not the fastest document store out there.
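For readers unfamiliar with BM25, the following is a small, self-contained sketch of Okapi BM25 scoring over whitespace-tokenized documents. It is an illustration only, not the code InMemoryDocumentStore runs: the function name bm25_scores, the defaults k1=1.5 and b=0.75, the naive tokenizer, and the non-negative IDF variant used here are all assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document in a non-empty list against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # document frequency: in how many documents the term appears
            df = sum(1 for t in tokenized if term in t)
            # IDF variant that stays non-negative even for very common terms
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "berlin is the capital of germany",
    "paris is the capital of france",
]
print(bm25_scores("capital germany", docs))
```

Rare query terms (here "germany") contribute more than terms that appear in every document (here "capital"), which is why the first document scores higher.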
🏆 Honorable mention to @anakin87 for this outstanding contribution, among many many others! 🏆
Haystack is always open to external contributions, and every little bit is appreciated. Don't know where to start? Have a look at the Contributors Guidelines.
Extended support for Cohere and OpenAI embeddings
We enabled EmbeddingRetriever to use the latest Cohere multilingual embedding models and OpenAI embedding models.
Simply use the model's full name (along with your API key) in EmbeddingRetriever to get them:
from haystack.nodes import EmbeddingRetriever
# api_key is assumed to hold your Cohere or OpenAI API key
# Cohere
retriever = EmbeddingRetriever(embedding_model="multilingual-22-12", batch_size=16, api_key=api_key)
# OpenAI
retriever = EmbeddingRetriever(embedding_model="text-embedding-ada-002", batch_size=32, api_key=api_key, max_seq_len=8191)
Speeding up dense searches in batch mode (Elasticsearch and OpenSearch)
Whenever you need to execute multiple dense searches at once, ElasticsearchDocumentStore and OpenSearchDocumentStore can now do it in parallel. This not only speeds up run_batch and eval_batch for dense pipelines when used with those document stores but also significantly speeds up multi-embedding retrieval pipelines like, for example, MostSimilarDocumentsPipeline.
For this, we measured a speed up of up to 49% on a realistic dataset.
Under the hood, our newly introduced query_by_embedding_batch document store function uses msearch to unleash the full power of your Elasticsearch/OpenSearch cluster.
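As a rough picture of what an msearch request looks like, the sketch below builds the newline-delimited JSON body that the multi search API expects: one header line followed by one query line per search. The helper build_msearch_body, the index name "document", the field name "embedding", and the OpenSearch-style knn query shape are assumptions for illustration, not the queries Haystack actually sends.

```python
import json
from typing import Any, Dict, List


def build_msearch_body(index: str, queries: List[Dict[str, Any]]) -> str:
    """Bundle several search bodies into a single newline-delimited msearch payload."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))  # header line for this search
        lines.append(json.dumps(query))             # the search body itself
    return "\n".join(lines) + "\n"  # the payload must end with a newline


def knn_query(vector: List[float], top_k: int) -> Dict[str, Any]:
    """Hypothetical dense query body; the field name 'embedding' is an assumption."""
    return {"size": top_k, "query": {"knn": {"embedding": {"vector": vector, "k": top_k}}}}


embeddings = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
body = build_msearch_body("document", [knn_query(e, top_k=10) for e in embeddings])
print(body)
```

Sending all searches in one request lets the cluster execute them concurrently instead of paying one round trip per query.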
⚠️ Deprecated Docker images discontinued
1.12 is the last release we're shipping with the old Docker images deepset/haystack-cpu, deepset/haystack-gpu, and their relative tags. We'll remove the corresponding, deprecated Docker files /Dockerfile, /Dockerfile-GPU, and /Dockerfile-GPU-minimal from the codebase after the release.
What's Changed
Pipeline
- fix: ParsrConverter fails on pages without text by @anakin87 in #3605
- fix: Convert eval metrics to python float by @tstadel in #3612
- feat: add support for BM25Retriever in InMemoryDocumentStore by @anakin87 in #3561
- chore: fix return type of aggregate_labels by @tstadel in #3617
- refactor: change MultiModal retriever to be of type DenseRetriever by @mayankjobanputra in #3598
- fix: Move entire forward pass of TableQA within torch.no_grad() by @sjrl in #3636
- feat: add offsets_in_context to evaluation result by @julian-risch in #3640
- bug: Use tqdm auto instead of plain tqdm by @vblagoje in #3672
- fix: monkey patch for SklearnQueryClassifier by @anakin87 in #3678
- feat: Update table reader tests to check the answer scores by @sjrl in #3641
- feat: Adds all_terms_must_match parameter to BM25Retriever at runtime by @ugm2 in #3627
- fix: fix PreProcessor split_by schema by @ZanSara in #3680
- refactor: Generate JSON schema when missing by @masci in #3533
- refactor: replace torch.no_grad with torch.inference_mode (where possible) by @anakin87 in #3601
- Adjust get_type() method for pipelines by @vblagoje in #3657
- refactor: improve Multilabel design by @anakin87 in #3658
- feat: Update cohere embedding models #3704 by @vblagoje #3704
- feat: Enable text-embedding-ada-002 for EmbeddingRetriever #3721 by @vblagoje #3721
- feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in #3667
DocumentStores
- fix: Flatten DocumentClassifier output in SQLDocumentStore by @anakin87 in #3273
- refactor: move milvus tests to their own module by @masci in #3596
- feat: store metadata using JSON in SQLDocumentStore by @masci in #3547
- fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in #3603
- refactor: Move InMemoryDocumentStore tests to their own class by @masci in #3614
- chore: remove redundant tests by @masci in #3620
- refactor: Weaviate query with filters by @ZanSara in #3628
- fix: use 9200 as the default port in launch_opensearch() by @masci in #3630
- fix: revert Weaviate query with filters and improve tests by @ZanSara in #3646
- feat: add query_by_embedding_batch by @tstadel in #3546
- refactor: filters type by @tstadel in #3682
- fix: pinecone metadata format by @jamescalam in #3660
- fix: fixing broken BM25 support with Weaviate - fixes #3720 #3723 by @zoltan-fedor #3723
Documentation
- fix: fixing the url for document merger by @TuanaCelik in #3615
- docs: Reformat code blocks in docstrings by @brandenchan in #3580
Contributors to Tutorials
- fix: Tutorial 2, finetune a model, distillation code by Benvii deepset-ai/haystack-tutorials#69
- chore: Update 01_Basic_QA_Pipeline.ipynb by gsajko deepset-ai/haystack-tutorials#63
Other Changes
- test: add test to check id_hash_keys is not ignored by @julian-risch in #3577
- fix: remove beir from all-gpu by @ZanSara in #3669
- feat: Update DocumentMerger and TextIndexingPipeline imports by @brandenchan in #3599
- fix: pin espnet in the audio extra by @ZanSara in #3693
- refactor: update Squad data by @espoirMur in #3513
- Update CONTRIBUTING.md by @TuanaCelik in #3624
- fix: revamp colab extra dependencies by @masci in #3626
- refactor: remove test extra by @ZanSara in #3679
- fix: remove beir from the base GPU image by @ZanSara in #3692
- feat: Bump transformers version to remove torch scatter dependency by @sjrl in #3703
New Contributors
- @espoirMur made their first contribution in #3513
Full Changelog: v1.11.1...v1.12.1
v1.12.0