19 Dec 09:40

ZanSara

c62b5fb

v1.12.0rc1 Pre-release

⭐ Highlights

Large Language Models with `PromptNode`

Introducing PromptNode, a new feature that brings the power of large language models (LLMs) to various NLP tasks. PromptNode is an easy-to-use, customizable node you can run on its own or in a pipeline. We've designed the API to be user-friendly and suitable for everyday experimentation, but also fully compatible with production-grade Haystack deployments.

By setting a prompt template for a PromptNode, you define what task you want it to perform. This way, you can have multiple PromptNodes in your pipeline, each performing a different task. But that's not all: you can also inject the output of one PromptNode into the input of another.
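To make templates and chaining concrete, here is a minimal, model-free sketch in plain Python. It uses Python's `string.Template` with `$`-placeholders purely for illustration; `fake_llm` and `run_node` are hypothetical stand-ins, not PromptNode's API, and no real model is called.

```python
from string import Template
from typing import Callable

# Each template defines a task; $-placeholders are filled at run time.
QUESTION_GENERATION = Template("Given the context, generate a question. Context: $documents")
QUESTION_ANSWERING = Template(
    "Answer the question using the context. Context: $documents; Question: $questions; Answer:"
)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: returns a canned output per task."""
    if prompt.startswith("Given the context"):
        return "What is the capital of Germany?"
    return "Berlin"


def run_node(template: Template, llm: Callable[[str], str], **params: str) -> str:
    """Fill the template with the given parameters and send the prompt to the model."""
    prompt = template.substitute(**params)
    return llm(prompt)


context = "Berlin is the capital of Germany."
# First "node": generate a question from the context.
question = run_node(QUESTION_GENERATION, fake_llm, documents=context)
# Second "node": its input includes the first node's output.
answer = run_node(QUESTION_ANSWERING, fake_llm, documents=context, questions=question)
print(question, "->", answer)
```

Swapping `fake_llm` for a real model call is the only change needed to turn each step into an actual LLM query; the template alone decides which task each step performs.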

Out of the box, we support both Google Flan-T5 and OpenAI GPT-3 models, and you can even mix and match these models in your pipelines.

from haystack.nodes.prompt import PromptNode

# Initialize the node:
prompt_node = PromptNode("google/flan-t5-base")  # try also 'text-davinci-003' if you have an OpenAI key (pass it as api_key)

# Run a prompt and print the model's output
answer = prompt_node("What is the capital of Germany?")
print(answer)

This node can do a lot more than simply query LLMs: it can manage prompt templates, run batches, share models among instances, be chained with other nodes in pipelines, and more. Check its documentation for details!

Support for `BM25Retriever` in `InMemoryDocumentStore`

InMemoryDocumentStore has always been the go-to document store for small prototypes. With BM25 support, it officially becomes one of the document stores that support all Retrievers available in Haystack, just like FAISS and the Elasticsearch-like stores, but without the external dependencies. Don't use it for production deployments with millions of documents and high throughput, though: it's not the fastest document store out there.
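BM25 ranks documents by how often the query terms occur in them, discounted by how common each term is across the corpus and normalized by document length. Below is a minimal pure-Python sketch of textbook Okapi BM25 (k1=1.5, b=0.75, with the non-negative Lucene-style IDF). The tokenizer regex is an assumption, and Haystack's in-memory implementation may use a different BM25 variant and tokenization; this is not its code.

```python
import math
import re
from collections import Counter
from typing import List

TOKEN_RE = re.compile(r"\w\w+")  # assumption: lowercase word tokens of 2+ characters


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital and largest city of France.",
    "Haystack pipelines combine retrievers and readers.",
]
scores = bm25_scores("capital of Germany", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Scores are only meaningful relative to each other; a document that shares no term with the query scores 0.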

🏆 Honorable mention to @anakin87 for this outstanding contribution, among many many others! 🏆

Haystack is always open to external contributions, and every little bit is appreciated. Don't know where to start? Have a look at the Contributors Guidelines.

Extended support for Cohere and OpenAI embeddings

We enabled EmbeddingRetriever to use the latest Cohere multilingual embedding models and OpenAI embedding models.

Simply use the model's full name (along with your API key) in EmbeddingRetriever to get them:

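from haystack.nodes import EmbeddingRetriever

api_key = "YOUR_API_KEY"  # placeholder: your Cohere or OpenAI API key, matching the model

# Cohere
retriever = EmbeddingRetriever(embedding_model="multilingual-22-12", batch_size=16, api_key=api_key)
# OpenAI
retriever = EmbeddingRetriever(embedding_model="text-embedding-ada-002", batch_size=32, api_key=api_key, max_seq_len=8191)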
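The batch_size argument above controls how many texts go into each embedding request. Conceptually, that is plain chunking of the input list, sketched below. `batched`, `embed_in_batches`, and `fake_embed` are hypothetical helpers for illustration only, not Haystack, Cohere, or OpenAI APIs.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int,
) -> List[List[float]]:
    """Call the embedding function once per batch and concatenate results in input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed(batch))
    return embeddings


calls: List[int] = []


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a remote embedding API: records the batch size it received."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
print(calls)  # [2, 2, 1]
```

Larger batches mean fewer round trips to the API, at the cost of bigger individual requests.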

Speeding up dense searches in batch mode (Elasticsearch and OpenSearch)

Whenever you need to execute multiple dense searches at once, ElasticsearchDocumentStore and OpenSearchDocumentStore can now run them in parallel. This speeds up run_batch and eval_batch for dense pipelines that use these document stores, and it significantly speeds up multi-embedding retrieval pipelines such as MostSimilarDocumentsPipeline.

On a realistic dataset, we measured a speedup of up to 49%.

Under the hood, the newly introduced query_by_embedding_batch document store function uses msearch to unleash the full power of your Elasticsearch/OpenSearch cluster.
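For readers curious about msearch: the `_msearch` endpoint takes newline-delimited JSON with one header line and one search body per query, so many dense searches travel in a single request. Below is a minimal sketch that builds such a body for Elasticsearch 7.x `dense_vector` fields. The index name "document", the field name "embedding", and the exact query shape are assumptions; Haystack's real queries (and OpenSearch's k-NN syntax) may differ.

```python
import json
from typing import List


def build_msearch_body(
    index: str,
    query_embeddings: List[List[float]],
    top_k: int = 10,
    embedding_field: str = "embedding",
) -> str:
    """Build an NDJSON _msearch body: one header line and one query line per embedding."""
    lines = []
    for emb in query_embeddings:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({
            "size": top_k,
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        # cosineSimilarity returns [-1, 1]; +1.0 keeps scores non-negative.
                        "source": f"cosineSimilarity(params.query_vector, '{embedding_field}') + 1.0",
                        "params": {"query_vector": emb},
                    },
                }
            },
        }))
    # The _msearch API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body("document", [[0.1, 0.2], [0.3, 0.4]], top_k=3)
print(body.count("\n"))  # 4: two header lines + two query lines
```

The cluster then processes all the searches in one round trip and returns one response per query, in the same order.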

⚠️ Deprecated Docker images discontinued

1.12 is the last release we're shipping with the old Docker images deepset/haystack-cpu, deepset/haystack-gpu, and their related tags. After the release, we'll remove the corresponding deprecated Docker files /Dockerfile, /Dockerfile-GPU, and /Dockerfile-GPU-minimal from the codebase.

What's Changed

Pipeline

fix: ParsrConverter fails on pages without text by @anakin87 in #3605
fix: Convert eval metrics to python float by @tstadel in #3612
feat: add support for BM25Retriever in InMemoryDocumentStore by @anakin87 in #3561
chore: fix return type of aggregate_labels by @tstadel in #3617
refactor: change MultiModal retriever to be of type DenseRetriever by @mayankjobanputra in #3598
fix: Move entire forward pass of TableQA within torch.no_grad() by @sjrl in #3636
feat: add offsets_in_context to evaluation result by @julian-risch in #3640
bug: Use tqdm auto instead of plain tqdm by @vblagoje in #3672
fix: monkey patch for SklearnQueryClassifier by @anakin87 in #3678
feat: Update table reader tests to check the answer scores by @sjrl in #3641
feat: Adds all_terms_must_match parameter to BM25Retriever at runtime by @ugm2 in #3627
fix: fix PreProcessor split_by schema by @ZanSara in #3680
refactor: Generate JSON schema when missing by @masci in #3533
refactor: replace torch.no_grad with torch.inference_mode (where possible) by @anakin87 in #3601
Adjust get_type() method for pipelines by @vblagoje in #3657
refactor: improve Multilabel design by @anakin87 in #3658
feat: Update cohere embedding models by @vblagoje in #3704
feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in #3721

DocumentStores

fix: Flatten DocumentClassifier output in SQLDocumentStore by @anakin87 in #3273
refactor: move milvus tests to their own module by @masci in #3596
feat: store metadata using JSON in SQLDocumentStore by @masci in #3547
fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in #3603
refactor: Move InMemoryDocumentStore tests to their own class by @masci in #3614
chore: remove redundant tests by @masci in #3620
refactor: Weaviate query with filters by @ZanSara in #3628
fix: use 9200 as the default port in launch_opensearch() by @masci in #3630
fix: revert Weaviate query with filters and improve tests by @ZanSara in #3646
feat: add query_by_embedding_batch by @tstadel in #3546
refactor: filters type by @tstadel in #3682
fix: pinecone metadata format by @jamescalam in #3660
fix: fixing broken BM25 support with Weaviate - fixes #3720 by @zoltan-fedor in #3723

Documentation

fix: fixing the url for document merger by @TuanaCelik in #3615
docs: Reformat code blocks in docstrings by @brandenchan in #3580

Contributors to Tutorials

fix: Tutorial 2, finetune a model, distillation code by @Benvii in deepset-ai/haystack-tutorials#69
chore: Update 01_Basic_QA_Pipeline.ipynb by @gsajko in deepset-ai/haystack-tutorials#63

Other Changes

test: add test to check id_hash_keys is not ignored by @julian-risch in #3577
fix: remove beir from all-gpu by @ZanSara in #3669
feat: Update DocumentMerger and TextIndexingPipeline imports by @brandenchan in #3599
fix: pin espnet in the audio extra by @ZanSara in #3693
refactor: update Squad data by @espoirMur in #3513
Update CONTRIBUTING.md by @TuanaCelik in #3624
fix: revamp colab extra dependencies by @masci in #3626
refactor: remove test extra by @ZanSara in #3679
fix: remove beir from the base GPU image by @ZanSara in #3692
feat: Bump transformers version to remove torch scatter dependency by @sjrl in #3703

New Contributors

@espoirMur made their first contribution in #3513

Full Changelog: v1.11.1...v1.12.0rc1

Contributors

masci, vblagoje, and 12 other contributors

06 Dec 18:11

bogdankostic

05ea711

v1.11.1

What's Changed

fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in #3603

Full Changelog: v1.11.0...v1.11.1

Contributors

masci

06 Dec 16:15

bogdankostic

bf27a0c

v1.11.1rc1 Pre-release

What's Changed

fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in #3603

Full Changelog: v1.11.0...v1.11.1rc1

Contributors

masci

21 Nov 11:22

masci

0b106f6

v1.11.0

⭐ Highlights

Expanding Haystack’s LLM support further with the new `CohereEmbeddingEncoder` (#3453)

Now you can easily create document and query embeddings using Cohere’s large language models: if you have a Cohere account, all you have to do is set the name of one of the supported models (small, medium, or large) and add your API key to the EmbeddingRetriever component in your pipelines (see docs).

Extracting headlines from Markdown and PDF files (#3445 #3488)

Using the MarkdownConverter or the ParsrConverter, you can set the parameter extract_headlines to True to extract the headlines from your files, together with their start position and their level. Headlines are stored as a list of dictionaries in the Document's meta field "headlines" and are structured as follows:

{
    "headline": <THE HEADLINE STRING>,
    "start_idx": <IDX OF HEADLINE START IN document.content >,
    "level": <LEVEL OF THE HEADLINE>
}
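To illustrate the shape of this metadata, here is a toy, self-contained sketch that strips ATX-style `#` markers from Markdown and records each headline in the same dictionary format. It is not Haystack's implementation: the real converters perform a full Markdown conversion, so their document.content and start_idx values will differ. `extract_headlines` here is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple, Union

# Optional closing '#'s must be preceded by whitespace, so "C#" keeps its '#'.
HEADLINE_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def extract_headlines(markdown: str) -> Tuple[str, List[Dict[str, Union[str, int]]]]:
    """Strip '#' markers and record each headline's text, start index, and level."""
    content_lines: List[str] = []
    headlines: List[Dict[str, Union[str, int]]] = []
    offset = 0  # start index of the current line within the output content
    for line in markdown.splitlines():
        match = HEADLINE_RE.match(line)
        if match:
            text = match.group(2)
            headlines.append({"headline": text, "start_idx": offset, "level": len(match.group(1))})
            line = text
        content_lines.append(line)
        offset += len(line) + 1  # +1 for the newline that joins the lines
    return "\n".join(content_lines), headlines


content, headlines = extract_headlines("# Intro\nSome text.\n## Details\nMore text.")
print(headlines)
```

Because start_idx points into the converted text, `content[h["start_idx"]:h["start_idx"] + len(h["headline"])]` recovers each headline.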

Introducing the proposals design process (#3333)

We've introduced the proposal design process for substantial changes. A proposal is a single Markdown file that explains why a change is needed and how it would be implemented. You can find a detailed explanation of the process and a proposal template in the proposals directory.

⚠️ Breaking change: removing `Milvus1DocumentStore`

From this version onwards, Haystack no longer supports version 1 of Milvus. We still support Milvus version 2. We removed Milvus1DocumentStore and renamed Milvus2DocumentStore to MilvusDocumentStore.

What's Changed

Breaking Changes

bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow by @mayankjobanputra in #3368
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x by @masci in #3552

Pipeline

fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
bug: change type of split_by to Literal including None by @julian-risch in #3389
Fix: update pyworld pin by @anakin87 in #3435
feat: send event if number of queries exceeds threshold by @vblagoje in #3419
Feat: allow decreasing size of datasets loaded from BEIR by @ugm2 in #3392
feat: add __contains__ to Span by @ZanSara in #3446
Bug: Fix prompt length computation by @Timoeller in #3448
Add indexing pipeline type by @vblagoje in #3461
fix: warning if doc store similarity function is incompatible with Sentence Transformers model by @anakin87 in #3455
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3453
feat: Extraction of headlines in markdown files by @bogdankostic in #3445
bug: replace decorator with counter attribute for pipeline event by @julian-risch in #3462
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations by @ZanSara in #3379
refactor: TableReader by @sjrl in #3456
fix: do not reference package directory in PDFToTextOCRConverter.convert() by @ZanSara in #3478
feat: Create the TextIndexingPipeline by @brandenchan in #3473
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline by @ZanSara in #3443
fix: strip whitespaces safely from FARMReader's answers by @ZanSara in #3526

DocumentStores

Document Store test refactoring by @masci in #3449
fix: support long texts for labels in ElasticsearchDocumentStore by @anakin87 in #3346
feat: add SQLDocumentStore tests by @masci in #3517
refactor: Refactor Weaviate tests by @masci in #3541
refactor: Pinecone tests by @masci in #3555
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" by @anakin87 in #3548
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta by @tstadel in #3572
fix: discard metadata fields if not set in Weaviate by @masci in #3578

UI / Demo

refactor: update package strategy in ui by @anakin87 in #3396

Documentation

docs: Extend utils API docs coverage by @brandenchan in #3402
refactor: simplify Summarizer, add Document Merger by @anakin87 in #3452
feat: introduce proposal design process by @masci in #3333

Other Changes

fix: Update env variable for model caching timeout by @sjrl in #3405
feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398
fix: improve Document __repr__ by @anakin87 in #3385
fix: disabling telemetry prevents writing config by @julian-risch in #3465
refactor: Change no_answer attribute by @anakin87 in #3411
feat: Speed up reader tests by @sjrl in #3476
fix: pattern to match tags push by @masci in #3469
fix: using onnx converter on XLMRoberta architecture by @sjrl in #3470
feat: Add headline extraction to ParsrConverter by @bogdankostic in #3488
refactor: upgrade actions version by @ZanSara in #3506
docs: Update docker readme by @brandenchan in #3531
refactor: refactor FAISS tests by @masci in #3537
feat: include error message in HaystackError telemetry events by @vblagoje in #3543
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters by @ju-gu in #3551
bug: fix release number by @mayankjobanputra in #3559
refactor: Generate JSON schema when missing by @masci in #3533

New Contributors

@brunnurs made their first contribution in #3330
@mayankjobanputra made their first contribution in #3368

Full Changelog: v1.10.0...v1.11.0rc1

Contributors

masci, vblagoje, and 12 other contributors

18 Nov 07:38

bogdankostic

893d2d4

v1.11.0rc1 Pre-release

⭐ Highlights

Expanding Haystack’s LLM support further with the new `CohereEmbeddingEncoder` (#3453)

Extracting headlines from Markdown and PDF files (#3445 #3488)

{
    "headline": <THE HEADLINE STRING>,
    "start_idx": <IDX OF HEADLINE START IN document.content >,
    "level": <LEVEL OF THE HEADLINE>
}

Introducing the proposals design process (#3333)

⚠️ Breaking change: removing `Milvus1DocumentStore`

What's Changed

Breaking Changes

bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow by @mayankjobanputra in #3368
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x by @masci in #3552

Pipeline

fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
bug: change type of split_by to Literal including None by @julian-risch in #3389
Fix: update pyworld pin by @anakin87 in #3435
feat: send event if number of queries exceeds threshold by @vblagoje in #3419
Feat: allow decreasing size of datasets loaded from BEIR by @ugm2 in #3392
feat: add __contains__ to Span by @ZanSara in #3446
Bug: Fix prompt length computation by @Timoeller in #3448
Add indexing pipeline type by @vblagoje in #3461
fix: warning if doc store similarity function is incompatible with Sentence Transformers model by @anakin87 in #3455
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3453
feat: Extraction of headlines in markdown files by @bogdankostic in #3445
bug: replace decorator with counter attribute for pipeline event by @julian-risch in #3462
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations by @ZanSara in #3379
refactor: TableReader by @sjrl in #3456
fix: do not reference package directory in PDFToTextOCRConverter.convert() by @ZanSara in #3478
feat: Create the TextIndexingPipeline by @brandenchan in #3473
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline by @ZanSara in #3443
fix: strip whitespaces safely from FARMReader's answers by @ZanSara in #3526

DocumentStores

Document Store test refactoring by @masci in #3449
fix: support long texts for labels in ElasticsearchDocumentStore by @anakin87 in #3346
feat: add SQLDocumentStore tests by @masci in #3517
refactor: Refactor Weaviate tests by @masci in #3541
refactor: Pinecone tests by @masci in #3555
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" by @anakin87 in #3548
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta by @tstadel in #3572
fix: discard metadata fields if not set in Weaviate by @masci in #3578

UI / Demo

refactor: update package strategy in ui by @anakin87 in #3396

Documentation

docs: Extend utils API docs coverage by @brandenchan in #3402
refactor: simplify Summarizer, add Document Merger by @anakin87 in #3452
feat: introduce proposal design process by @masci in #3333

Other Changes

fix: Update env variable for model caching timeout by @sjrl in #3405
feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398
fix: improve Document __repr__ by @anakin87 in #3385
fix: disabling telemetry prevents writing config by @julian-risch in #3465
refactor: Change no_answer attribute by @anakin87 in #3411
feat: Speed up reader tests by @sjrl in #3476
fix: pattern to match tags push by @masci in #3469
fix: using onnx converter on XLMRoberta architecture by @sjrl in #3470
feat: Add headline extraction to ParsrConverter by @bogdankostic in #3488
refactor: upgrade actions version by @ZanSara in #3506
docs: Update docker readme by @brandenchan in #3531
refactor: refactor FAISS tests by @masci in #3537
feat: include error message in HaystackError telemetry events by @vblagoje in #3543
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters by @ju-gu in #3551
bug: fix release number by @mayankjobanputra in #3559
refactor: Generate JSON schema when missing by @masci in #3533

New Contributors

@brunnurs made their first contribution in #3330
@mayankjobanputra made their first contribution in #3368

Full Changelog: v1.10.0...v1.11.0rc1

Contributors

masci, vblagoje, and 12 other contributors

25 Oct 13:47

masci

3a2714e

v1.10.0

⭐ Highlights

Expanding Haystack's LLM support with the new `OpenAIEmbeddingEncoder` (#3356)

Now you can easily create document and query embeddings using large language models: if you have an OpenAI account, all you have to do is set the name of one of the supported models (ada, babbage, davinci or curie) and add your API key to the EmbeddingRetriever component in your pipelines (see docs).

Multimodal retrieval is here! (#2891)

Multimodality with Haystack just made a big leap forward with the addition of MultiModalRetriever: a Retriever that can handle different modalities for query and documents independently. Take it for a spin and experiment with new Document formats, like images. You can now use the same Retriever for text-to-image, text-to-table, and text-to-text retrieval but also image similarity, table similarity, and more! Feed your favorite multimodal model to MultiModalRetriever and see it in action.

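from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# Text queries and image documents are embedded with the same CLIP model
retriever = MultiModalRetriever(
    document_store=InMemoryDocumentStore(embedding_dim=512),
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)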
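Cross-modal retrieval works because a model such as CLIP maps text and images into one shared vector space, where ranking reduces to a similarity search. Below is a minimal sketch using cosine similarity (one common choice) with made-up 3-dimensional vectors; real CLIP ViT-B/32 embeddings have 512 dimensions, which is why embedding_dim=512 is used above. The vectors, file names, and helper functions are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank(query_vec: List[float], doc_vecs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs, best match first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: one text query and two images, all in the same vector space.
query = [0.9, 0.1, 0.0]  # e.g. the text "a photo of a dog"
images = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "skyline.jpg": [0.0, 0.3, 0.9],
}
print(rank(query, images)[0][0])  # dog.jpg
```

The same ranking step serves text-to-image, image-to-image, or text-to-text retrieval; only the models that produce the vectors change.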

Multi-platform Docker images

Starting with 1.10, we're making the deepset/haystack images available for linux/amd64 and linux/arm64.

⚠️ Breaking change in `embed_queries` method (#3252)

We've changed the text argument in the embed_queries method for DensePassageRetriever and EmbeddingRetriever to queries.

What's Changed

Breaking Changes

chore: add DenseRetriever abstraction by @tstadel in #3252

Pipeline

fix: ONNX FARMReader model conversion is broken by @vblagoje in #3211
bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node by @JeffRisberg in #3170
fix: eval() with add_isolated_node_eval=True breaks if no node supports it by @tstadel in #3347
feat: extract label aggregation by @tstadel in #3363
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3356
fix: stable YAML schema generation by @ZanSara in #3388
fix: Update how schema is ordered by @sjrl in #3399
feat: MultiModalRetriever by @ZanSara in #2891

DocumentStores

feat: FAISS in OpenSearch: Support HNSW for cosine by @tstadel in #3217
feat: add support for Elasticsearch 7.16.2 by @masci in #3318
refactor: remove dead code from FAISSDocumentStore by @anakin87 in #3372
fix: allow same vector_id in different indexes for SQL-based Document stores by @anakin87 in #3383

UI / Demo

fix: demo won't start through Docker compose on Apple M1 by @masci in #3337

Documentation

docs: Fix a docstring in ray.py by @tanertopal in #3282

Other Changes

refactor: make TransformersDocumentClassifier output consistent between different types of classification by @anakin87 in #3224
Classify pipeline's type based on its components by @vblagoje in #3132
docs: sync Haystack API with Readme by @brandenchan in #3223
fix: MostSimilarDocumentsPipeline doesn't have pipeline property by @vblagoje in #3265
bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id by @anakin87 in #3166
refactor: better tests for TransformersDocumentClassifier by @anakin87 in #3270
fix: AttributeError in TranslationWrapperPipeline by @nickchomey in #3290
refactor: remove Inferencer multiprocessing by @vblagoje in #3283
fix: opensearch script score with filters by @tstadel in #3321
feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch by @JacdDev in #3301
feat: add multi-platform Docker images by @masci in #3354
fix: Added checks for DataParallel and WrappedDataParallel by @sjrl in #3366
fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter by @vblagoje in #3381
bug: Adds better way of checking query in BaseRetriever and Pipeline.run() by @ugm2 in #3304
feat: Updated EntityExtractor to handle long texts and added better postprocessing by @sjrl in #3154
docs: Add comment about the generation of no-answer samples in FARMReader training by @brandenchan in #3404
feat: Speed up integration tests (nodes) by @sjrl in #3408
fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
bug: change type of split_by to Literal including None by @julian-risch in #3389
feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398

New Contributors

@tanertopal made their first contribution in #3282
@JeffRisberg made their first contribution in #3170
@JacdDev made their first contribution in #3301
@hsm207 made their first contribution in #3351
@ugm2 made their first contribution in #3304
@brunnurs made their first contribution in #3330

Full Changelog: v1.9.1...v1.10.0rc1

Contributors

masci, tanertopal, and 13 other contributors

20 Oct 15:37

masci

0405d70

v1.10.0rc1 Pre-release

⭐ Highlights

Expanding Haystack's LLM support with the new `OpenAIEmbeddingEncoder` (#3356)

Multimodal retrieval is here! (#2891)

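from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# Text queries and image documents are embedded with the same CLIP model
retriever = MultiModalRetriever(
    document_store=InMemoryDocumentStore(embedding_dim=512),
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)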

Multi-platform Docker images

Starting with 1.10, we're making the deepset/haystack images available for linux/amd64 and linux/arm64.

⚠️ Breaking change in `embed_queries` method (#3252)

We've changed the text argument in the embed_queries method for DensePassageRetriever and EmbeddingRetriever to queries.

What's Changed

Breaking Changes

chore: add DenseRetriever abstraction by @tstadel in #3252

Pipeline

fix: ONNX FARMReader model conversion is broken by @vblagoje in #3211
bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node by @JeffRisberg in #3170
fix: eval() with add_isolated_node_eval=True breaks if no node supports it by @tstadel in #3347
feat: extract label aggregation by @tstadel in #3363
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3356
fix: stable YAML schema generation by @ZanSara in #3388
fix: Update how schema is ordered by @sjrl in #3399
feat: MultiModalRetriever by @ZanSara in #2891

DocumentStores

feat: FAISS in OpenSearch: Support HNSW for cosine by @tstadel in #3217
feat: add support for Elasticsearch 7.16.2 by @masci in #3318
refactor: remove dead code from FAISSDocumentStore by @anakin87 in #3372
fix: allow same vector_id in different indexes for SQL-based Document stores by @anakin87 in #3383

UI / Demo

fix: demo won't start through Docker compose on Apple M1 by @masci in #3337

Documentation

docs: Fix a docstring in ray.py by @tanertopal in #3282

Other Changes

refactor: make TransformersDocumentClassifier output consistent between different types of classification by @anakin87 in #3224
Classify pipeline's type based on its components by @vblagoje in #3132
docs: sync Haystack API with Readme by @brandenchan in #3223
fix: MostSimilarDocumentsPipeline doesn't have pipeline property by @vblagoje in #3265
bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id by @anakin87 in #3166
refactor: better tests for TransformersDocumentClassifier by @anakin87 in #3270
fix: AttributeError in TranslationWrapperPipeline by @nickchomey in #3290
refactor: remove Inferencer multiprocessing by @vblagoje in #3283
fix: opensearch script score with filters by @tstadel in #3321
feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch by @JacdDev in #3301
feat: add multi-platform Docker images by @masci in #3354
fix: Added checks for DataParallel and WrappedDataParallel by @sjrl in #3366
fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter by @vblagoje in #3381
bug: Adds better way of checking query in BaseRetriever and Pipeline.run() by @ugm2 in #3304
feat: Updated EntityExtractor to handle long texts and added better postprocessing by @sjrl in #3154
docs: Add comment about the generation of no-answer samples in FARMReader training by @brandenchan in #3404
feat: Speed up integration tests (nodes) by @sjrl in #3408
fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
bug: change type of split_by to Literal including None by @julian-risch in #3389
feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398

New Contributors

@tanertopal made their first contribution in #3282
@JeffRisberg made their first contribution in #3170
@JacdDev made their first contribution in #3301
@hsm207 made their first contribution in #3351
@ugm2 made their first contribution in #3304
@brunnurs made their first contribution in #3330

Full Changelog: v1.9.1...v1.10.0rc1

Contributors

masci, tanertopal, and 13 other contributors

10 Oct 13:16

masci

c1e8d14

v1.9.1

What's Changed

fix: Allow less restrictive values for parameters in Pipeline configurations by @bogdankostic in #3345

Full Changelog: v1.9.0...v1.9.1rc1

Contributors

bogdankostic

10 Oct 12:37

masci

256321d

v1.9.1rc1 Pre-release

What's Changed

fix: Allow less restrictive values for parameters in Pipeline configurations by @bogdankostic in #3345

Full Changelog: v1.9.0...v1.9.1rc1

Contributors

bogdankostic

21 Sep 11:23

masci

ce36be8

v1.9.0

⭐ Highlights

Haystack 1.9 comes with nice performance improvements and two important pieces of news about its ecosystem. Let's look at them in more detail!

Logging speed set to ludicrous (#3212)

This feature alone makes Haystack 1.9 worth testing out, just sayin'... We switched from f-strings to the string formatting operator when composing log messages and observed an astonishing speedup of up to 120% in some pipelines.
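A likely source of this gain is how Python's logging module defers formatting: an f-string is built before the logger even checks whether the level is enabled, while values passed as arguments (`"... %s", value`) are only formatted if the record passes the level check and gets handled. The sketch below assumes the log calls pass values as arguments; `Expensive` and the logger name are hypothetical, and real pipeline gains depend on how much logging happens at disabled levels.

```python
import logging


class Expensive:
    """Counts how often it is turned into a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive repr"


logger = logging.getLogger("lazy_logging_demo")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled

eager, lazy = Expensive(), Expensive()
logger.debug(f"value: {eager}")  # the f-string is built before logging checks the level
logger.debug("value: %s", lazy)  # formatting is deferred and never happens here

print(eager.calls, lazy.calls)  # 1 0
```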

Tutorials moved out! (#3244)

They grow up so fast! Tutorials now have their own git repository, CI, and release cycle, making it easier than ever to contribute ideas, fixes, and bug reports. Have a look at the tutorials repo, Star it, and open an issue if you have an idea for a new tutorial!

Docker pull deepset/haystack (#3162)

A new Docker image shipping Haystack 1.9 is ready to be pulled. It comes in different flavors and versions that you can select with the proper Docker tag; have a look at the README.
On this occasion, we also revamped the build process so that it now uses bake, while the older images are deprecated (see below).

⚠️ Deprecation notice

With the release of the new Docker image deepset/haystack, the following images are now deprecated and won't be updated anymore starting with Haystack 1.10:

New Documentation Site and Haystack Website Revamp

The Haystack website is getting a makeover to become a developer portal covering Haystack and NLP topics beyond pure documentation. As part of that, we've published our new documentation site. From now on, pure developer documentation will live under Haystack Documentation, while the Haystack website becomes a place for the community, with tutorials, learning material, and, soon, a place where the community can share their own content too.

What's Changed

Pipeline

feat: standardize devices parameter and device initialization by @vblagoje in #3062
fix: Reduce GPU to CPU copies at inference by @sjrl in #3127
test: lower low boundary for accuracy in test_calculate_context_similarity_on_non_matching_contexts by @ZanSara in #3199
bug: fix pdftotext installation verification by @banjocustard in #3233
chore: remove f-strings from logs for performance reasons by @ZanSara in #3212
bug: reactivate benchmarks with quick fixes by @tholor in #2766

Models

fix: Replace multiprocessing tokenization with batched fast tokenization by @vblagoje in #3089

DocumentStores

bug: OpensearchDocumentStore.custom_mapping should accept JSON strings at validation by @ZanSara in #3065
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents by @Namoush in #3086
bug: validate custom_mapping as an object by @ZanSara in #3189

Tutorials

docs: Fix the word length splitting; should be set to 100 not 1,000 by @stevenhaley in #3133
chore: remove tutorials from the repo by @masci in #3244

Other Changes

chore: Upgrade and pin transformers to 4.21.2 by @vblagoje in #3098
bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 by @anakin87 in #3121
build: pin pydantic to 1.9.2 by @masci in #3126
fix: document FARMReader.train() evaluation report log level by @brandenchan in #3129
feat: add a security policy for Haystack by @masci in #3130
refactor: update dependencies and remove pins by @danielbichuetti in #3147
refactor: update package strategy in rest_api by @masci in #3148
fix: give default index for torch.device('cuda') in initialize_device_settings by @sjrl in #3161
fix: add type hints to all component init constructor parameters by @vblagoje in #3152
fix: Add 15 min timeout for downloading cached HF models by @vblagoje in #3179
fix: replace torch.device("cuda") with torch.device("cuda:0") in devices initialization by @vblagoje in #3184
feat: add health check endpoint to rest api by @danielbichuetti in #3168
refactor: improve support for dataclasses by @danielbichuetti in #3142
feat: Updates docs and types for language param in PreProcessor by @sjrl in #3186
feat: Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers by @bglearning in #3164
refactoring: reimplement Docker strategy by @masci in #3162
refactor: remove pre haystack-1.0 import paths support by @ZanSara in #3204
feat: exponential backoff with exp decreasing batch size for opensearch and elasticsearch client by @ArzelaAscoIi in #3194
feat: add public layout-base extraction support on PDFToTextConverter by @danielbichuetti in #3137
bug: fix embedding_dim mismatch in DocumentStore by @kalki7 in #3183
fix: update rest_api Docker Compose yamls for recent refactoring of rest_api by @nickchomey in #3197
chore: fix Windows CI by @masci in #3222
fix: type of temperature param and adjust defaults for OpenAIAnswerGenerator by @tholor in #3073
fix: handle Documents containing dataframes in Multilabel constructor by @masci in #3237
fix: make pydoc-markdown hook correctly resolve paths relative to repo root by @masci in #3238
fix: proper retrieval of answers for batch eval by @vblagoje in #3245
chore: updating colab links in older docs versions by @TuanaCelik in #3250
docs: establish API docs sync between v1.9.x and Readme by @brandenchan in #3266

New Contributors

@Namoush made their first contribution in #3086
@kalki7 made their first contribution in #3183
@nickchomey made their first contribution in #3197
@banjocustard made their first contribution in #3233

Full Changelog: v1.8.0...v1.9.0

Contributors

masci, vblagoje, and 14 other contributors

Releases: deepset-ai/haystack