Releases: deepset-ai/haystack
v1.20.0
⭐ Highlights
🪄 LostInTheMiddleRanker and DiversityRanker
We are excited to introduce two new rankers to Haystack: LostInTheMiddleRanker and DiversityRanker!
LostInTheMiddleRanker is based on the research paper "Lost in the Middle: How Language Models Use Long Contexts" by Liu et al. It reorders documents according to the "Lost in the Middle" strategy, which places the most relevant paragraphs at the beginning and end of the context, while less relevant paragraphs are positioned in the middle. This ranker can be used in Retrieval-Augmented Generation (RAG) pipelines. Here is an example of how to use it:
from haystack import Pipeline
from haystack.nodes import WebRetriever, TopPSampler, DiversityRanker, LostInTheMiddleRanker, PromptNode

web_retriever = WebRetriever(api_key=search_key, top_search_results=5, mode="preprocessed_documents", top_k=50)  # search_key: your search engine API key
sampler = TopPSampler(top_p=0.97)
diversity_ranker = DiversityRanker()
litm_ranker = LostInTheMiddleRanker(word_count_threshold=1024)
prompt_node = PromptNode()  # assumed to be configured with your model of choice

pipeline = Pipeline()
pipeline.add_node(component=web_retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=sampler, name="Sampler", inputs=["Retriever"])
pipeline.add_node(component=diversity_ranker, name="DiversityRanker", inputs=["Sampler"])
pipeline.add_node(component=litm_ranker, name="LostInTheMiddleRanker", inputs=["DiversityRanker"])
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["LostInTheMiddleRanker"])
In this example, we have positioned the LostInTheMiddleRanker as the last component before the PromptNode. This is because the LostInTheMiddleRanker is designed to work on documents that other rankers have already ordered by relevance. It is recommended to place it towards the end of the pipeline (as the last ranker), so that it can reorder the documents that have already been ranked by other rankers.
DiversityRanker is a tool that helps increase the diversity of a set of documents. It uses sentence-transformers models to calculate semantic embeddings for each document and then ranks the documents so that each subsequent one is the least similar to those already selected. The result is a list where each document contributes the most to the overall diversity of the selected set.
We'll reuse the example from the LostInTheMiddleRanker to point out that the DiversityRanker can be used in combination with other rankers. It is recommended to place it in the pipeline after the similarity ranker but before the LostInTheMiddleRanker. Note that DiversityRanker is typically used in generative RAG pipelines to ensure that the generated answer is drawn from a diverse set of documents. This setup is typical for Long-Form Question Answering (LFQA) tasks. Check out the Enhancing RAG Pipelines in Haystack: Introducing DiversityRanker and LostInTheMiddleRanker article on the Haystack Blog for details.
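Outside a pipeline, direct usage follows the standard ranker interface. A minimal sketch (document contents invented for illustration):
from haystack import Document
from haystack.nodes import DiversityRanker

ranker = DiversityRanker()  # embeds documents with a sentence-transformers model
docs = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="Berlin has a population of roughly 3.7 million."),
    Document(content="Paris is the capital of France."),
]
# Greedily selects each next document to be least similar to those already picked
reordered = ranker.predict(query="European capitals", documents=docs)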
📰 New release note management
We have implemented a new release note management system, reno. From now on, every contributor is responsible for adding release notes for the feature or bugfix they're introducing in Haystack in the same Pull Request containing the code changes. The goal is to encourage detailed and accurate notes for every release, especially when it comes to complex features or breaking changes.
See how to work with the new release notes in our Contribution Guide.
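For illustration, a typical reno workflow on a feature branch looks roughly like this (the slug add-my-feature is just an example):
pip install -e .[dev]
reno new add-my-feature
# edit the generated releasenotes/notes/add-my-feature-<hash>.yaml and commit it alongside your code changes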
⬆️ Upgrade Notes
- If you're a Haystack contributor, you need a new tool called reno to manage the release notes. Please run pip install -e .[dev] to ensure you have reno available in your environment.
- The OpenSearch custom query syntax changes: the old filter placeholders for custom_query are no longer supported. Replace your custom filter expressions with the new ${filters} placeholder.
Old:
retriever = BM25Retriever(
    custom_query="""
    {
        "query": {
            "bool": {
                "should": [{"multi_match": {
                    "query": ${query},
                    "type": "most_fields",
                    "fields": ["content", "title"]}}
                ],
                "filter": [
                    {"terms": {"year": ${years}}},
                    {"terms": {"quarter": ${quarters}}},
                    {"range": {"date": {"gte": ${date}}}}
                ]
            }
        }
    }
    """
)
retriever.retrieve(
    query="What is the meaning of life?",
    filters={"years": [2019, 2020], "quarters": [1, 2, 3], "date": "2019-03-01"}
)
New:
retriever = BM25Retriever(
    custom_query="""
    {
        "query": {
            "bool": {
                "should": [{"multi_match": {
                    "query": ${query},
                    "type": "most_fields",
                    "fields": ["content", "title"]}}
                ],
                "filter": ${filters}
            }
        }
    }
    """
)
retriever.retrieve(
    query="What is the meaning of life?",
    filters={"year": [2019, 2020], "quarter": [1, 2, 3], "date": {"$gte": "2019-03-01"}}
)
- This update impacts only those who have created custom invocation layers by subclassing PromptModelInvocationLayer. Previously, the invoke() method in your custom layer received all prompt template parameters (like query, documents, etc.) as keyword arguments. With this change, these parameters are no longer passed in as keyword arguments, so you may need to update your custom layer accordingly (a hypothetical sketch follows below).
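As a purely hypothetical sketch of the adjustment (class and variable names invented; the base-class import path may differ in your version):
from haystack.nodes.prompt.invocation_layer import PromptModelInvocationLayer

class MyInvocationLayer(PromptModelInvocationLayer):
    def invoke(self, *args, **kwargs):
        # Before 1.20: template params such as kwargs["documents"] arrived here.
        # From 1.20 on: only the fully resolved prompt is passed in.
        prompt = kwargs.get("prompt", "")
        return [prompt]  # a real layer would send the prompt to its model here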
🥳 New Features
- The LostInTheMiddleRanker can be used like other rankers in Haystack. After initializing LostInTheMiddleRanker with the desired parameters, it can be used to rank/reorder a list of documents based on the "Lost in the Middle" order: the most relevant documents are located at the top and bottom of the returned list, while the least relevant documents are found in the middle. We advise using this ranker in combination with other rankers and placing it towards the end of the pipeline (see the sketch after this list).
- The DiversityRanker can be used like other rankers in Haystack, and it can be particularly helpful in cases where you have highly relevant yet similar sets of documents. By ensuring a diversity of documents, this new ranker facilitates a more comprehensive utilization of the documents and, particularly in RAG pipelines, potentially contributes to more accurate and rich model responses.
- When using custom_query in BM25Retriever along with OpenSearch or Elasticsearch, we added support for dynamic filters, like in regular queries. With this change, you can pass filters at query time without having to modify the custom_query: instead of defining filter expressions and field placeholders, all you have to do is set the ${filters} placeholder, analogous to the ${query} placeholder, in your custom_query. For example:
{
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": ${query},                // mandatory query placeholder
                "type": "most_fields",
                "fields": ["content", "title"]}}
            ],
            "filter": ${filters}                  // optional filters placeholder
        }
    }
}
- DeepsetCloudDocumentStore supports searching multiple fields in sparse queries. This enables you to search meta fields as well when using BM25Retriever. For example, set search_fields=["content", "title"] to search the title meta field along with the document content.
- Reworked DocumentWriter to remove DocumentStoreAwareMixin. We now require a generic DocumentStore when initializing the writer.
- Reworked MemoryRetriever to remove DocumentStoreAwareMixin. We now require a MemoryDocumentStore when initializing the retriever.
- Introduced the allowed_domains parameter in WebRetriever for domain-specific searches, thus enabling "talk to a website" and "talk to docs" scenarios.
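As promised above, a minimal standalone sketch for LostInTheMiddleRanker (assuming the input list is already ordered by relevance):
from haystack import Document
from haystack.nodes import LostInTheMiddleRanker

ranker = LostInTheMiddleRanker(word_count_threshold=1024)
docs = [Document(content=f"Paragraph {i}") for i in range(1, 6)]  # most relevant first
# The output places the most relevant documents first and last,
# with the least relevant ones in the middle
reordered = ranker.predict(query="example query", documents=docs)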
✨ Enhancements
- The WebRetriever now employs an enhanced caching mechanism that caches web page content based on search engine results rather than the query.
- Upgrade transformers to the latest version 4.32.1 so that Haystack benefits from Llama and T5 bugfixes: https://github.com/huggingface/transformers/releases/tag/v4.32.1
- Upgrade transformers to version 4.32.0. This version adds support for GPTQ quantization and integrates MPT models.
- Add a top_k parameter to the DiversityRanker init method.
- Enable setting the max_length value when running PromptNodes using local HF text2text-generation models (see the sketch after this list).
- Enable passing use_fast to the underlying transformers pipeline.
- Enhance FileTypeClassifier to detect media file types like mp3, mp4, mpeg, m4a, and similar.
- Minor PromptNode HFLocalInvocationLayer test improvements.
- Several minor enhancements for LinkContentFetcher:
  - Dynamic content handler resolution
  - Custom User-Agent header (optional, to minimize blocking)
  - PDF support
  - Registration of new content handlers
- If LinkContentFetcher encounters a block or receives any response code other than HTTPStatus.OK, return the search engine snippet as content, if it's available.
- Allow loading Tokenizers for prompt models not natively supported ...
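For the max_length enhancement above, a hedged sketch (the model name is only an example):
from haystack.nodes import PromptNode

# max_length caps the number of generated tokens for a local HF text2text-generation model
prompt_node = PromptNode(model_name_or_path="google/flan-t5-base", max_length=256)
print(prompt_node("Translate English to German: Where is the train station?"))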
v1.20.0-rc1
v1.19.0
⭐️ Highlights
🔎 Elasticsearch 8 support
We are thrilled to share that Haystack now supports the latest version of Elasticsearch, Elasticsearch 8, as a Document Store backend. To use Haystack with Elasticsearch 8, just install the new elasticsearch8 extra:
pip install farm-haystack[elasticsearch8]
Importing ElasticsearchDocumentStore from haystack.document_stores will automatically choose the correct Document Store based on the version of the installed Elasticsearch client.
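For example, a minimal sketch (host and port shown are the usual local defaults):
from haystack.document_stores import ElasticsearchDocumentStore

# Resolves to the Elasticsearch 8 implementation if an 8.x client is installed
document_store = ElasticsearchDocumentStore(host="localhost", port=9200)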
🗂️ RecentnessRanker
We're excited to introduce a new feature to Haystack – a document recentness ranking component! We recognized the importance of ranking documents based on their recentness, especially in scenarios where timely information is critical. For instance, when searching through technical documentation for software releases or news articles, it's essential to prioritize the most up-to-date information. 👇
from haystack.nodes import RecentnessRanker
ranker = RecentnessRanker(
date_meta_field="date", # Key pointing to the date field in the metadata.
ranking_mode="score",
weight=0.5, # A 0.5 weight means content relevance and age are averaged.
)
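A usage sketch for the ranker defined above (documents and metadata invented for illustration):
from haystack import Document

docs = [
    Document(content="Notes for the 2023 release", meta={"date": "2023-08-01"}),
    Document(content="Notes for the 2021 release", meta={"date": "2021-01-15"}),
]
# With ranking_mode="score", relevance and recentness are blended according to the weight
reordered = ranker.predict(query="latest release notes", documents=docs)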
For more details, check out the documentation.
🧠 Improved support for Anthropic Claude
We're thrilled to announce an important update to Haystack's Anthropic Claude support! This update follows the latest improvements in Anthropic Claude models, notably support for Claude 2 and their humongous context window sizes.
Moreover, we've integrated Claude models into our example scripts, making it easier for users to test these cutting-edge models. For instance, check out the updated examples/link_content_blog_post_summary.py script for a demo of Claude summarizing blog posts directly from hyperlinks.
We still support both the older models (e.g., claude-v1) and the new Claude 2 models. For more details, see the Anthropic Claude documentation.
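For example, a hedged sketch of prompting Claude 2 through PromptNode (the model name and key handling are assumptions; see the documentation for specifics):
from haystack.nodes import PromptNode

claude_node = PromptNode(model_name_or_path="claude-2", api_key="YOUR_ANTHROPIC_API_KEY", max_length=512)
print(claude_node("Summarize the history of Berlin in two sentences."))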
🚀 Support for Llama 2 on AWS SageMaker
We are excited to share that Haystack now supports models of the Llama 2 family deployed to AWS SageMaker! Once you've deployed your Llama 2 models (including the chat variant) in AWS SageMaker, use them with PromptNode by simply providing the inference endpoint name, your aws_profile_name, and aws_custom_attributes 👇
from haystack.nodes import PromptNode
prompt_node = PromptNode(
model_name_or_path="sagemaker-llama-2-endpoint-name",
model_kwargs={"aws_profile_name": "my_aws_profile_name",
"aws_custom_attributes":{"accept_eula": True}}
)
result = prompt_node("Berlin is the capital of")
print(result)
# or the Llama 2 chat model
prompt_node = PromptNode(
model_name_or_path="sagemaker-llama-2-chat-endpoint-name",
model_kwargs={"aws_profile_name": "my_aws_profile_name",
"aws_custom_attributes":{"accept_eula": True}}
)
chat_conversation = [[
{"role": "user", "content": "what is the recipe of mayonnaise?"},
]]
result = prompt_node(chat_conversation)
print(result)
For more details on model deployment, check out the documentation.
🎉 Now using transformers 4.31.0
With this release, Haystack depends on the latest version of the transformers library, allowing support for Llama 2.
🚫 SklearnQueryClassifier deprecation
Starting from version 1.19, SklearnQueryClassifier is deprecated and will be removed from Haystack as of version 1.21. We recommend using the more powerful TransformersQueryClassifier instead. See the announcement for more details.
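If you're migrating, a minimal sketch with TransformersQueryClassifier (using its default model) might look like this:
from haystack.nodes import TransformersQueryClassifier

query_classifier = TransformersQueryClassifier()  # default model separates natural-language from keyword queries
result, edge = query_classifier.run(query="Who is the father of Arya Stark?")
print(edge)  # e.g. "output_1" for natural-language queries, "output_2" for keyword queries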
What's Changed
Pipeline
- feat: globally disable progress bars by @ZanSara in #5207
- Add cpu-remote-inference Docker image by @vblagoje in #5225
- fix: Support isolated node eval in run_batch in Generators by @bogdankostic in #5291
- feat: support OpenAI-Organization for authentication by @anakin87 in #5292
- docs: Small documentation updates to dense.py by @sjrl in #5305
- test: Refactor some retriever tests into unit tests by @sjrl in #5306
- feat: Add support for meta fields that are lists when using embed_meta_fields by @sjrl in #5307
- refactor: Extract link retrieval from WebRetriever, introduce LinkContentFetcher by @vblagoje in #5227
- fix: update WebRetriever docstrings and default mode by @dfokina in #5352
- added hybrid search example by @nickprock in #5376
DocumentStores
- fix: Allow filtering on list fields in InMemoryDocumentStore with all operators by @bogdankostic in #5208
- Fix: FAISSDocumentStore - make write_documents properly work in combination w/ update_embeddings by @anakin87 in #5221
- bug: fix for pinecone not working for per document updates by @vblagoje in #5110
- fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests by @tstadel in #5113
- ci: Add unit test for Elasticsearch8 by @bogdankostic in #5300
- feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 by @bogdankostic in #5320
Documentation
- feat: BM25 retrieval for MemoryDocumentStore by @vblagoje in #5151
- fix: install inference in REST API tests by @ZanSara in #5252
- fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http by @malte-aws in #5199
- fix: Improve robustness of get_task HF pipeline invocations by @MichelBartels in #5284
- feat: introduce Store protocol (v2) by @ZanSara in #5259
- fix: num_return_sequences should be less than num_beams, not top_k by @faaany in #5280
- Revert "fix: num_return_sequences should be less than num_beams, not top_k" by @julian-risch in #5434
- chore: deprecate SklearnQueryClassifier by @anakin87 in #5324
- fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed by @MichelBartels in #5308
- fix: a small bug in StopWordsCriteria by @faaany in #5316
- chore: fix typo in base.py by @eltociear in #5356
- feat: extend pipeline.add_component to support stores by @ZanSara in #5261
- proposal: Add RecentnessRanker component by @elundaeva in #5289
- feat: Add embed_meta_fields to Ranker nodes by @sjrl in #5361
- feat: Recentness Ranker by @elundaeva in #5301
- feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes by @vblagoje in #5406
- feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker by @vblagoje in #5437
Other Changes
- fix: MultiLabel to_json works with Table Labels by @sjrl in #5257
- chore: Remove deprecated return_table_cell from conftest.py by @sjrl in #5264
- test: Update test/others/test_utils.py by @sjrl in #5270
- test: Adapt batch size in retriever-reader benchmarks by @bogdankostic in #5281
- fix: Add dependecies to build lxml successfully in base Docker image by @vblagoje in #5288
- Remove requests_cache in tests by @silvanocerza in #5285
- refactor: Simplify selection of Azure vs OpenAI invocation layers by @vblagoje in #5271
- feat: batch mode for MemoryRetriever (v2) by @ZanSara in #5287
- chore: Add support for hierarchical docs by @silvanocerza in #5278
- build: Add elasticsearch7 and elasticsearch8 extra by @bogdankostic in #5296
- chore: Adapt import message for Elasticsearch7 by @bogdankostic in #5295
- ci: Add job for ES8 integration tests by @bogdankostic in #5297
- ci: Update labeler.yml to account for Elasticsearch changes by @bogdankostic in #5318
- create invocation-layers API reference page by @dfokina in https://github.com/dee...
v1.19.0-rc3
v1.18.1
v1.18.0
⭐️ Highlights
🗄️ Using LLMs Hosted on AWS SageMaker
We are thrilled to share that Haystack now supports the use of open source LLMs deployed to AWS SageMaker! This means you can easily host your models with AWS SageMaker and use them with PromptNode by simply providing the inference endpoint name and your aws_profile_name 👇
from haystack.nodes import PromptNode
prompt_node = PromptNode(
model_name_or_path="sagemaker-model-endpoint-name",
model_kwargs={"aws_profile_name": "my_aws_profile_name",
"aws_region_name": "your-region-k"}
)
For more details on model deployment, check out the documentation.
🗂️ PromptHub
Exciting news! We're introducing PromptHub: a place for ready-made prompts for the most common NLP tasks. The best part about it is that you can easily import prompts from the hub into Haystack. For example, if you want to use the deepset/topic-classification prompt with Haystack, all you need is the prompt name, and that's it! 👇
import os
from haystack.nodes import PromptNode, PromptTemplate
topic_classifier_template = PromptTemplate("deepset/topic-classification")
prompt_node = PromptNode(model_name_or_path="text-davinci-003", api_key=os.environ.get("OPENAI_API_KEY"))
prompt_node.prompt(prompt_template=topic_classifier_template, documents="YOUR_DOCUMENTS", options=["A LIST OF TOPICS"])
Check out the PromptHub and discover other prompt options for various NLP tasks.
🛠️ Adding Tools to ConversationalAgent
The wait is over: you can now greatly enhance your chat application by adding various tools using the tools parameter of ConversationalAgent.
from haystack.agents import Tool
from haystack.agents.conversational import ConversationalAgent

# presidents_qa is an existing QA pipeline; prompt_node is a configured PromptNode
search_tool = Tool(
    name="USA_Presidents_QA",
    pipeline_or_node=presidents_qa,
    description="useful for when you need to answer questions related to the presidents of the USA.",
)
conversational_agent = ConversationalAgent(prompt_node=prompt_node, tools=[search_tool])
conversational_agent.run("YOUR QUERY")
Go to our docs to learn more about various tools you can use.
⚠️ Breaking Changes
🔦 A new optional dependency: inference
To simplify the installation of Haystack, we removed some redundant dependencies (like PyTorch) for users who rely on LLMs through APIs. This means you will get a smaller package and faster Haystack installation if you want to use Haystack with OpenAI models or Transformers models via the Hugging Face Inference API. If you prefer to run models locally, for example on your GPU, pip install farm-haystack[inference] will install all required dependencies as before.
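In short, assuming a standard pip setup:
pip install farm-haystack             # lightweight: use LLMs via APIs (OpenAI, HF Inference API)
pip install farm-haystack[inference]  # adds PyTorch and friends for running models locally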
🔁 PromptTemplate parameter changes
As of v1.18, PromptTemplate accepts only prompt and output_parser and doesn't support the name and prompt_text parameters anymore. See an example of how you can migrate to the new PromptTemplate 👇🏼
Old:
qa_template = PromptTemplate(
    name="question-answering",
    prompt_text="""
    Given the context please answer the question. Context: {join(documents)};
    Question: {query};
    Answer:
    """,
    output_parser=AnswerParser()
)
New:
qa_template = PromptTemplate(
    prompt="""
    Given the context please answer the question. Context: {join(documents)};
    Question: {query};
    Answer:
    """,
    output_parser=AnswerParser()
)
🚫 Seq2SeqGenerator and RAGenerator deprecation
With Haystack 1.18, we removed the Seq2SeqGenerator and RAGenerator from Haystack. We recommend using the more powerful PromptNode instead. See the announcement for more details.
What's Changed
Breaking Changes
- PromptHub integration in PromptNode by @ZanSara in #4879
- chore: remove deprecated node PDFToTextOCRConverter by @masci in #4982
- refactor: Use globally defined request timeout in ElasticsearchDocumentStore and OpenSearchDocumentStore by @bogdankostic in #5064
- feat!: simplify weaviate auth by @hsm207 in #5115
- feat!: Add extra for inference dependencies such as torch by @julian-risch in #5147
- Remove deprecated param return_table_cell by @masci in #5218
Pipeline
- build: Remove tiktoken alternative by @julian-risch in #4991
- fix: fitz import switcher by @ZanSara in #5012
- refactor: Generate eval result in separate method by @bogdankostic in #5001
- chore: Unpin typing_extensions and remove all its uses by @silvanocerza in #5040
- docs: Fix doc for FARMReader.predict by @pcreux in #5049
- feat: Allow setting custom api_base for OpenAI nodes by @michaelfeil in #5033
- fix: Ensure eval mode for farm and transformer models for predictions by @sjrl in #3791
- chore: remove audio node import by @ZanSara in #5097
- feat: introduce lazy_import by @ZanSara in #5084
- fix: WebRetriever top_k is ignored in a pipeline by @vblagoje in #5106
- build: Move Azure's Form Recognizer dependency to extras by @julian-risch in #5096
- chore: mark some unit tests under test/pipeline by @ZanSara in #5124
- feat: optional transformers by @ZanSara in #5101
- feat: current_datetime shaper function by @ZanSara in #5195
- feat: hard document length limit at max_chars_check by @ZanSara in #5191
- chore: remove safe_import and all usages by @ZanSara in #5139
- fix: Send batches of query-doc pairs to inference_from_objects by @bogdankostic in #5125
- fix: Use add_isolated_node_eval of eval_batch in run_batch by @bogdankostic in #5223
DocumentStores
- docs: updating docstrings to say OpenSearch and backlink to correct docs by @dtaivpp in #5000
- feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore by @bogdankostic in #5063
- feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore by @bogdankostic in #5079
- fix: changing document scores by @benheckmann in #5090
Documentation
- fix: Fix CohereInvocationLayer _ensure_token_limit not returning resized prompt by @silvanocerza in #4978
- feat: Add prompt_template to conversational agent init params by @julian-risch in #4994
- feat: Allow setting java options when launching Elasticsearch / OpenSearch by @bogdankostic in #5002
- refactor: Adapt retriever benchmarks script by @bogdankostic in #5004
- refactor: Add reader-retriever benchmark script by @bogdankostic in #5006
- refactor: Adapt running benchmarks by @bogdankostic in #5007
- fix: Move check for default PromptTemplates in PromptTemplate itself by @ZanSara in #5018
- chore: Simplify DefaultPromptHandler logic and add tests by @silvanocerza in #4979
- feat: prompts caching from PromptHub by @ZanSara in #5048
- fix: Fix handling of streaming response in AnthropicClaudeInvocationLayer by @silvanocerza in #4993
- feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage by @vblagoje in #4956
- feat: Consider prompt_node's default_prompt_template in agent by @julian-risch in #5095
- fix: rename requests.py into requests_utils.py by @ZanSara in #5099
- docs: update CLI readme by @dfokina in #5129
- fix: small improvement to pipeline v2 tests by @ZanSara in #5153
- feat: Optional Content Moderation for OpenAI PromptNode & OpenAIAnswerGenerator by @benheckmann in #5017
- feat: Update ConversationalAgent by @bilgeyucel in #5065
- fix: Check Agent's prompt template variables and prompt resolver parameters are aligned by @vblagoje in #5163
- feat: add ensure token limit for direct prompting of ChatGPT by @sjrl in https://github...
v1.18.0-rc3
v1.18.0-rc2
v1.18.0-rc1
v1.17.2
This release fixes a bug in telemetry collection caused by the generalimport library. Since we had already switched to a different library on the unstable branch, we decided to backport the introduction of lazy_import to 1.17.x, which fixes the bug as a result.
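For context, lazy_import defers optional imports so the error surfaces only when the dependency is actually used. A sketch of the pattern (the message text is invented):
from haystack.lazy_imports import LazyImport

with LazyImport("Run 'pip install farm-haystack[inference]' to use this component") as torch_import:
    import torch

def uses_torch():
    torch_import.check()  # raises a helpful ImportError only if torch is missing
    return torch.__version__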
What's Changed
Full Changelog: v1.17.1...v1.17.2-rc1