v1.15.0
⭐ Highlights
Build Agents Yourself with Open Source
Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These agents have the power to answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents allow you to build similar experiences in an open source way: your own environment, full control and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.
from haystack.agents import Agent, Tool
from haystack.pipelines import WebQAPipeline

# web_retriever, web_qa_pn, agent_pn and prompt_template are assumed to be defined beforehand.
web_qa_tool = Tool(
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)
agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")
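The final_answer_pattern above is an ordinary regular expression with a single capture group. As a rough illustration of what such a pattern matches, here is a minimal sketch that uses only Python's re module. The helper extract_final_answer and the sample model output are hypothetical, and Haystack's Agent may apply the pattern differently internally.

```python
import re

# Same pattern as the one passed to the Agent above.
FINAL_ANSWER_PATTERN = r"Final Answer\s*:\s*(.*)"


def extract_final_answer(llm_output: str):
    """Return the text captured after 'Final Answer:', or None if it is absent.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    match = re.search(FINAL_ANSWER_PATTERN, llm_output)
    return match.group(1).strip() if match else None


# Made-up model output, used only to exercise the pattern.
sample_output = "Thought: I now know the answer.\nFinal Answer: Paris"
print(extract_final_answer(sample_output))
```

Because `.` does not match newlines by default, the capture group stops at the end of the line that contains "Final Answer:".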
Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!
Flexible PromptTemplates
Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:
from haystack.nodes import AnswerParser, PromptTemplate

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
    "Context: {join(documents)}\n"
    "Question: {query}\n"
    "Answer: ",
    output_parser=AnswerParser(),
)
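To get a feel for what the template above produces, the following plain-Python sketch mimics the substitution step. It is not Haystack's implementation: render_prompt is a hypothetical helper, documents are reduced to plain strings, and the single-space delimiter is an assumption.

```python
from typing import List


def render_prompt(documents: List[str], query: str, delimiter: str = " ") -> str:
    """Fill the question-answering template with joined document texts.

    Illustrative stand-in for {join(documents)} and {query}; the delimiter
    is an assumption, not a documented default.
    """
    context = delimiter.join(documents)
    return (
        "Given the context please answer the question.\n"
        f"Context: {context}\n"
        f"Question: {query}\n"
        "Answer: "
    )


docs = ["Berlin is the capital of Germany.", "It has 3.7 million inhabitants."]
print(render_prompt(docs, "What is the capital of Germany?"))
```

The rendered string is what a language model would receive. In the template above, the output_parser then turns the model's reply into Answer objects.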
More details here.
Using ChatGPT through PromptModel
A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:
from haystack.nodes import PromptModel, PromptNode

# api_key is assumed to hold your OpenAI API key.
prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
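Since the conversation is just a list of role/content dictionaries, a follow-up question amounts to appending another entry before the next call. The sketch below shows one way to manage that history in plain Python. add_message and VALID_ROLES are hypothetical names, not part of Haystack, and the code makes no API calls.

```python
from typing import Dict, List

Message = Dict[str, str]

# Roles used in the example above.
VALID_ROLES = {"system", "user", "assistant"}


def add_message(messages: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message; the input list is left unchanged."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]


history: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_message(history, "user", "Who won the world series in 2020?")
history = add_message(history, "assistant", "The Los Angeles Dodgers won the World Series in 2020.")
history = add_message(history, "user", "Where was it played?")
print([m["role"] for m in history])
```

Returning a new list instead of mutating the old one makes it easy to branch a conversation or retry a turn with a different question.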
Haystack Extras
We now have a separate repo, haystack-extras, with extra Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. For example, these two can be installed via:
pip install farm-haystack-text2speech
What's Changed
Breaking Changes
- feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
- feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
- build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
- chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
- refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
- fix: Fix debug on PromptNode by @recrudesce in #4483
- feat: PromptTemplate extensions by @tstadel in #4378
Pipeline
- feat: Add JsonConverter node by @bglearning in #4130
- fix: Shaper store all outputs from function by @sjrl in #4223
- refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
- fix: add option to not override results by Shaper by @tstadel in #4231
- feat: reduce and focus telemetry by @ZanSara in #4087
- refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
- refact: mark unit tests under the test/nodes/** path by @masci in #4235
- fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
- test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
- test: mock all Translator tests and move one to e2e by @ZanSara in #4290
- fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
- feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
- test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
- fix: EvalResult load migration by @tstadel in #4289
- feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
- refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
- fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
- docs: TransformersImageToText - inform about supported models, better exception handling by @anakin87 in #4310
- fix: check that answer is not None before accessing it in table.py by @culms in #4376
- feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
- Add Whisper node by @vblagoje in #4335
- tests: Mark Crawler tests correctly by @silvanocerza in #4435
- test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
- fix: issue evaluation check for content type by @ju-gu in #4181
- feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
- refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
- refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
- refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
- refactor: remove telemetry v1 by @ZanSara in #4496
- feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
- feat: Add agent tools by @vblagoje in #4437
- refactor: reduce telemetry events count by @ZanSara in #4501
DocumentStores
- fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
- fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
- fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
- refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498
Documentation
- feat: add top_k to PromptNode by @tstadel in #4159
- feat: Add Agent by @julian-risch in #4148
- ci: Automate OpenAPI specs upload to Readme.io by @silvanocerza in #4228
- ci: Refactor docs config and generation by @silvanocerza in #4280
- feat: Add Azure as OpenAI endpoint by @vblagoje in #4170
- refactor: Allow flexible document id generation by @danielbichuetti in #4326
Other Changes
- ci: Move xpdf build into separate container by @silvanocerza in #4199
- refactor: Remove id_hash_keys parameter in from_dict method by @bogdankostic in #4207
- bug: Check cuda availability before calling by @abwiersma in #4174
- ci: Fix Dockerfile.base failing cause of missing git by @silvanocerza in #4210
- fix: allowing file-upload api to write files to disk by @mayankjobanputra in #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator by @sjrl in #4220
- ci: Fix Dockerfile.base failing cause of missing dependencies by @silvanocerza in #4215
- fix: Better error messages for OCR requirement (#3767) by @in-balamurugan in #3900
- Docs: Update top_k description for PromptNode by @agnieszka-m in #4224
- bug: fix typo in google.colab package detection by @ZanSara in #4238
- proposal: Implement Agent demo by @vblagoje in #4085
- ci: Remove unnecessary operations in minor_version_release.yml by @silvanocerza in #4267
- Fix: Issue of failure to initialize input_converter in Seq2SeqGenerator when model_file_path is given as folder path on local disk after manual model download by @Kshitijpawar in #4213
- test: Fix deprecation fixture by @silvanocerza in #4219
- Fix: Allow torch_dtype="auto" in PromptNode by @sjrl in #4166
- ci: Parallelize Docker build job by @silvanocerza in #4268
- test: Added integration test for using EntityExtractor in query pipeline by @sjrl in #4117
- refactor: Make extraction of "Tool" and "Tool input" for Agent more robust and user-friendly by @tholor in #4269
- ci: Change docker_release.yml workflow to run after successful PyPi release by @silvanocerza in #4293
- docs: Fix search path for Shaper API docs by @bogdankostic in #4306
- test: mock all Summarizer tests and move a few into e2e by @ZanSara in #4299
- ci: Fix docstring-labeler.yml workflow by @silvanocerza in #4307
- build: Remove xpdf dependencies by @bogdankostic in #4314
- test: Pin requests-cache test dependency to <1.0.0 by @silvanocerza in #4325
- chore: Add Intelijus as using Haystack by @danielbichuetti in #4330
- refactor: Separate PromptModelInvocationLayers in providers.py by @vblagoje in #4327
- ci: Add workflow to push CI metrics to Datadog by @silvanocerza in #4336
- Update README.md by @TuanaCelik in #4340
- proposal: Shapers in Prompt Templates by @tstadel in #4172
- refactor: simplify registration of PromptModelInvocationLayer by @ZanSara in #4339
- fix: Fix print_answers for output of query run_batch by @vbernardes in #4273
- refactor: Simplify agent and tool interaction by @vblagoje in #4362
- proposal: drop BaseComponent and re-implement Pipeline by @ZanSara in #4284
- feat: LanguageClassifier by @ZanSara in #2994
- docs: add DocumentLanguageClassifier API by @anakin87 in #4401
- chore: make the docs generator runnable without an API key by @masci in #4405
- feat: new Pipeline by @ZanSara in #4368
- ci: Use bigger runner for integration-tests-linux by @silvanocerza in #4422
- test: Fix audio tests failing by @silvanocerza in #4418
- feat: improve is_containerized() by @masci in #4412
- Docs: Update Agent docstrings + add api docs by @agnieszka-m in #4296
- refactor: rename v2 package to preview by @ZanSara in #4409
- feat: add PromptNode OpenAI token streaming by @vblagoje in #4397
- feat: Isolate integration PromptNode tests into a separate test unit by @vblagoje in #4420
- Docs: Fix order and category of agent by @agnieszka-m in #4440
- test: Remove unnecessary imports in conftest.py by @silvanocerza in #4434
- Docs: Fix agent module by @agnieszka-m in #4441
- test: stop running the CI on macos by @masci in #4443
- ci: Run readme_sync.yml in PRs by @silvanocerza in #4442
- feat: Add the New Tokenizer of gpt-3.5-turbo by @AI-Ahmed in #4331
- Docs: Update language classifier docstrings by @agnieszka-m in #4413
- ci: remove python_cache internal action by @silvanocerza in #4429
- feat: Add ChatGPT PromptNode layer by @vblagoje in #4357
- chore: Make version semver compliant by @silvanocerza in #4456
- refactor: Add AgentStep by @vblagoje in #4431
- feat: Enable PromptNode to use text-generation models by @vblagoje in #4349
- feat: add additional params to file upload endpoint by @josepablofm78 in #4445
- fix: stop loading FAISS and InMem doc Store for indexing pipelines by @mayankjobanputra in #4396
- feat: Add agent event callbacks by @vblagoje in #4491
- bug: Exclude rdflib 6.3.2 because of license issues by @julian-risch in #4495
- ci: remove telemetry env var by @ZanSara in #4497
- feat: prompt at query time by @tstadel in #4454
- chore: wire up AnswerParser by @tstadel in #4505
- Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
New Contributors
- @abwiersma made their first contribution in #4174
- @in-balamurugan made their first contribution in #3900
- @Kshitijpawar made their first contribution in #4213
- @vbernardes made their first contribution in #4273
- @culms made their first contribution in #4376
- @kaixuanliu made their first contribution in #4311
- @josepablofm78 made their first contribution in #4445
- @recrudesce made their first contribution in #4483
Full Changelog: v1.14.0...v1.15.0