From 835557e6483a43e98b92a22e5f7b6d47f8e884fd Mon Sep 17 00:00:00 2001
From: Tony Kipkemboi
Date: Wed, 15 Jan 2025 13:54:32 -0500
Subject: [PATCH 1/2] fix: add multimodal docs path to mint.json

---
 docs/mint.json | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/mint.json b/docs/mint.json
index 9103434b49..6ce4363a28 100644
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -91,6 +91,7 @@
 "how-to/custom-manager-agent",
 "how-to/llm-connections",
 "how-to/customizing-agents",
+"how-to/multimodal-agents.mdx",
 "how-to/coding-agents",
 "how-to/force-tool-output-as-result",
 "how-to/human-input-on-execution",

From c12343a8b8268265fa2ab896bd0101cfba144df8 Mon Sep 17 00:00:00 2001
From: Tony Kipkemboi
Date: Wed, 15 Jan 2025 14:13:37 -0500
Subject: [PATCH 2/2] docs: update multimodal agents guide and mint.json configuration

---
 docs/how-to/multimodal-agents.mdx | 10 +++++-----
 docs/mint.json                    |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/how-to/multimodal-agents.mdx b/docs/how-to/multimodal-agents.mdx
index 1dcf50d257..b3bebbc810 100644
--- a/docs/how-to/multimodal-agents.mdx
+++ b/docs/how-to/multimodal-agents.mdx
@@ -1,14 +1,14 @@
 ---
 title: Using Multimodal Agents
 description: Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
-icon: image
+icon: video
 ---
 
-# Using Multimodal Agents
+## Using Multimodal Agents
 
 CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
 
-## Enabling Multimodal Capabilities
+### Enabling Multimodal Capabilities
 
 To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
 
@@ -25,7 +25,7 @@ agent = Agent(
 
 When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
 
-## Working with Images
+### Working with Images
 
 The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to manually add this tool - it's automatically included when you enable multimodal capabilities.
 
@@ -108,7 +108,7 @@ The multimodal agent will automatically handle the image processing through its
 - Process image content with optional context or specific questions
 - Provide analysis and insights based on the visual information and task requirements
 
-## Best Practices
+### Best Practices
 
 When working with multimodal agents, keep these best practices in mind:
 
diff --git a/docs/mint.json b/docs/mint.json
index 6ce4363a28..585fc0abdb 100644
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -91,7 +91,7 @@
 "how-to/custom-manager-agent",
 "how-to/llm-connections",
 "how-to/customizing-agents",
-"how-to/multimodal-agents.mdx",
+"how-to/multimodal-agents",
 "how-to/coding-agents",
 "how-to/force-tool-output-as-result",
 "how-to/human-input-on-execution",
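
PATCH 2/2 corrects the navigation entry added in PATCH 1/2 by dropping the `.mdx` suffix, because Mintlify expects `pages` entries to be written as paths without the file extension. What follows is a minimal sketch, not part of either commit, of a standalone check that would flag such entries before merging. It assumes the Mintlify-style `mint.json` layout, in which `navigation` is a list of groups and each group's `pages` list holds either path strings or nested groups. The function name `find_extension_entries` and the sample config are hypothetical.

```python
import json


def find_extension_entries(mint_config: dict, suffix: str = ".mdx") -> list[str]:
    """Return every navigation page path that ends with `suffix`.

    Assumption: "navigation" is a list of groups, each with a "pages" list
    whose items are either path strings or nested group objects.
    """
    bad: list[str] = []

    def walk(pages: list) -> None:
        for item in pages:
            if isinstance(item, str):
                # A page reference: flag it if it carries the file extension.
                if item.endswith(suffix):
                    bad.append(item)
            elif isinstance(item, dict):
                # A nested group: recurse into its own "pages" list.
                walk(item.get("pages", []))

    for group in mint_config.get("navigation", []):
        walk(group.get("pages", []))
    return bad


if __name__ == "__main__":
    # Hypothetical excerpt mirroring the entries as they stood after PATCH 1/2.
    sample = json.loads("""
    {
      "navigation": [
        {
          "pages": [
            "how-to/customizing-agents",
            "how-to/multimodal-agents.mdx",
            "how-to/coding-agents"
          ]
        }
      ]
    }
    """)
    print(find_extension_entries(sample))  # ['how-to/multimodal-agents.mdx']
```

Walking nested groups recursively keeps the check correct if a section is later split into subgroups; against the state after PATCH 2/2 it returns an empty list.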