diff --git a/docs/how-to/multimodal-agents.mdx b/docs/how-to/multimodal-agents.mdx
index 1dcf50d257..b3bebbc810 100644
--- a/docs/how-to/multimodal-agents.mdx
+++ b/docs/how-to/multimodal-agents.mdx
@@ -1,14 +1,14 @@
 ---
 title: Using Multimodal Agents
 description: Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
-icon: image
+icon: video
 ---
 
-# Using Multimodal Agents
+## Using Multimodal Agents
 
 CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
 
-## Enabling Multimodal Capabilities
+### Enabling Multimodal Capabilities
 
 To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
 
@@ -25,7 +25,7 @@ agent = Agent(
 
 When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
 
-## Working with Images
+### Working with Images
 
 The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to manually add this tool - it's automatically included when you enable multimodal capabilities.
@@ -108,7 +108,7 @@ The multimodal agent will automatically handle the image processing through its
 - Process image content with optional context or specific questions
 - Provide analysis and insights based on the visual information and task requirements
 
-## Best Practices
+### Best Practices
 
 When working with multimodal agents, keep these best practices in mind:
 
diff --git a/docs/mint.json b/docs/mint.json
index 9103434b49..585fc0abdb 100644
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -91,6 +91,7 @@
         "how-to/custom-manager-agent",
         "how-to/llm-connections",
         "how-to/customizing-agents",
+        "how-to/multimodal-agents",
         "how-to/coding-agents",
         "how-to/force-tool-output-as-result",
         "how-to/human-input-on-execution",
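For reviewers, the behavior the updated page documents — `multimodal=True` causing the agent to be auto-configured with `AddImageTool` — can be sketched with a toy model. This is an illustration of the documented pattern only, not CrewAI's actual implementation; the class bodies below are invented for the sketch.

```python
class AddImageTool:
    """Stand-in for CrewAI's AddImageTool (name only, for illustration)."""
    name = "add_image"


class Agent:
    """Hypothetical mini-Agent mirroring the documented auto-configuration."""

    def __init__(self, role, multimodal=False, tools=None):
        self.role = role
        self.tools = list(tools or [])
        if multimodal:
            # Per the docs: enabling multimodal auto-registers the image tool,
            # so callers never add it manually.
            self.tools.append(AddImageTool())


agent = Agent(role="Image Analyst", multimodal=True)
print([tool.name for tool in agent.tools])  # → ['add_image']
```

The point of the sketch is the contract, not the internals: the `multimodal` flag is the only opt-in the docs ask for, and tool registration is a side effect of construction.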