Welcome to the Aana Examples repository, a collection of example projects that demonstrate the capabilities of the Aana SDK. The Aana SDK is a framework for building multimodal applications and deploying machine learning models for vision, audio, and language tasks at scale. It also supports Retrieval-Augmented Generation (RAG) systems, which makes it possible to build applications such as search engines, recommendation systems, and data insights platforms.
Each example application in this repository shows how to use the Aana SDK to build a multimodal application.
A multimodal chat application that lets users upload a video and ask questions about its content. The application processes both the visual and the audio content of the video to answer these questions. This example shows how to combine video understanding with conversational AI.
- Demo: See the Chat with Video Demo notebook for a step-by-step guide on using the application.
An application that summarizes a video by transcribing its audio and then generating a concise summary of the transcript with a large language model (LLM). This example is part of the Aana SDK tutorial on building multimodal applications.
- Tutorial: Follow the tutorial for detailed instructions on how to build similar applications using the Aana SDK.
To explore these examples, clone this repository with the --recurse-submodules flag so that the submodules are cloned as well:
git clone --recurse-submodules https://github.com/mobiusml/aana_examples.git
You can also clone each example repository individually by following the links provided above.