ez-video-chatbot-app

Sample app that uses videos of avatars or humans talking to create a GUI-based AI chatbot

A Japanese README is available in README_JP.md.

Features

By switching the video being played according to the state of the conversation, this "pseudo" dialogue app makes the avatar or human in the video appear to talk in real time.
Real-time conversation is made possible by free speech recognition libraries and free speech synthesis software.
A high-performance GPU of the kind required by 3DCG engines is not necessary. A discrete GPU such as the GeForce RTX series gives smoother operation, but the app is designed to run at a minimal level on CPU-only PCs (with an iGPU).
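
The video-switching idea described above can be sketched as a lookup from conversation state to clip. This is only an illustration: the state names and file names below are hypothetical, and the app's actual states and clips are the ones configured in constants.py.

```python
from enum import Enum


class DialogueState(Enum):
    """Hypothetical conversation states; the app's real states may differ."""
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Hypothetical file names, used here only to show the mapping.
VIDEO_FOR_STATE = {
    DialogueState.IDLE: "videos/idle.mp4",
    DialogueState.LISTENING: "videos/listening.mp4",
    DialogueState.THINKING: "videos/thinking.mp4",
    DialogueState.SPEAKING: "videos/speaking.mp4",
}


def select_video(state: DialogueState) -> str:
    """Return the clip to play while the dialogue is in the given state."""
    # Fall back to the idle clip if a state has no clip assigned.
    return VIDEO_FOR_STATE.get(state, VIDEO_FOR_STATE[DialogueState.IDLE])
```

Playing the "speaking" clip only while synthesized audio is being output, and returning to the "idle" clip afterwards, is what would make the pre-recorded footage look like a live response.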

Limitations

Video playback is implemented by rapidly redrawing GUI image components in Tkinter, so playback smoothness depends on CPU performance.
Speech recognition and speech synthesis run on local models, so responses may be slow depending on the specifications of the PC running the app.
The dialogue screen is fixed at 1920x1080. Depending on the OS display scaling setting, the screen may be enlarged and overflow the display. In that case, set the OS display scaling to 100%.
Displays with less than Full HD resolution are not supported.
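
To illustrate why redraw-based playback is sensitive to CPU speed, here is a minimal sketch of clock-driven frame selection. It is an assumption about one way such a loop could be paced, not a description of how main.py actually does it; the function names are hypothetical.

```python
def frame_index_at(elapsed_s: float, fps: float, frame_count: int) -> int:
    """Index of the frame that should be visible after elapsed_s seconds of a looping clip."""
    if fps <= 0 or frame_count <= 0:
        raise ValueError("fps and frame_count must be positive")
    # Choosing the frame from the clock (not from a counter) means a slow
    # redraw skips frames instead of slowing the whole video down.
    return int(elapsed_s * fps) % frame_count


def delay_until_next_frame_ms(elapsed_s: float, fps: float) -> int:
    """Milliseconds to wait before the next frame is due (at least 1 ms)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    next_due_s = (int(elapsed_s * fps) + 1) / fps
    return max(1, round((next_due_s - elapsed_s) * 1000))
```

In a Tkinter loop, the second value could be passed to a widget's `after()` method to schedule the next redraw. When a redraw takes longer than one frame interval, the first function simply returns a later index, which shows up as the stutter described above.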

Verified Environments (Laptop/Desktop)

OS: Windows 11 24H2
CPU: Intel Core i7-8650U, Intel Core i7-11370H, AMD Ryzen 7 7840U, AMD Ryzen 7 5700X
GPU: NVIDIA GeForce GTX 1050 Laptop GPU, RTX 3050 Ti Laptop GPU, RTX 3090

Video Playback Smoothness (Approx.)

Intel Core i7-8650U (Laptop): Not good
Intel Core i7-11370H (Laptop): OK
AMD Ryzen 7 7840U (Laptop): Good
AMD Ryzen 7 5700X (Desktop): Good
Video playback operates smoothly on mid-range to high-end desktop CPUs such as Intel Core i7 (12th generation or later), Core 7, or equivalent Ryzen CPUs. Even with high-end CPUs, video playback may stutter on laptops.
Using a PC with a discrete GPU will speed up speech recognition and synthesis processing. (To enable GPU support for speech recognition, you need to set up CUDA and install CUDA-compatible libraries.)

Application Execution Steps

The following software and credentials are required:

ffmpeg: For playing mp4 videos (add the folder containing ffmpeg.exe to the PATH environment variable)
VOICEVOX: For Japanese speech synthesis (if possible, select the GPU for the synthesis engine in its settings)
OpenAI API Key: For chat generation. Obtain an API key that can use GPT-4o (to use another model, edit chat.py).
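
A quick way to confirm the ffmpeg PATH setting before launching the app is to look the executable up from Python. This is a small standalone check, not part of the repository; `find_executable` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("ffmpeg")
    if path is None:
        print("ffmpeg was not found on PATH; add the folder containing ffmpeg.exe.")
    else:
        print(f"ffmpeg found at: {path}")
```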

If your PC supports CUDA, installing a CUDA-compatible build of PyTorch from the official PyTorch website will improve performance: https://pytorch.org/get-started/locally/
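
To check whether the installed PyTorch build can actually see a GPU, a sketch like the following can be used. It assumes nothing about how the app selects its device; `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is missing or has no CUDA support.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch build can see a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Speech models would run on: {pick_device()}")
```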

Install the necessary modules for running the application:

pip install -r requirements.txt

Edit constants.py and paste your API key:

class Constants:
    OPENAI_API_KEY = "sk-XXX"
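
As an alternative to pasting the key into the file, the value could be read from an environment variable so that it is not committed by accident. This is a suggested variation, not what the repository ships; the variable name `OPENAI_API_KEY` in the environment is an assumption.

```python
import os


class Constants:
    # Assumption: read the key from the OPENAI_API_KEY environment variable
    # and fall back to the placeholder when it is not set.
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-XXX")
```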

Execute the following command to launch the GUI screen:

python3 main.py

Usage

After the screen launches, follow the instructions at the bottom of the screen and speak into the audio input device. By default, the app uses the OS's default input device (e.g., the PC's built-in microphone). To pin a specific device, edit tts.py.

Customization

LLM Prompts

prompts/system_prompt.txt: System prompt

RAG Data

Place your text files for RAG in the rag/ folder.
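
As a rough picture of what the folder is expected to contain, the sketch below collects plain-text files from a folder into a dictionary. It is an assumption for illustration only: the `.txt` extension, UTF-8 encoding, and the `load_rag_texts` helper are hypothetical, and the app's own loading and retrieval logic may differ.

```python
from pathlib import Path


def load_rag_texts(folder: str = "rag") -> dict[str, str]:
    """Read every .txt file directly under folder into a {filename: text} dict."""
    root = Path(folder)
    if not root.is_dir():
        return {}
    # Sorted so the result is stable across runs and platforms.
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(root.glob("*.txt"))
    }
```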

UI Settings

constants.py: Specifies videos, chat background color, text color, and fixed values such as conversation end keywords.

keisuke-okb/ez-video-chatbot-app
