- Yolo8n: Shashank Asthana's model
- Yolo5_8: Dev Bansal's model
- LiveRCNN: Shreya Asthana's model
- Yolov4: Shreya Tripathi's model
- text_to_audio (speech output)
- activate (voice-command input)
Integrates YOLOv8, NLP, and voice commands
Camera
- Captures real-time images or video frames from the environment.
- Sends the image stream to the object detection stage (sketched below).
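
A minimal capture loop for this stage might look like the following sketch, assuming OpenCV (`cv2`) as the camera backend; `frame_stream` is a hypothetical helper name, not a function from the project:

```python
import cv2

def frame_stream(camera_index: int = 0):
    """Yield BGR frames from the webcam until the stream ends."""
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError("Could not open camera")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # camera disconnected or stream ended
            yield frame  # hand the frame to the object detection stage
    finally:
        cap.release()
```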
Object Detection Module
- Processes the images using the YOLO model.
- Detects and classifies objects within the frame.
- Outputs object names, bounding boxes, and confidence scores.
- Estimates whether objects are "closer" or "farther" based on the size of their bounding boxes (a heuristic sketched below).
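
A sketch of this stage using the `ultralytics` package; the `yolov8n.pt` weights and the `CLOSE_AREA_RATIO` threshold are illustrative choices, not values taken from the project:

```python
from ultralytics import YOLO

# Assumed heuristic: a box covering more than 15% of the frame counts as "closer".
CLOSE_AREA_RATIO = 0.15

model = YOLO("yolov8n.pt")  # pretrained YOLOv8 nano weights

def detect_objects(frame):
    """Return (label, confidence, proximity) triples for one frame."""
    result = model(frame)[0]
    frame_area = frame.shape[0] * frame.shape[1]
    detections = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        area_ratio = (x2 - x1) * (y2 - y1) / frame_area
        proximity = "closer" if area_ratio > CLOSE_AREA_RATIO else "farther"
        label = result.names[int(box.cls[0])]
        detections.append((label, float(box.conf[0]), proximity))
    return detections
```

Bounding-box area is only a rough proxy for distance, which is why the output is limited to the coarse labels "closer" and "farther".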
Voice Command Module
- Microphone Input: Listens to user commands using a speech recognition library (e.g., SpeechRecognition).
- Processes commands like "activate," "stop," or "exit."
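
With the SpeechRecognition library, the listening step could be sketched as follows (Google's free web recognizer is one common backend; a microphone driver such as PyAudio is assumed to be installed):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_for_command() -> str:
    """Capture one utterance and return it as lowercase text ("" on failure)."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return ""  # speech was unintelligible or the recognizer was unreachable
```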
Text-to-Speech Module
- Converts detected objects, their proximity (e.g., "closer" or "farther"), and scene descriptions into spoken feedback using Google Text-to-Speech (gTTS).
- Plays the audio output through the speaker.
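
A minimal gTTS wrapper might look like this; the playback step is an assumption (here the `playsound` package, but any MP3 player works):

```python
import os
import tempfile

from gtts import gTTS
from playsound import playsound  # assumed playback helper

def speak(text: str) -> None:
    """Convert text to speech with Google Text-to-Speech and play it."""
    tts = gTTS(text=text, lang="en")
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        path = f.name
    tts.save(path)   # write the synthesized audio to a temporary file
    playsound(path)  # blocking playback through the speaker
    os.remove(path)
```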
AI Response System
- Combines object detection results, including proximity estimation (closer/farther), with voice input.
- Provides intelligent responses to user commands and describes the detected scene (e.g., "The scene contains a person and a bicycle, both are closer").
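
Combining the two inputs can be as simple as the sketch below, which assumes the `(label, confidence, proximity)` triples from the detection sketch and the `speak` helper from the text-to-speech sketch (both hypothetical names):

```python
def describe_scene(detections) -> str:
    """Turn (label, confidence, proximity) triples into a spoken sentence."""
    if not detections:
        return "No objects detected."
    parts = [f"a {label}, {proximity}" for label, _conf, proximity in detections]
    return "The scene contains " + ", ".join(parts) + "."

def handle_command(command: str, detections) -> bool:
    """React to a recognized voice command; return False to stop the system."""
    if "activate" in command:
        speak(describe_scene(detections))
    elif "stop" in command or "exit" in command:
        return False
    return True
```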
User Interaction
- Outputs include a live video stream with annotations (bounding boxes and labels) and audible scene descriptions indicating the proximity of objects.
- Users interact through voice commands to control the system.
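
The on-screen annotations could be drawn with OpenCV; this sketch assumes the detection stage is extended to also return the box coordinates:

```python
import cv2

def annotate(frame, detections):
    """Draw boxes and labels; detections are (label, conf, proximity, (x1, y1, x2, y2))."""
    for label, conf, proximity, (x1, y1, x2, y2) in detections:
        p1, p2 = (int(x1), int(y1)), (int(x2), int(y2))
        cv2.rectangle(frame, p1, p2, (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {conf:.2f} ({proximity})",
                    (p1[0], p1[1] - 8), cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 255, 0), 2)
    return frame
```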
Flow of Data:
Camera → Object Detection Module (with proximity estimation) → AI Response System → (Text-to-Speech and Display) → User Interaction
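
Tying the stages together, a simplified single-threaded main loop might read as follows, reusing the hypothetical helpers from the sketches above. A real system would run the voice listener on its own thread so the video stays live; the keypress here stands in for the "activate" command:

```python
import cv2

def main() -> None:
    for frame in frame_stream():               # Camera
        detections = detect_objects(frame)     # Object Detection Module
        cv2.imshow("Assistive Vision", frame)  # Display
        key = cv2.waitKey(1) & 0xFF
        if key == ord("a"):                    # stand-in for the "activate" voice command
            speak(describe_scene(detections))  # AI Response System -> Text-to-Speech
        elif key == ord("q"):
            break
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```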