Welcome to the SightSync repository - a platform dedicated to assisting visually impaired individuals by delivering natural, accurate verbal descriptions of their surroundings. This project was developed with lots of ❤️ by Oriol and Ferran for LauzHack2023 🏆
SightSync is an application that uses open-source models 💡 to help visually impaired individuals navigate their environment with greater ease and autonomy. It essentially acts as their 'eyes', feeding visual information back to them through spoken descriptions.
We've built SightSync using the following state-of-the-art open-source models:
- Zephyr: An LLM that interprets the live scene information and makes sense of what the user is asking for, an essential feature for our cause 🔍
- Distil-Whisper: Converts spoken language into text (speech-to-text, STT) 🎤
- FastPitch: Converts the text description into speech (text-to-speech, TTS) 🎧
- GroundingDino: Locates items within the scene, adding another layer of detail to our descriptive capabilities 📍
- CogVLM: Generates accurate, context-aware descriptions of the surroundings, allowing us to craft immersive auditory experiences that reflect the user's environment 🌍
Please note that all models are hosted on-premises 🏭
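To illustrate how these components fit together, here is a minimal, non-authoritative sketch of the flow in Python. It assumes the Hugging Face `transformers` library for the STT and LLM steps; the `describe_scene` and `speak` helpers are hypothetical placeholders standing in for the on-prem CogVLM/GroundingDino and FastPitch deployments described above, not the actual SightSync code.

```python
# Illustrative sketch only; model IDs and helper functions are assumptions,
# not the exact SightSync implementation.
from transformers import pipeline

# 1. Speech-to-text: transcribe the user's spoken question (Distil-Whisper).
stt = pipeline("automatic-speech-recognition", model="distil-whisper/distil-large-v2")

# 2. Language model: interpret the request and draft an answer (Zephyr).
llm = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

def describe_scene(image_path: str, question: str) -> str:
    """Hypothetical placeholder for the CogVLM + GroundingDino step that turns
    a camera frame and the user's question into a grounded scene description."""
    raise NotImplementedError

def speak(text: str) -> None:
    """Hypothetical placeholder for the FastPitch text-to-speech step."""
    raise NotImplementedError

def answer(audio_path: str, image_path: str) -> None:
    question = stt(audio_path)["text"]            # what did the user ask?
    scene = describe_scene(image_path, question)  # what is around them?
    prompt = f"Scene: {scene}\nQuestion: {question}\nAnswer:"
    reply = llm(prompt, max_new_tokens=128)[0]["generated_text"]
    speak(reply)                                  # read the answer aloud
```

In this sketch, audio and vision are handled by separate models and the LLM only stitches their outputs together, which mirrors the model list above; the real pipeline may orchestrate these steps differently.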
As an open-source project, we welcome anyone who would like to contribute. We believe that every contribution, no matter how small, can make a big difference!
If you have a feature request, bug report, or just want to chat, don't hesitate to get in touch with us:
- Oriol Agost - [email protected]
- Ferran Aran - [email protected]
Thank you for your interest in SightSync. We're excited to see where we can go together! 🌟