Scene Descriptions for the Visually Impaired

CSCI 5541 Natural Language Processing Final Project

Group Name: Sentimantals

Members: Mohit Yadav, Alex Besch, Abbas Booshehrain, Ruolei Zeng

For a summary of our work, please visit our project webpage here.

For detailed analysis and methodology of our work, see the project final report here.

Demo Output from our Method:

banner_video.mp4

Installation

This code was tested with Python 3.12.

Clone this repo to your machine by running the following commands in a terminal:

git clone https://github.com/mohitydv09/nlp-final-project.git
cd nlp-final-project

Create the conda environment:

conda env create -f environment.yml

This repo uses OpenAI's ChatGPT for inference, so an OpenAI API key must be stored in the environment variable OPENAI_API_KEY.

This can be done with the following command:

export OPENAI_API_KEY="YOUR OPENAI API KEY"

Verify that the environment variable is set correctly by running:

echo $OPENAI_API_KEY
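The same check can also be made from Python before starting the program. The following is a minimal sketch, not code from this repository: the helper name `has_openai_api_key` is hypothetical, and it only confirms that the variable is non-empty, not that the key is valid.

```python
import os


def has_openai_api_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    Hypothetical helper, not part of the repo. `env` defaults to the
    current process environment; passing a dict makes it easy to test.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if has_openai_api_key():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing; export it before running main.py.")
```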

Downloading sample data to run the code without a Depth Camera

Download the test data from here and copy it into the data/ folder.
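To confirm that a downloaded recording is in place and readable, you can list the arrays it contains. The sketch below relies only on the fact that a .npz file is a zip archive with one .npy member per array. The helper name `list_npz_arrays` is hypothetical, and the array names inside the project's recordings are not assumed here.

```python
import zipfile
from pathlib import Path


def list_npz_arrays(path):
    """Return the sorted names of the arrays stored in a .npz file.

    Hypothetical helper, not part of the repo. A .npz file is a zip
    archive whose members are named "<array_name>.npy", so the names
    can be read without loading any array data.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No recording found at {path}")
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


if __name__ == "__main__":
    # data/keller_study.npz is the sample path used in the main.py settings.
    try:
        print(list_npz_arrays("data/keller_study.npz"))
    except FileNotFoundError as err:
        print(err)
```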

Note: The code automatically downloads the object detection model and the vision-language model to your machine.

Setting Parameters

The following variables control how the code functions. Open the file main.py and adjust them as needed:

## Choose the OpenAI LLM model to be used
LLM_MODEL_NAME = 'gpt-4o-mini'

## Set Model Temperature
LLM_TEMPERATURE = 0.0 ## Lowest randomness; output is close to deterministic

## Set the data input stream
WORKING_WITH_LOCAL_DATA = True 
    # True - Uses local data in ./data folder
    # False - Requires Intel RealSense camera to be connected

## Choose the recorded data file to run from the downloaded data.
LOCAL_DATA_FILE_PATH = "data/keller_study.npz" 

## Set device
DEVICE = 'cuda' ## 'cpu' or 'cuda'

## Select the Mode of the Product.
MODE = "NAV" 
    # NAV - Navigation assistance
    # VQA - Visual Question Answering
    # SD - Scene Descriptions
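Because a mistyped value in these settings may only surface after the models have loaded, it can help to check them up front. The following is a minimal sketch, not code from this repository: the function `validate_settings` and the constants `VALID_MODES` and `VALID_DEVICES` are hypothetical, and the allowed values are taken from the comments above.

```python
from pathlib import Path

# Allowed values, as listed in the comments of the settings block.
VALID_MODES = {"NAV", "VQA", "SD"}
VALID_DEVICES = {"cpu", "cuda"}


def validate_settings(mode, device, working_with_local_data, local_data_file_path):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper, not part of the repo.
    """
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if device not in VALID_DEVICES:
        problems.append(f"DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    # The recording is only needed when no camera is used.
    if working_with_local_data and not Path(local_data_file_path).is_file():
        problems.append(f"Local data file not found: {local_data_file_path}")
    return problems


if __name__ == "__main__":
    for problem in validate_settings("NAV", "cuda", True, "data/keller_study.npz"):
        print(problem)
```

Returning a list rather than raising on the first error lets all misconfigured values be reported in one pass.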

Running the code

First, activate your environment. By default, the environment is named nlp if it was created from environment.yml.

conda activate nlp

Once the environment has been activated, run the main Python file with the following command:

python3 main.py

Troubleshooting

On macOS, OpenCV's display functions can only be run in the main thread. Because of this limitation, the code will still run on macOS, but the user cannot see the camera output.
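One way to handle this limitation is to decide at runtime whether opening a camera window is safe. The following is a minimal sketch, not code from this repository: the helper name `can_show_camera_window` is hypothetical, and it assumes the display call would otherwise be made from a worker thread.

```python
import platform
import threading


def can_show_camera_window(system=None, thread=None):
    """Return False when a window would be opened off the main thread on macOS.

    Hypothetical helper, not part of the repo. `system` defaults to
    platform.system() ("Darwin" on macOS) and `thread` defaults to the
    calling thread; both can be passed explicitly for testing.
    """
    system = platform.system() if system is None else system
    thread = threading.current_thread() if thread is None else thread
    on_main_thread = thread is threading.main_thread()
    return not (system == "Darwin" and not on_main_thread)


if __name__ == "__main__":
    if can_show_camera_window():
        print("Camera output can be displayed from this thread.")
    else:
        print("Skipping camera display: not on the main thread on macOS.")
```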

Acknowledgements

This project uses several open-source repositories:

Jocher, G., Qiu, J., & Chaurasia, A. (2023). Ultralytics YOLO (Version 8.0.0) [Computer software]. https://github.com/ultralytics/ultralytics
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., & Rush, A. M. (2020). Transformers: State-of-the-Art Natural Language Processing [Conference paper]. 38–45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
Intel Corporation. (2024). librealsense (Version 2.55.1). GitHub. https://github.com/IntelRealSense/librealsense
Chase, H. (2022). LangChain [Computer software]. https://github.com/langchain-ai/langchain
