Scene Descriptions for the Visually Impaired

CSCI 5541 Natural Language Processing Final Project

Group Name: Sentimantals

Members: Mohit Yadav, Alex Besch, Abbas Booshehrain, Ruolei Zeng

For a summary of our work, please visit our project webpage here.

For a detailed analysis and our methodology, see the project final report here.

Demo Output from our Method:

banner_video.mp4

Installation

This code was tested with Python 3.12.

Clone this repo on your machine by running the following commands in a terminal:

git clone https://github.com/mohitydv09/nlp-final-project.git
cd nlp-final-project

Create the conda environment:

conda env create -f environment.yml

This repo uses OpenAI's ChatGPT models for inference, so an OpenAI API key must be stored in the environment variable OPENAI_API_KEY.

This can be done with the following command:

export OPENAI_API_KEY="YOUR OPENAI API KEY"

Verify that the environment variable is set correctly by running:

echo $OPENAI_API_KEY
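The same check can also be done from Python, which is how the code will ultimately read the key at runtime (a minimal sketch using only the standard library; the masked printout is just to avoid echoing the full secret):

```python
import os

# Read the key from the environment; None means it was never exported.
api_key = os.environ.get("OPENAI_API_KEY")

if api_key is None:
    print("OPENAI_API_KEY is not set -- export it before running main.py")
else:
    # Print only a masked prefix so the secret never lands in logs.
    print(f"OPENAI_API_KEY is set ({api_key[:3]}...)")
```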

Downloading sample data to run the code without a depth camera

Download the test data from here and copy it into the data/ folder.

Note: The code will automatically download the object detection model and the Vision Language model locally.

Setting Parameters

The following variables control how the code functions. Open main.py and adjust them as needed:

## Choose the OpenAI's LLM Model to be used
LLM_MODEL_NAME = 'gpt-4o-mini'

## Set Model Temperature
LLM_TEMPERATURE = 0.0 ## Deterministic

## Set the data input stream
WORKING_WITH_LOCAL_DATA = True 
    # True - Uses local data in ./data folder
    # False - Requires Intel RealSense camera to be connected

## Choose the recorded data file to run from the downloaded data.
LOCAL_DATA_FILE_PATH = "data/keller_study.npz" 

## Set device
DEVICE = 'cuda' ## 'cpu' or 'cuda'

## Select the Mode of the Product.
MODE = "NAV" 
    # NAV - Navigation assistance
    # VQA - Visual Question Answering
    # SD - Scene Descriptions

Running the code

First, activate the conda environment. By default its name is nlp if it was created from environment.yml:

conda activate nlp

Once the environment is activated, run the main Python file:

python3 main.py

Troubleshooting

On macOS, OpenCV windows can only be created from the main thread. The code will still run, but the camera output cannot be displayed on macOS.
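One way to work around this is to gate all display calls behind a runtime check, so the pipeline keeps running headless where windows are unavailable (a sketch; `can_show_window` is a hypothetical helper, not a function defined in main.py):

```python
import platform
import threading

def can_show_window() -> bool:
    """Hypothetical guard: only allow cv2.imshow when it is safe to call.

    OpenCV windows must be created from the main thread, and on macOS
    ("Darwin") the display is skipped entirely per the limitation above.
    """
    on_main_thread = threading.current_thread() is threading.main_thread()
    return on_main_thread and platform.system() != "Darwin"

# Compute the flag once and branch on it wherever frames would be shown.
SHOW_WINDOW = can_show_window()
print("Display enabled:", SHOW_WINDOW)
```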

Acknowledgements

This project builds on several open-source repositories.
