Many elderly people need constant attention to their needs. Engaging in day-to-day conversation and calling for help can be very challenging for them. The VirtuSense Head-Nod Assistant is an AI-powered personal assistant that keeps track of patients' needs and necessities while also offering a bit of entertainment. It is an AI-powered web app designed to interact with users through a single thought-provoking question that expects a "Yes" or "No" head nod.
The head-nod application is structured into the following components:
images/
: Folder containing deployment use cases.
app.py
: The main Streamlit application.
Dockerfile
: Instructions for Docker to build the application image.
detetctor.py
: The main detector algorithm, using MediaPipe and OpenCV.
utils.py
: Record-and-save video utilities for the application.
interaction.txt
: Stores each interaction with a timestamp.
video_logs/
: Folder containing video log files.
open_browser.sh
: Bash script for opening the web browser.
open_cam.sh
: Bash script for opening the camera.
requirements.txt
: All dependencies and their versions.
To run the application, you are required to have Docker installed. To install Docker, follow the instructions in the Docker Installation Guide.
Further, ensure you have Git installed on your device. If not, you can install Git using the following command:
sudo apt install git-all
- Git clone the repository: The first step is to clone the whole repository.
git clone https://github.com/Achuthankrishna/head_pose_app
Then get into the folder:
cd head_pose_app
- Build the Docker image: In the command line, first execute
docker -v
to check your Docker version. Once done, execute the command below to build the image:
sudo docker build -t head_pose_app .
- Start the container: To start the Streamlit web application, run (Debian systems):
sudo docker run --privileged -v /dev/video0:/dev/video0 -p 8501:8501 head_pose_app
This command maps port 8501 of the container to port 8501 on your host machine.
- Click the link in the terminal: The last step is to either click the link shown in the terminal or type
localhost:8501
into any browser on the system.
- Open your web browser and navigate to
http://localhost:8501
or click the link presented in the terminal.
- The user is presented with a web interface giving instructions on how to nod and key analogies.
- Click on Start to launch the application; if the user does not feel like using it, they can press Quit App to kill the process.
- Read the question on the sidebar and press Answer Question to open the interface; if the user wants to skip the question, they can press Change Question.
- When Answer Question is pressed, a countdown is shown to the user, after which the camera frame opens for their response.
- The software records the interaction in a 3-second window (see the capture sketch after this list). If the detector cannot detect a clear sign, it keeps prompting the user to re-record their interaction.
- At any point, the user can skip to the next question by pressing the Change Question button, or quit the app by pressing Quit App. Once the user answers the given question, the detector presents the output.
- The user can choose to re-answer if they are not satisfied with the detection, or press Continue to log their interaction for the question.
- Upon completing all questions, the web app presents the option to either restart from the first question or quit the application. On pressing Restart, the application starts recording responses from the first question.
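As a rough illustration of the 3-second recording window, the minimal sketch below captures webcam frames for a fixed duration with OpenCV. The function name record_window is hypothetical; the actual capture logic lives in utils.py and detetctor.py.

```python
# Minimal sketch of a 3-second capture window (record_window is a
# hypothetical helper, not the repository's exact implementation).
import time
import cv2

def record_window(seconds=3, device=0):
    """Capture webcam frames for a fixed window and return them as a list."""
    cap = cv2.VideoCapture(device)  # /dev/video0 when run inside the container
    frames = []
    start = time.time()
    while time.time() - start < seconds:
        ok, frame = cap.read()
        if not ok:
            break  # camera unavailable or stream ended
        frames.append(frame)
    cap.release()
    return frames
```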
- The detection algorithm relies primarily on the majority action recorded over the frame window. These actions are determined by first obtaining facial landmarks using the MediaPipe blendshape and FaceMeshV2 models.
- The obtained facial landmarks are indexed; I chose the nasal index and the extreme cheekbone indices, which can be viewed in this image.
- From the chosen indices, we record the 3D and 2D locations of each landmark for every frame and use a PnP solver to obtain the rotation and translation vectors. We then obtain rotation matrices using the Rodrigues formula.
- Based on the Euler angles, the detector classifies the predominant motion of the face. If the rotation about the y-axis is significant (beyond a threshold), it indicates looking left or right; if the rotation about the x-axis is significant, it indicates looking up or down. Otherwise, the face is considered to be facing forward.
- We then count the occurrences of the different types of motion and identify the predominant motion over the 3-second window. Based on the predominant motion, the function returns the response, categorizing it as "Yes", "No", or "Undetermined". Condensed sketches of the per-frame step and the voting step follow below.
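The per-frame step can be sketched with the standard MediaPipe FaceMesh plus OpenCV solvePnP recipe. The landmark indices, the pseudo-3D point construction, the camera intrinsics, and the 10-degree threshold below are illustrative assumptions, not the repository's exact values.

```python
# Sketch of the per-frame head-pose step, assuming the common
# MediaPipe FaceMesh + cv2.solvePnP recipe. Indices and threshold
# are illustrative, not the repository's exact values.
import cv2
import numpy as np
import mediapipe as mp

LANDMARK_IDS = [1, 33, 263, 61, 291, 199]  # nose tip, eye/cheek corners, chin (illustrative)
THRESH_DEG = 10.0                          # illustrative yaw/pitch threshold

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)

def classify_frame(frame):
    """Return 'up', 'down', 'left', 'right', or 'forward' for one frame."""
    h, w = frame.shape[:2]
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return "forward"
    lms = results.multi_face_landmarks[0].landmark

    # 2D pixel coordinates and pseudo-3D points for the chosen indices.
    pts_2d = np.array([[lms[i].x * w, lms[i].y * h] for i in LANDMARK_IDS],
                      dtype=np.float64)
    pts_3d = np.array([[lms[i].x * w, lms[i].y * h, lms[i].z * w] for i in LANDMARK_IDS],
                      dtype=np.float64)

    # Simple pinhole camera model: focal length ~ image width, center at midpoint.
    cam = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    dist = np.zeros((4, 1))

    ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, cam, dist)
    rmat, _ = cv2.Rodrigues(rvec)       # rotation vector -> rotation matrix
    angles, *_ = cv2.RQDecomp3x3(rmat)  # Euler angles in degrees
    pitch, yaw = angles[0], angles[1]

    if abs(yaw) > THRESH_DEG:           # rotation about y-axis: left/right
        return "left" if yaw < 0 else "right"
    if abs(pitch) > THRESH_DEG:         # rotation about x-axis: up/down
        return "down" if pitch < 0 else "up"
    return "forward"
```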
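The voting step then reduces the per-frame labels to a single answer. The helper below is a hypothetical condensation of that logic:

```python
# Majority vote over the per-frame motions from the 3-second window
# (decide_response is a hypothetical name for the repository's logic).
from collections import Counter

def decide_response(motions):
    """Map the predominant motion to 'Yes', 'No', or 'Undetermined'."""
    if not motions:
        return "Undetermined"
    top, _ = Counter(motions).most_common(1)[0]
    if top in ("up", "down"):      # nodding
        return "Yes"
    if top in ("left", "right"):   # shaking
        return "No"
    return "Undetermined"          # predominantly facing forward

# Example: classify each frame of a recorded window, then vote.
# frames = record_window(3)
# print(decide_response([classify_frame(f) for f in frames]))
```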
- If you encounter any issues with video recording, ensure Docker is configured correctly. The application is developed primarily for Debian devices, where the camera device ID is accessible and not truncated.
- For any issues with dependency versions, refer to
requirements.txt
and adjust the package versions if necessary.
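One quick sanity check (my suggestion, not part of the repository) is to verify that OpenCV can read a frame from the mapped device before launching the app:

```python
# Verify the webcam is reachable (e.g., from inside the container).
import cv2

cap = cv2.VideoCapture(0)  # device 0 corresponds to /dev/video0
ok, _ = cap.read()
cap.release()
print("camera OK" if ok else "camera not accessible; check the -v /dev/video0 mapping")
```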