Making Machines More Human: A Multitask Learning Approach to VQA and Human Attention Prediction

My MSc dissertation project set out to develop a deep learning model that improves the Visual Question Answering (VQA) performance of a state-of-the-art architecture while mimicking human attention, using the VQA-HAT dataset. The project was successful, and some results can be found below.

The dissertation PDF is included in this repository: msc-dissertation.pdf.

The code is adapted from "Stacked Attention Networks for Image Question Answering".
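The multitask idea described above, answering the question while also matching human attention, can be illustrated with a combined loss. The following is only a minimal sketch under stated assumptions, not the objective used in this repository: it assumes the answer loss is a cross-entropy, swaps in a KL divergence as the attention loss, and uses a hypothetical weight `att_weight`. All function names are made up for illustration, and plain Python is used so the sketch has no dependencies. The dissertation describes the actual formulation.

```python
import math


def cross_entropy(answer_probs, true_index, eps=1e-12):
    """Negative log-likelihood of the correct answer."""
    return -math.log(answer_probs[true_index] + eps)


def kl_divergence(human_att, model_att, eps=1e-12):
    """KL(human || model) between two attention distributions over image regions."""
    return sum(
        h * math.log((h + eps) / (m + eps))
        for h, m in zip(human_att, model_att)
        if h > 0  # regions with zero human attention contribute nothing
    )


def multitask_loss(answer_probs, true_index, human_att, model_att, att_weight=0.5):
    """Answer loss plus a weighted attention loss (att_weight is a hypothetical value)."""
    answer_loss = cross_entropy(answer_probs, true_index)
    attention_loss = kl_divergence(human_att, model_att)
    return answer_loss + att_weight * attention_loss


if __name__ == "__main__":
    answer_probs = [0.7, 0.2, 0.1]      # toy distribution over three answers
    human_att = [0.5, 0.5, 0.0, 0.0]    # toy human attention over four regions
    matching = multitask_loss(answer_probs, 0, human_att, [0.5, 0.5, 0.0, 0.0])
    mismatched = multitask_loss(answer_probs, 0, human_att, [0.0, 0.0, 0.5, 0.5])
    print(matching, mismatched)
```

In this toy setup the model attention that matches the human map yields the lower loss, which is the behaviour a shared objective of this kind is meant to encourage.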

Dependencies

The code is written in Python and uses the Theano package.

Python 2.7
Theano
NumPy
h5py
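Before training, it can help to confirm that the packages listed above are importable in the active environment. The helper below is a small sketch and not part of this repository; `check_dependencies` is a hypothetical name, and the import names `theano`, `numpy`, and `h5py` are assumed to match the installed packages.

```python
import importlib


def check_dependencies(module_names):
    """Return a dict mapping each module name to True if it imports without error."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Covers both a missing package and an installation that fails on import.
            status[name] = False
    return status


if __name__ == "__main__":
    results = check_dependencies(["theano", "numpy", "h5py"])
    for name in sorted(results):
        print("%-8s %s" % (name, "ok" if results[name] else "MISSING"))
```

The script uses only syntax that works on Python 2.7, the version listed above.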

Usage

To train a model:

cd src/scripts; python mtl_san_deepfix.py

There is another README.md inside src describing the files there.

Results

Some results can be found below. "Human Attention and Answer" is the ground truth. "SAN", the Stacked Attention Network, is our main baseline. Our main model, "MTL SAN+DeepFix", improves on the VQA accuracy of the SAN baseline while mimicking human attention. The remaining models are additional baselines. Thorough explanations can be found in msc-dissertation.pdf.

