Mehrdad Alizadeh defends his PhD thesis

Congratulations to Mehrdad Alizadeh for successfully defending (virtually!) his thesis, entitled "Enhancing Visual Question Answering with Linguistic Information," on May 22, 2020!

His committee included Barbara Di Eugenio (advisor; CS, UIC), Natalie Parde (CS, UIC), Cornelia Caragea (CS, UIC), Brian Ziebart (CS, UIC), and Ahmet Enis Cetin (Department of Electrical & Computer Engineering, UIC).

Abstract:
Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Although the task is grounded in visual processing, the language understanding component becomes crucial when the question is complex and free-form. In this work, I hypothesize that if the question focuses on events described by verbs, then the model should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no existing VQA dataset includes verb semantic information. My first contribution is a new VQA dataset (imSituVQA) that I built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, I propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. The experiments on imSituVQA show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance.
Semantic role labeling offers an alternative way to approximately annotate any VQA dataset of interest. I employed a PropBank-based semantic role labeler to label a subset of the VQA dataset (VQAsub). I then trained the proposed multi-task CNN-LSTM model on VQAsub. The results show a slight improvement over the single-task CNN-LSTM model.