PhD Defense: Learning from Multiple Views of Data

Talk
Abhishek Sharma
Time: 04.06.2015, 13:00 to 15:00
Location: AVW 4172

In this dissertation, I take inspiration from the brain's ability to extract information and learn from multiple sources of data, and I mimic this ability for several practical problems. I explore the hypothesis that the human brain can extract and store information from raw data in a form, termed a common representation, that is well suited for cross-modal content matching. Human-level performance on this task has two essential requirements: (a) the ability to extract sufficient information from raw data, and (b) algorithms to obtain a task-specific common representation from multiple sources of extracted information. This dissertation addresses both requirements and develops novel content-extraction and cross-modal content-matching architectures.
The first part of the dissertation proposes a learning-based visual information extraction approach for semantic segmentation of images: the Recursive Context Propagation Network (RCPN). RCPN is a deep feed-forward neural network that exploits contextual information from the entire image through bottom-up followed by top-down context propagation via random binary parse trees. By propagating visual information from every location in the image to every other location, it improves the feature representation of every super-pixel for better classification into semantic categories. Analysis of RCPN reveals that bypass-error paths in its computation graph can hinder effective context propagation; these are tackled by also including a classification loss on the internal nodes. In addition, a novel tree-MRF structure built on the parse trees models the hierarchical dependency present in the output.
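To make the propagation scheme concrete, here is a minimal NumPy sketch of RCPN-style bottom-up and top-down context propagation over a random binary parse tree of super-pixel features. The feature dimension, the random pairwise tree construction, and the single-layer combiner and decombiner networks are illustrative assumptions, not the dissertation's actual architecture, in which these mappings are trained end to end with the classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # super-pixel feature dimension (illustrative assumption)

# Hypothetical single-layer combiner/decombiner weights; in RCPN these
# networks are trained jointly with the classifier by back-propagation.
W_com = rng.standard_normal((D, 2 * D)) / np.sqrt(2 * D)
W_dec = rng.standard_normal((D, 2 * D)) / np.sqrt(2 * D)

def combine(left, right):
    # Bottom-up: merge two child features into one parent feature.
    return np.tanh(W_com @ np.concatenate([left, right]))

def decombine(child, parent_ctx):
    # Top-down: fuse a child's feature with its parent's context.
    return np.tanh(W_dec @ np.concatenate([child, parent_ctx]))

def build_random_tree(leaf_feats):
    # Merge random pairs of nodes until a single root remains,
    # yielding a random binary parse tree over the super-pixels.
    nodes = [("leaf", i, f) for i, f in enumerate(leaf_feats)]
    while len(nodes) > 1:
        i, j = sorted(rng.choice(len(nodes), size=2, replace=False))
        right, left = nodes.pop(j), nodes.pop(i)
        nodes.append(("node", (left, right), combine(left[2], right[2])))
    return nodes[0]

def propagate_down(node, parent_ctx, out):
    # Enhance each node's feature with its parent's (already enhanced)
    # context; the root keeps its bottom-up, whole-image feature.
    kind, payload, feat = node
    enhanced = feat if parent_ctx is None else decombine(feat, parent_ctx)
    if kind == "leaf":
        out[payload] = enhanced  # payload is the super-pixel index
    else:
        propagate_down(payload[0], enhanced, out)
        propagate_down(payload[1], enhanced, out)

# Five hypothetical super-pixel features from one image.
leaf_feats = [rng.standard_normal(D) for _ in range(5)]
enhanced = {}
propagate_down(build_random_tree(leaf_feats), None, enhanced)
# Each super-pixel feature now carries context from the entire image
# and would be fed to a per-super-pixel semantic classifier.
print({i: v.shape for i, v in enhanced.items()})
```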
The second part of this dissertation develops algorithms to obtain and match common representations across different modalities. A novel framework uses Partial Least Squares (PLS) to learn a common subspace from multiple modalities of data; it is applied to multi-modal face biometric problems such as pose-invariant face recognition and sketch-face recognition. The sensitivity of this subspace to noise under pose variation is analyzed, and a two-stage discriminative model is developed to tackle it. Finally, a generalized framework, termed Generalized Multiview Analysis (GMA), extends various popular feature extraction techniques that can be solved as a generalized eigenvalue problem to their multi-modal counterparts; it is used for pose- and lighting-invariant face recognition and text-image retrieval.
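As an illustration of matching in a learned common subspace, the following sketch uses scikit-learn's PLSCanonical to project two synthetic "modalities" into a shared space and match by nearest neighbour. The data sizes, the synthetic paired data, and the cosine-similarity matching rule are illustrative assumptions; the dissertation's actual PLS formulation, the two-stage discriminative model, and the GMA solver (a generalized eigenvalue problem) are not reproduced here.

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.default_rng(0)
n, d_photo, d_sketch, k = 200, 100, 80, 10  # hypothetical sizes

# Paired training data: features of the same faces in two modalities
# (e.g. photos and sketches); synthetic stand-ins with shared structure.
latent = rng.standard_normal((n, k))
X_photo = latent @ rng.standard_normal((k, d_photo)) \
    + 0.1 * rng.standard_normal((n, d_photo))
X_sketch = latent @ rng.standard_normal((k, d_sketch)) \
    + 0.1 * rng.standard_normal((n, d_sketch))

# Learn a k-dimensional common subspace maximizing cross-modal covariance.
pls = PLSCanonical(n_components=k)
pls.fit(X_photo, X_sketch)

# Project a photo gallery and sketch probes into the shared space.
gallery, probes = pls.transform(X_photo, X_sketch)

def normalize(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

# Match each sketch to its nearest photo by cosine similarity.
sims = normalize(probes) @ normalize(gallery).T
pred = sims.argmax(axis=1)
print(f"rank-1 accuracy on synthetic data: {(pred == np.arange(n)).mean():.2f}")
```

On real data the gallery and probes would come from disjoint sets of training and test identities; the training pairs are reused here only to show the mechanics of projecting and matching in the common subspace.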
Examining Committee:
Committee Chair: Dr. David Jacobs
Dean's Representative: Dr. Min Wu
Committee Members:
- Dr. Larry S. Davis
- Dr. Oncel Tuzel
- Dr. John Aloimonos