PhD Proposal: Bayesian Nonparametric Models for Multimodal Data Integration

Talk
Bahadir Ozdemir
Time: 09.24.2014 12:00 to 13:30
Location: AVW 4424

Integrating information from multiple input sources is critical to several key tasks in machine learning. Discovering hidden common causes that explain the dependency among modalities improves performance on these tasks over single-view approaches. We propose probabilistic integrative approaches to clustering and retrieval for multimodal data in bioinformatics and computer vision.
High tumor heterogeneity makes it very challenging to identify key tumorigenic pathways as therapeutic targets. Integrating multiple omics data is a promising approach to identifying driving regulatory networks in patient subgroups. In the first part, we propose a novel conceptual framework to discover patterns of microRNA-gene networks that are frequently up- or down-regulated in a group of patients, and to use such networks for patient stratification in hepatocellular carcinoma (HCC). We use an integrative subgraph mining approach to identify altered regulatory networks frequently observed in HCC patients. Next, the microRNA-mRNA modules are used in an unsupervised class prediction model to discover HCC subgroups via patient clustering with a nonparametric Bayesian model. Kaplan-Meier survival analysis revealed that the HCC subgroups identified by the algorithm have different survival characteristics.
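As an illustration of the survival comparison mentioned above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The function name, the two subgroups, and all follow-up times are hypothetical toy values chosen for illustration; they are not data or code from the proposal.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data for two patient subgroups (not from the study).
group_a = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 0, 1, 0])
group_b = kaplan_meier([2, 3, 3, 7, 9], [1, 1, 1, 1, 0])
```

Plotting the two curves, or applying a log-rank test to the same inputs, is the usual way to check whether subgroups differ in survival.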
The enormous amount of multimedia data available online creates a need for intelligent systems that support fast image search and retrieval in large-scale collections. In the second part, we propose a multimodal retrieval procedure based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning the underlying, semantically meaningful abstract features in a multimodal dataset; a probabilistic retrieval model that allows cross-modal queries; and an extension model for relevance feedback. Experiments on two multimodal datasets, PASCAL-Sentence and SUN-Attribute, demonstrate the effectiveness of the proposed retrieval procedure in comparison to state-of-the-art algorithms for learning binary codes.
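For context on the binary-code baselines mentioned above, the sketch below shows how retrieval over binary codes is commonly done: items are ranked by Hamming distance to the query code. This is a deliberately simplified illustration, not the probabilistic retrieval model proposed in the talk; the function names, item ids, and codes are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database, k=2):
    """Return the ids of the k items whose codes are closest to the query."""
    ranked = sorted(database, key=lambda item_id: hamming(query_code, database[item_id]))
    return ranked[:k]


# Hypothetical codes: in a cross-modal setting the query code could be
# derived from a sentence and the database codes from images.
database = {
    "img_1": (1, 0, 1, 1),
    "img_2": (0, 0, 1, 0),
    "img_3": (1, 1, 0, 0),
}
top = retrieve((1, 0, 1, 0), database)
```

Because both modalities map into the same code space in this sketch, a text query and an image query are handled identically at retrieval time.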
Examining Committee:
Committee Chair: Dr. Larry S. Davis
Department Representative: Dr. Hal Daumé III
Committee Member: Dr. Ramani Duraiswami