PhD Proposal: Learning Representations with Limited Supervision and Fast Inference Through Dynamic Computation
The exponential growth of visual media, in the form of images and videos, has encouraged the development of systems that can understand visual data both effectively and efficiently. Recent advances in computer vision tasks such as image recognition and object detection have been driven by high-capacity deep neural networks, particularly Convolutional Neural Networks (CNNs) with hundreds of layers, trained in a supervised manner on massive collections of clean human annotations. However, this paradigm poses two significant challenges: (1) the need to collect millions of human-labeled samples for training prevents such approaches from scaling, especially to fine-grained image understanding tasks like semantic segmentation, where dense annotations are extremely expensive to obtain; (2) the increased depth of CNNs, while yielding significant improvements on competitive benchmarks, limits their deployment in real-world scenarios due to high computational cost, especially for applications on mobile devices and in delay-sensitive systems, where inputs must be processed in real time as they arrive.

In this proposal, we present approaches to tackle these two challenges, focusing on: (1) deriving robust representations with minimal human supervision by exploiting context relationships or sharing information across domains; (2) investigating dynamic computation frameworks that adaptively allocate computing resources on the fly for a novel image, with the aim of managing the trade-off between accuracy and computational complexity. We will also discuss future directions along these lines.
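To make the dynamic-computation idea concrete, the following is a minimal sketch of an early-exit cascade, one common form of adaptive computation: cheap stages run first, and inference stops as soon as a stage is confident enough, so easy inputs cost less than hard ones. All names (`early_exit_predict`, the toy stages) are illustrative, not part of the proposed method.

```python
# Hypothetical early-exit cascade: run stages in order of cost and stop at
# the first stage whose confidence clears a threshold. Each stage returns
# (label, confidence, compute_cost); total cost reflects stages actually run.

def early_exit_predict(x, stages, threshold=0.9):
    """Return (label, total_cost) from the first sufficiently confident stage."""
    cost = 0
    label = None
    for stage in stages:
        label, confidence, stage_cost = stage(x)
        cost += stage_cost
        if confidence >= threshold:  # confident enough: skip deeper stages
            break
    return label, cost

# Toy stages standing in for shallow and deep sub-networks.
cheap  = lambda x: ("cat", 0.95 if x == "easy" else 0.5, 1)
costly = lambda x: ("dog", 0.99, 10)

print(early_exit_predict("easy", [cheap, costly]))  # easy input exits early
print(early_exit_predict("hard", [cheap, costly]))  # hard input pays for both stages
```

In a real network the "stages" would be intermediate classifiers attached at increasing depths, and the accuracy/cost trade-off is controlled by the confidence threshold.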
Chair: Dr. Larry S. Davis
Dept. rep: Dr. Rama Chellappa
Members: Dr. Tom Goldstein