PhD Proposal: Towards Common Sense through Synthesizing Cues for Prediction
We seek to advance progress on an obstinately difficult, yet central, challenge within artificial intelligence (AI) and computer vision (CV): common sense understanding. We argue that a central capacity underlying human common sense is the ability to understand the temporal structure of events and to project from what has been observed to likely near-future events. This ability to anticipate the near-term future is necessary to interact dynamically, fluidly, and competently both with the world and with other agents. Additionally, humans, unlike most contemporary methods in AI and CV, are adept at synthesizing multiple sources of information; we argue that this capacity for synthesis is also central to human common sense. As such, we introduce methods for understanding human action and projecting it into the near future, synthesizing multiple sources and representations of the observed world. We leverage three kinds of understanding: the characteristics of human motion; intention as betrayed through human gaze fixation; and the temporal structure of activities, captured through the relations among constituent actions and their characteristics. We provide to the community a tool that, given trajectory positions and gaze fixations for a partial hand trajectory, produces a probability distribution over likely destinations of that trajectory. We also introduce and evaluate novel methods for predicting the near-term future actions of an activity, based on characteristics of motion, gaze fixations, and the semantics of actions.
Chair: Dr. Yiannis Aloimonos
Dept. rep: Dr. Dinesh Manocha
Members: Dr. Cornelia Fermuller