Interactive systems for code and data demography

Talk
Elena Glassman
University of California, Berkeley
Time: 04.03.2018 11:00 to 12:00
Location: AVW 4172

Programming, the means by which we tell computers what to do, has changed a lot over time. Programming today means programming alongside hundreds of fellow students, thousands of fellow professional software engineers at a particular company, or millions of fellow developers in the open-source community sharing their code online. In this talk, I will describe several interactive systems I have built that exploit the structure within large volumes of peer-produced code to help communities of programmers learn about, reflect on, and teach how to write more correct, more readable code. These systems are made possible by code demography, which I define as the statistics, algorithms, and visualizations that help people comprehend and interact with population-level structure and trends in large code corpora. The key to my approach is designing or inferring abstractions that capture critical features and abstract away variation that is irrelevant to the user. Code demography can reveal strategically diverse sets of aligned code examples that, according to theories of human concept learning, help people learn, i.e., construct mental abstractions that generalize well. I will focus this talk on two families of systems that use program analysis, program synthesis, and visualization to power either active data-driven teaching in large programming classrooms or passive knowledge sharing within developer communities. Some of these systems have been integrated into UC Berkeley’s largest introductory programming class, which regularly enrolls more than 1,500 students. I will conclude with my vision for how the techniques of code demography can be generalized to more types of large, complex data corpora and enable new data-driven programming paradigms.