Privacy, Stability and Generalization

Talk
Adam Smith
Penn State
Time: 04.20.2017 11:00 to 12:00
Location: AVW 2460

Consider an agency holding a large database of sensitive personal information, such as medical records, census survey answers, web search records, or genetic data. The agency would like to discover and publicly release global characteristics of the data while protecting the privacy of individuals' records.
I will begin by discussing what makes this problem difficult, illustrating some of the challenges through recent work on membership inference attacks. Motivated by this, I will present differential privacy, a rigorous definition of privacy in statistical databases that is now widely studied and increasingly used to analyze and design deployed systems.
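For readers new to the term, here is a rough illustration; it is not material from the talk. In the standard definition, a randomized algorithm M is ε-differentially private if, for every pair of data sets D and D' that differ in one individual's record (one record added or removed) and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. A common textbook way to meet this for a counting query is the Laplace mechanism: because adding or removing one record changes a count by at most 1, adding Laplace noise with scale 1/ε suffices. The Python sketch below assumes that mechanism; the function names `laplace_noise` and `private_count` and the dictionary record format are hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release an epsilon-differentially private count of matching records.

    Hypothetical helper (Laplace mechanism sketch): a counting query has
    sensitivity 1, so Laplace noise with scale 1/epsilon is enough.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records: 300 people, every third one a smoker (true count 100).
    records = [{"smoker": i % 3 == 0} for i in range(300)]
    print(private_count(records, lambda r: r["smoker"], epsilon=0.5))
```

Smaller ε gives a stronger privacy guarantee but noisier answers; each released answer also consumes privacy budget, which is why the number and choice of queries matter.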
Finally, I will explain how differential privacy is connected to a seemingly different problem: understanding statistical validity in "adaptive data analysis", the practice of using insights gathered from a data set to inform further analysis of the same data set. I'll show how limiting the information revealed about a data set during analysis allows one to control the bias that this adaptive reuse introduces, and why differential privacy is a particularly useful tool for limiting revealed information.