The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group, with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company.

Learning with Aggregated Data; a Tale of Two Approaches

Sanmi Koyejo

Date and Time: Oct 18, 2017 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building


For many applications in healthcare, econometrics, financial forecasting and climate science, data can only be obtained as aggregates. This raises the question: can one construct accurate models using only aggregates? I will present two vignettes outlining recent work towards an answer.

First, consider a sparse linear model learned from IID data aggregated into groups, where only empirical moments of each group are observed. Despite this obfuscation of individual data values, we show that, subject to standard conditions, the parameter is recoverable with high probability using standard algorithms. Second, consider learning with aggregated correlated data such as time series or spatial data. Here, standard techniques fail. Instead, we propose a simple procedure which exploits Fourier transforms and achieves strong generalization error guarantees. In both settings, empirical evaluations on datasets from healthcare, agricultural studies, ecological surveys and climate science are presented to demonstrate efficacy.
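
To make the first vignette concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the speakers' method: it assumes only group means (first moments) are observed, and it uses a plain proximal-gradient Lasso as a stand-in for "standard algorithms". The idea it shows is that a linear model survives averaging: if y = x·β + noise for each individual, then the group mean of y equals the group mean of x times β, plus averaged noise. All names, sizes and the regularization value are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=2000):
    """Minimize (1/(2n)) * ||A w - b||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = A.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(A, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
n_groups, group_size, dim = 200, 50, 30

# Hypothetical sparse ground-truth parameter with three nonzero entries.
beta = np.zeros(dim)
beta[[2, 11, 23]] = [1.5, -2.0, 1.0]

# Individual-level data (never shown to the learner). Each group gets its
# own center so that the group means differ from one another.
group_centers = rng.normal(size=(n_groups, 1, dim))
X = group_centers + rng.normal(size=(n_groups, group_size, dim))
y = X @ beta + 0.1 * rng.normal(size=(n_groups, group_size))

# Only these aggregates are observed: one mean feature vector and one
# mean target per group.
X_mean = X.mean(axis=1)   # shape (n_groups, dim)
y_mean = y.mean(axis=1)   # shape (n_groups,)

# Fit a sparse linear model to the aggregates alone.
beta_hat = lasso_ista(X_mean, y_mean, lam=0.05)
support_hat = np.flatnonzero(np.abs(beta_hat) > 0.1)

print("estimated support:", support_hat)
print("max abs error:", np.max(np.abs(beta_hat - beta)))
```

The sketch relies on the group means varying enough across groups to identify β; if every group had the same mean feature vector, the aggregated regression would be degenerate. It does not cover the second vignette, where correlation across time or space breaks this simple argument.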

Joint work with Avradeep Bhowmik and Joydeep Ghosh.