The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group, with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company.

Quantifying Ad Fraud - Contamination Estimation via Convex Relaxations

Matthew Malloy, Principal Data Scientist, comScore

Date and Time: May 06, 2015 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building


Identifying contamination in datasets is important in a wide variety of settings, including view and click fraud in online advertising. After a brief overview of digital ad fraud, I’ll describe a technique for estimating contamination in large, categorical datasets. The technique solves a series of convex programs to bound the minimum number of data points that must be discarded from an empirical dataset (i.e., the level of contamination) for the data to match a model to within a specified goodness of fit, controlled by a p-value. I’ll discuss convergence guarantees, provide geometric interpretations, and highlight practical aspects of solving over a million convex optimizations nightly.
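
To make the idea in the abstract concrete, here is a minimal sketch of one way such a bound could be computed. It is an illustration under stated assumptions, not the speaker's algorithm. It assumes a Pearson chi-square statistic as the goodness-of-fit measure, a model with strictly positive category probabilities, and a relaxation in which the retained count in each category may be any real number between zero and the observed count. For a fixed number of retained points the relaxed problem is a convex program whose minimizer can be found with a one-dimensional bisection; scanning the number of retained points gives the series of programs. The function names and the example data are hypothetical.

```python
def min_chi2_for_kept(counts, model, kept, iters=100):
    """Smallest Pearson chi-square reachable when `kept` points are retained.

    Assumed relaxation: retained counts m_i may be any real number in
    [0, counts[i]]. The minimizer then has the form
    m_i = min(counts[i], c * kept * model[i]), so only the scalar c
    has to be found, here by bisection.
    """
    expected = [kept * q for q in model]
    lo, hi = 0.0, max(n / e for n, e in zip(counts, expected))
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        total = sum(min(n, c * e) for n, e in zip(counts, expected))
        if total < kept:
            lo = c
        else:
            hi = c
    retained = [min(n, hi * e) for n, e in zip(counts, expected)]
    return sum((m - e) ** 2 / e for m, e in zip(retained, expected))


def contamination_lower_bound(counts, model, threshold):
    """Lower bound on how many points must be discarded to pass the test.

    Scans the number of retained points downward and stops at the first
    value for which the relaxed statistic is within `threshold`.
    """
    total = sum(counts)
    for kept in range(total, 0, -1):
        if min_chi2_for_kept(counts, model, kept) <= threshold:
            return total - kept
    return total


# Hypothetical data: three categories, the first one inflated.
observed = [100, 30, 20]
model = [0.5, 0.3, 0.2]
# 5.991 is the chi-square critical value for p = 0.05 with 2 degrees of freedom.
print(contamination_lower_bound(observed, model, threshold=5.991))
```

Because the relaxation can only lower the achievable statistic, the returned value never exceeds the true minimum number of points to discard under these assumptions. The linear scan is kept for clarity; at the scale mentioned in the abstract a faster search over the number of retained points would be needed.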