Matthew Malloy, Principal Data Scientist, comScore
Date and Time: May 06, 2015 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building
Identifying contamination in datasets is important in a wide variety of settings, including view and click fraud in online advertising. After a brief overview of digital ad fraud, I’ll describe a technique for estimating contamination in large, categorical datasets. The technique involves solving a series of convex programs, resulting in a bound on the minimum number of data points that must be discarded from an empirical dataset (i.e., the level of contamination) for a model to match the remaining data to within a specified goodness of fit, controlled by a p-value. I’ll discuss convergence guarantees, provide geometric interpretations, and highlight practical aspects of solving over a million convex optimization problems nightly.
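The abstract does not spell out the convex programs. To make "minimum number of data points that must be discarded" concrete, the sketch below covers only a simplified limiting case: the fit must be exact, with no p-value slack, and counts are relaxed to real values. It is an illustration under those assumptions, not the speaker's method, and the function name `min_discard_exact_fit` is hypothetical. If M points are kept and category i keeps M·p_i of them, feasibility requires M·p_i ≤ n_i for every category. The largest feasible M is therefore min_i n_i / p_i, and the number discarded is N − M.

```python
from typing import Sequence


def min_discard_exact_fit(counts: Sequence[int], model: Sequence[float]) -> float:
    """Hypothetical helper: relaxed (real-valued) minimum number of points to
    discard so that the remaining counts are exactly proportional to `model`.

    This is the zero-tolerance limiting case only; a p-value-controlled
    goodness-of-fit test would allow fewer discards.
    """
    if len(counts) != len(model):
        raise ValueError("counts and model must have the same length")
    if any(p < 0 for p in model) or abs(sum(model) - 1.0) > 1e-9:
        raise ValueError("model must be a probability vector")
    if not any(p > 0 for p in model):
        raise ValueError("model must have at least one positive entry")

    total = sum(counts)
    # Keeping M points with M * p_i in category i requires M * p_i <= n_i,
    # so M <= n_i / p_i for every category with p_i > 0.
    # Categories with p_i == 0 must be discarded entirely; they are counted
    # in `total` but excluded from the kept mass M.
    kept = min(n / p for n, p in zip(counts, model) if p > 0)
    return total - kept


# Usage: the third category is over-represented relative to the model,
# so it limits how many points can be kept.
print(min_discard_exact_fit([50, 30, 20], [0.5, 0.25, 0.25]))
```

Allowing a goodness-of-fit tolerance, as in the talk, turns this closed form into an optimization problem. That is where solving convex programs comes in.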