Varun Jog ,
Date and Time: Feb 07, 2018 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. We overview some recent work [Xu and Raginsky (2017)] that bounds generalization error of empirical risk minimization based on the mutual information I(S;W) between the algorithm input S and the algorithm output W. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms that are characterized by bounded, noisy updates with Markovian structure, such as stochastic gradient Langevin dynamics (SGLD). We describe certain shortcomings of mutual information-based bounds, and propose alternate bounds that employ the Wasserstein metric from optimal transport theory. We compare the Wasserstein metric-based bounds with the mutual information-based bounds and show that for a class of data generating distributions, the former leads to stronger
bounds on the generalization error.
This is joint work with Adrian Tovar-Lopez, Ankit Pensia, and Po-Ling Loh.