Clayton Scott, Associate Professor, University of Michigan, Ann Arbor
Date and Time: Nov 14, 2012 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building
In many real-world classification problems, the labels of training examples are randomly corrupted. That is, the set of training examples for each class is contaminated by examples of the other class. Existing approaches to this problem assume that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. We introduce a general framework for classification with label noise that eliminates these assumptions. In particular, we identify necessary and sufficient distributional assumptions for the existence of a consistent estimator of the optimal risk, with associated estimation strategies. We find that learning in the presence of label noise is possible even when the class-conditional distributions overlap and the label noise is not symmetric. A key to our approach is a universally consistent estimator of the maximal proportion of one distribution that is present in another, or equivalently, of the so-called "separation distance" between two distributions.
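To make the setting concrete, the following sketch simulates the noise model described above: labels are flipped independently with a probability that depends on the true class, and the class-conditional distributions overlap. The noise rates `rho_pos` and `rho_neg` and the Gaussian class-conditionals are illustrative assumptions, not the speaker's data or method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, asymmetric noise rates: probability that a label
# from each class is flipped (rho_pos != rho_neg).
rho_pos, rho_neg = 0.2, 0.4

# Draw true labels and overlapping class-conditional Gaussians
# (means 0 and 1, unit variance: the classes genuinely overlap).
n = 10000
y_true = rng.integers(0, 2, size=n)                   # 0 = negative, 1 = positive
x = rng.normal(loc=y_true.astype(float), scale=1.0)

# Corrupt each label independently with a class-dependent flip probability.
flip_prob = np.where(y_true == 1, rho_pos, rho_neg)
flipped = rng.random(n) < flip_prob
y_noisy = np.where(flipped, 1 - y_true, y_true)

# Each observed class is now a mixture: the observed "positives" are
# contaminated by a proportion of true negatives, and vice versa.
contamination = np.mean(y_true[y_noisy == 1] == 0)
print(f"fraction of observed positives that are truly negative: {contamination:.3f}")
```

Estimating this contamination proportion from the observed samples alone, without knowing `rho_pos` and `rho_neg`, is the mixture-proportion estimation problem at the heart of the talk.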
The methodology is motivated by a problem in nuclear particle classification. Connections to domain adaptation and learning with positive and unlabeled examples will also be given.
Clayton Scott is an Associate Professor of EECS and of Statistics at the University of Michigan. He received his AB in Mathematics from Harvard in 1998, and his MS and PhD in Electrical Engineering from Rice in 2000 and 2004. His research interests include statistical learning theory, algorithms, and applications.