The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group


with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company

Northwestern Mutual

Non-exhaustive, overlapping k-means clustering

David Gleich, Assistant Professor, Computer Sciences, Purdue University

Date and Time: Nov 18, 2015 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building


Optimizing the k-means objective with Lloyd's algorithm is a classic strategy to separate a set of datapoint into individual clusters. More recent data in biology and social networks necessitates overlapping clustering. I'll present a generalization of the k-means objective function that incorporates overlapping partitions as well as outlier detection. A weighted, kernel-based variation on this objective is also equivalent to the normalized cut objective on graphs. We'll see a few algorithms to optimize this objective function including a generalization of Lloyd's algorithm, a semi-definite relaxation, and a variety of solvers for a non-convex low-rank SDP heuristic. These methods produce state of the art accuracy results on clustering problems where ground-truth clusters are known.