SILO



The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group

3M

with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company

Northwestern Mutual

Large-scale Problems on a Tight Budget

Ioannis Mitliagkas, University of Texas at Austin

Date and Time: Mar 25, 2015 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building

Abstract:

For all problems we provide some sort of guarantee that is much better than worst-case. For the first part of this talk, I will present our work on streaming, memory-limited Principal Component Analysis. Therein, we give the first finite-sample, global-convergence guarantees for an algorithm that solves PCA in the single-pass streaming setting. The algorithm is shown to exhibit optimal sample complexity, within logarithmic factors, under the spiked covariance model. It uses optimal memory, when the true components are dense. Then, motivated by storage and network capacity limitations, I will propose a method for fast PageRank approximations on graph engines. Our system uses random walks as a quantized alternative to the power iteration, eliminates the need of global jumps for PageRank, and introduces randomized synchronization. This last contribution is a novel extension to the GraphLab engine that could be useful to other workloads as well, as it reduces network usage. We achieve a very good approximation and a serious speedup (7-10x) over the vanilla system. Our theory bounds the resulting approximation error. Finally, I will discuss a new method for finding dense subgraphs in very large graphs; a significant improvement over the previous state of the art. The problem of the densest k-subgraph is NP-hard and hard to approximate. Our algorithm furnishes a k-subgraph, along with a graph-dependent guarantee that tells us that, in most real graphs we test, we achieve at least 70% of the optimal density. It scales almost linearly in the size of the graph and parallelizes well. Our MapReduce implementation works on billion-edge graphs.