The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group


Optimization over nonconvex constraints & Gradient Coding via Sparse Random Graphs

Wooseok Ha & Zachary Charles, Graduate Student - University of Chicago & Graduate Student - University of Wisconsin

Date and Time: Sep 20, 2017 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building


Many problems in modern statistics can be formulated as an optimization problem with structured constraints, where the constraints often exhibit nonconvexity such as sparsity or low rank. However, working with nonconvex constraints presents challenges from both a theoretical and practical point of view. In this talk, we discuss a convergence behavior on two widely used algorithms, projected gradient descent and alternating minimization method, in the presence of nonconvex constraints. A major tool allowing to handle the nonconvex constraints is the {\em local concavity coefficient}, which aims to measure the concavity of a general nonconvex set. In the setting of alternating minimization, our result further reveals important distinction between alternating and non-alternating methods. We demonstrate our framework on a range of specific examples with rank-constrained variables, including factor model and multitask regression.


The large-scale nature of modern machine learning makes distributed computation indispensable. Gradient-based optimization methods have the potential for massive speed-ups in distributed setups. Unfortunately, distributed machine learning is often beset by computational bottlenecks. One such bottleneck is the presence of straggler nodes, worker nodes that run considerably slower than others. Recently, gradient coding, the use of coding-theoretic techniques for straggler mitigation, has been proposed. While prior work has mainly focused on on exact reconstruction of the desired output, slightly inexact computations can be acceptable in applications that are robust to noise, such as distributed model training. We will present a gradient coding scheme based on sparse random graphs that guarantees fast and approximately accurate distributed computations, especially for gradient-based algorithms. We show that by sacrificing a small amount of accuracy, we can greatly increase robustness to stragglers.