The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group

with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company

# Covariance thresholding, kernel random matrices and sparse PCA

Yash Deshpande, Department of Electrical Engineering, Stanford University

Date and Time: May 15, 2015 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building

## Abstract:

In Sparse Principal Component Analysis (PCA) we wish to reconstruct a low-rank matrix from noisy observations, under sparsity assumptions on the recovered factors. Johnstone and Lu (2004) formalized these assumptions in the 'spiked covariance model', wherein we observe $n$ i.i.d. samples from a $p$-dimensional Gaussian distribution $N(0, I + b v v^T)$. The population principal component $v$ is called the 'spike' and is assumed to be sparse in a certain basis. Johnstone and Lu also proposed a simple scheme that consistently estimates the support of $v$ provided its size is smaller than order $\sqrt{n/\log p}$.
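As a rough illustration (not from the talk), the following sketch samples from the spiked covariance model and applies a Johnstone–Lu style diagonal-thresholding step: coordinates in the support of $v$ have inflated variance $1 + b v_i^2$, so the largest diagonal entries of the sample covariance reveal the support. The parameter values are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, b = 5000, 500, 10, 2.0           # illustrative parameters

v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)                  # k-sparse, unit-norm spike

# Rows of X are i.i.d. N(0, I + b v v^T): X = Z + sqrt(b) * g v^T
Z = rng.standard_normal((n, p))
g = rng.standard_normal(n)
X = Z + np.sqrt(b) * np.outer(g, v)

# Diagonal thresholding: pick coordinates with the largest sample variance,
# since Var(X_i) = 1 + b * v_i^2 is inflated exactly on the support.
diag = (X ** 2).mean(axis=0)
support_est = np.argsort(diag)[-k:]
```

With these parameter choices the variance gap ($1.2$ on the support versus $1$ off it) dominates the sampling fluctuations, so the estimated support matches the true one.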

Despite a considerable amount of work over the past ten years, there was no computationally efficient scheme that improved over this guarantee. We analyze Covariance Thresholding, a simple algorithm proposed by Krauthgamer, Nadler and Vilenchik (2013), and prove that it recovers supports of size up to order $\sqrt{n}$. Under a certain complexity conjecture, it is impossible to significantly improve this guarantee.
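A hedged sketch of the covariance-thresholding idea: soft-threshold the off-diagonal entries of the sample covariance at a level of order $1/\sqrt{n}$ to suppress the noise, then take the leading eigenvector of the thresholded matrix as the spike estimate. The threshold constant and the details below are illustrative choices, not the exact constants from the paper.

```python
import numpy as np

def covariance_thresholding(X, tau):
    """Estimate a sparse spike from samples X (n x p), threshold scale tau."""
    n, p = X.shape
    S = X.T @ X / n                        # sample covariance
    np.fill_diagonal(S, 0.0)               # work with off-diagonal entries
    t = tau / np.sqrt(n)                   # threshold at level tau/sqrt(n)
    S_t = np.sign(S) * np.maximum(np.abs(S) - t, 0.0)   # soft threshold
    # The leading eigenvector of the thresholded matrix estimates the spike.
    _, V = np.linalg.eigh(S_t)             # eigenvalues in ascending order
    return V[:, -1]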

The main technical component in our analysis involves a new bound on the spectral norm of kernel random matrices.

This is joint work with Andrea Montanari.