Zeyuan Allen-Zhu

First-order methods play a central role in large-scale convex optimization. Even though many variations exist, each suited to a particular problem form, almost all such methods fundamentally rely on two types of algorithmic steps and two corresponding types of analysis: gradient-descent steps, which yield primal progress, and mirror-descent steps, which yield dual progress. (…)
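The two step types can be seen side by side in a short sketch. The snippet below (plain numpy, a minimal illustration rather than the paper's coupled method) contrasts a Euclidean gradient-descent step with an entropy mirror-descent step on the probability simplex.

```python
# Minimal sketch: a Euclidean gradient-descent step vs. an entropy mirror-descent step.
import numpy as np

def gradient_step(x, grad, eta):
    """Euclidean gradient-descent step: x - eta * grad."""
    return x - eta * grad

def mirror_step_entropy(x, grad, eta):
    """Mirror-descent step under the entropy mirror map: a multiplicative
    update followed by renormalization, so the iterate stays on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

x0 = np.full(3, 1.0 / 3.0)                      # uniform point on the simplex
g = np.array([0.3, 0.7, 0.1])                   # a gradient at x0
print(gradient_step(x0, g, eta=0.1))            # additive update, may leave the simplex
print(mirror_step_entropy(x0, g, eta=0.1))      # multiplicative update, stays on the simplex
```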
We study k-GenEV, the problem of finding the top k generalized eigenvectors, and k-CCA, the problem of finding the top k vectors in canonical correlation analysis. We propose algorithms LazyEV and LazyCCA to solve the two problems with running times linearly dependent on the input size and on k. Furthermore, our algorithms are doubly accelerated: our running times (…)
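For reference, the k-GenEV problem itself, the top-k eigenvectors of the pencil A v = λ B v, can be stated as a dense baseline in a few lines. This is the textbook solver, not LazyEV or LazyCCA, and it does not scale the way the paper's algorithms do.

```python
# Dense baseline for k-GenEV: top-k generalized eigenvectors of (A, B), A v = lambda * B v.
import numpy as np
from scipy.linalg import eigh

def top_k_genev(A, B, k):
    """Return the k generalized eigenvalues/eigenvectors with largest eigenvalues.
    A must be symmetric, B symmetric positive definite."""
    vals, vecs = eigh(A, B)                     # eigenvalues in ascending order
    return vals[-k:][::-1], vecs[:, -k:][:, ::-1]

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2                               # symmetric
B = M @ M.T + 50 * np.eye(50)                   # positive definite
vals, vecs = top_k_genev(A, B, k=3)
```

k-CCA reduces to a generalized eigenproblem of the same shape, with the cross-covariance matrix playing the role of A and the within-view covariances the role of B.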
Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory or inhibitory (…)
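A hedged illustration of the sign constraint: the sketch below builds a sparse random matrix whose columns are sign-consistent (every nonzero in a column carries the same sign, mimicking the excitatory/inhibitory restriction) and checks the norm distortion on a random vector. It is an illustrative construction, not the one analyzed in the paper.

```python
# Sparse, sign-consistent random projection: each column has s nonzeros of value
# +-1/sqrt(s), and all nonzeros in a column share one sign.
import numpy as np

def sparse_sign_consistent_matrix(m, n, s, rng):
    M = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        sign = rng.choice([-1.0, 1.0])          # one sign per column (excitatory or inhibitory)
        M[rows, j] = sign / np.sqrt(s)
    return M

rng = np.random.default_rng(1)
Phi = sparse_sign_consistent_matrix(m=200, n=2000, s=8, rng=rng)
x = rng.standard_normal(2000)
print(np.linalg.norm(Phi @ x) / np.linalg.norm(x))   # roughly 1 when distortion is small
```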
The online problem of computing the top eigenvector is fundamental to machine learning. The famous matrix-multiplicative-weight-update (MMWU) framework solves this online problem and gives optimal regret. However, since MMWU runs very slowly due to the computation of matrix exponentials, researchers proposed the follow-the-perturbed-leader (FTPL) framework (…)
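To see where the matrix exponential enters, here is the generic MMWU template (not the faster method the abstract refers to); the expm call on a d × d matrix is the step that makes MMWU slow.

```python
# Generic MMWU step for the online top-eigenvector problem.
import numpy as np
from scipy.linalg import expm

def mmwu_density(gain_matrices, eta):
    """MMWU 'density matrix' after a sequence of symmetric gain matrices:
    W = exp(eta * sum(gains)) / trace(exp(eta * sum(gains)))."""
    E = expm(eta * sum(gain_matrices))          # the expensive step: a full d x d matrix exponential
    return E / np.trace(E)

rng = np.random.default_rng(2)
d = 100
gains = []
for _ in range(5):
    G = rng.standard_normal((d, d))
    gains.append((G + G.T) / 2)                 # one symmetric gain matrix per round
W = mmwu_density(gains, eta=0.1)                # trace-one PSD matrix, concentrated on the
                                                # top eigendirections of the cumulative gain
```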
We solve principal component regression (PCR) by providing an efficient algorithm to project any vector onto the subspace formed by the top principal components of a matrix. Our algorithm does not require any explicit construction of the top principal components, and is therefore suitable for large-scale PCR instances. Specifically, to project onto the (…)
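The naive baseline, explicitly forming the top principal components and projecting, which is exactly the construction the abstract says its algorithm avoids, looks like this:

```python
# Explicit-PCA baseline for the projection step of PCR (the construction the paper avoids).
import numpy as np

def project_onto_top_pcs(A, v, k):
    """Project v onto the span of the top-k principal components
    (right singular vectors) of the data matrix A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                               # d x k orthonormal basis of the top subspace
    return Vk @ (Vk.T @ v)

rng = np.random.default_rng(3)
A = rng.standard_normal((500, 40))
v = rng.standard_normal(40)
v_proj = project_onto_top_pcs(A, v, k=5)
```

The full SVD here costs far more than the matrix-vector products a large-scale PCR method can afford, which is the point of avoiding the explicit construction.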
Given a nonconvex function f(x) that is an average of n smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue −σ of the Hessian. This parameter σ captures how strongly nonconvex f(x) is, and is analogous to the strong convexity parameter in convex optimization. (…)
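A plain finite-sum SGD sketch of the setting (an average of n smooth nonconvex components, with approximate stationarity measured by the full gradient norm); this is the simple baseline, not the paper's method.

```python
# Finite-sum nonconvex setting: f(x) = (1/n) * sum_i cos(a_i . x), minimized by plain SGD.
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 10
A = rng.standard_normal((n, d))

def grad_i(x, i):
    """Gradient of the i-th smooth nonconvex component f_i(x) = cos(a_i . x)."""
    return -np.sin(A[i] @ x) * A[i]

def full_grad(x):
    return np.mean([grad_i(x, i) for i in range(n)], axis=0)

x, eta = rng.standard_normal(d), 0.1
for t in range(5000):
    i = rng.integers(n)
    x -= eta / np.sqrt(t + 1) * grad_i(x, i)    # stochastic first-order step, decaying step size
print(np.linalg.norm(full_grad(x)))             # small norm means approximately stationary
```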
We consider computationally tractable methods for the experimental design problem, where k out of n design points of dimension p are selected so that certain optimality criteria are approximately satisfied. Our algorithm finds a (1 + ε)-approximate optimal design when k is a linear function of p; in contrast, existing results require k to be super-linear in p. (…)
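For concreteness, one common optimality criterion is D-optimality, the log-determinant of the information matrix of the chosen points. The sketch below scores subsets this way and builds one greedily; it is a simple illustrative heuristic, not the paper's algorithm or its guarantee.

```python
# D-optimality scoring of a subset of design points, with a greedy selection heuristic.
import numpy as np

def log_det_information(X, subset):
    """log det of the information matrix sum_{i in subset} x_i x_i^T (tiny ridge for stability)."""
    S = X[list(subset)]
    _, logdet = np.linalg.slogdet(S.T @ S + 1e-9 * np.eye(X.shape[1]))
    return logdet

def greedy_d_optimal(X, k):
    chosen = []
    for _ in range(k):
        scores = [(log_det_information(X, chosen + [i]), i)
                  for i in range(len(X)) if i not in chosen]
        chosen.append(max(scores)[1])           # pick the point with the best marginal score
    return chosen

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 8))               # n = 100 design points of dimension p = 8
subset = greedy_d_optimal(X, k=16)              # k linear in p, as in the abstract
```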
We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^(−3.25)) backpropagations. The best previous result was essentially O(ε^(−4)), achieved by SGD. More broadly, our algorithm finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^(−3.25)), with only oracle access to stochastic gradients and Hessian-vector products. (…)
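The two oracles mentioned in the last sentence can be sketched directly; below is a standard finite-difference Hessian-vector product in numpy, an approximation shown for illustration, not the paper's training algorithm.

```python
# Gradient oracle plus a finite-difference Hessian-vector product oracle.
import numpy as np

def f_grad(x):
    """Gradient of a small smooth nonconvex test function f(x) = sum(cos(x)) + 0.5 * ||x||^2."""
    return -np.sin(x) + x

def hvp(x, v, r=1e-5):
    """Hessian-vector product via central differences of the gradient:
    H(x) v ~ (grad(x + r v) - grad(x - r v)) / (2 r)."""
    return (f_grad(x + r * v) - f_grad(x - r * v)) / (2 * r)

x = np.array([0.3, -1.2, 2.0])
v = np.array([1.0, 0.0, 0.0])
print(hvp(x, v))        # matches (diag(-cos(x)) + I) @ v up to O(r^2) error
```

In a neural-network setting the same product is usually obtained by double backpropagation rather than finite differences, which is why the cost is counted in backpropagations.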