First-order methods play a central role in large-scale convex optimization. Even though many variations exist, each suited to a particular problem form, almost all such methods fundamentally rely on two types of algorithmic steps and two corresponding types of analysis: gradient-descent steps, which yield primal progress, and mirror-descent steps, which…
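The two step types named in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the step size `eta`, and the choice of the entropy mirror map on the probability simplex are all assumptions made for illustration.

```python
import math


def gradient_step(x, grad, eta):
    """Plain gradient-descent step: move against the gradient in Euclidean space."""
    return [xi - eta * gi for xi, gi in zip(x, grad)]


def entropic_mirror_step(p, grad, eta):
    """Mirror-descent step with the entropy mirror map (an assumed choice).

    On the probability simplex this becomes a multiplicative update
    followed by renormalization.
    """
    weights = [pi * math.exp(-eta * gi) for pi, gi in zip(p, grad)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Gradient step on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
    x = [1.0, -2.0]
    print(gradient_step(x, x, eta=0.5))

    # Mirror steps on the linear objective <c, p> over the simplex.
    c = [1.0, 0.0, 2.0]
    p = [1.0 / 3.0] * 3
    for _ in range(50):
        p = entropic_mirror_step(p, c, eta=1.0)
    print(p)
```

The gradient step operates directly on the iterate, while the mirror step reweights a distribution; with a linear cost, repeated mirror steps concentrate the mass on the cheapest coordinate.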

- Zeyuan Allen-Zhu, Yuanzhi Li
- ICML
- 2017

We study k-GenEV, the problem of finding the top k generalized eigenvectors, and k-CCA, the problem of finding the top k vectors in canonical-correlation analysis. We propose algorithms LazyEV and LazyCCA to solve the two problems with running times linearly dependent on the input size and on k. Furthermore, our algorithms are doubly-accelerated: our running…

We study k-SVD, the problem of obtaining the first k singular vectors of a matrix A. Recently, a few breakthroughs have been made on k-SVD: Musco and Musco [19] proved the first gap-free convergence result using the block Krylov method, Shamir [21] discovered the first variance-reduction stochastic method, and Bhojanapalli et al. [7] provided the fastest…

- Zeyuan Allen-Zhu, Rati Gelashvili, Silvio Micali, Nir Shavit
- Proceedings of the National Academy of Sciences…
- 2014

Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory…
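As background for the abstract above, the following is a minimal sketch of a generic sparse random-sign JL projection. It is not the construction analyzed in the paper, and it ignores the neural constraints the abstract mentions. The function names, the `density` parameter, and the dimensions are illustrative assumptions. Each entry is nonzero with probability `density` and scaled so that squared norms are preserved in expectation.

```python
import math
import random


def sparse_sign_matrix(m, d, density, rng):
    """Return an m x d matrix with sparse +/- entries (a generic, assumed construction).

    Each entry is nonzero with probability `density`. Nonzero entries have
    magnitude 1 / sqrt(m * density), so E[||A x||^2] = ||x||^2.
    """
    scale = 1.0 / math.sqrt(m * density)
    return [
        [
            (scale if rng.random() < 0.5 else -scale)
            if rng.random() < density
            else 0.0
            for _ in range(d)
        ]
        for _ in range(m)
    ]


def project(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)


if __name__ == "__main__":
    rng = random.Random(0)
    d, m = 500, 200  # original and compressed dimensions (illustrative)
    A = sparse_sign_matrix(m, d, density=0.1, rng=rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # The ratio should be close to 1 when m is large enough.
    print(sq_norm(project(A, x)) / sq_norm(x))
```

The compressed vector has far fewer coordinates than the input, yet its length stays close to the original with high probability; the same holds for distances between pairs of vectors, since the map is linear.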

- Zeyuan Allen-Zhu
- 2016

We introduce Katyusha, the first direct, primal-only stochastic gradient method that has a provably accelerated convergence rate in convex optimization. In contrast, previous methods are based either on dual coordinate descent, which is more restrictive, or on outer-inner loops, which make them “blind” to the underlying stochastic nature of the optimization…

- Zeyuan Allen-Zhu, Yuanzhi Li
- ICML
- 2017

The online problem of computing the top eigenvector is fundamental to machine learning. The famous matrix-multiplicative-weight-update (MMWU) framework solves this online problem and gives optimal regret. However, since MMWU runs very slowly due to the computation of matrix exponentials, researchers proposed the follow-the-perturbed-leader (FTPL) framework…

- Zeyuan Allen-Zhu, Yuanzhi Li
- ICML
- 2017

We solve principal component regression (PCR) by providing an efficient algorithm to project any vector onto the subspace formed by the top principal components of a matrix. Our algorithm does not require any explicit construction of the top principal components, and therefore is suitable for large-scale PCR instances. Specifically, to project onto the…
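For contrast with the abstract above, here is a minimal sketch of the naive baseline that does construct the top components explicitly and then projects onto their span. It is not the paper's algorithm. The helper names, the component count `k`, and the use of power iteration with deflation are assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_apply(A, v):
    """Compute A^T (A v) without forming A^T A."""
    Av = [dot(row, v) for row in A]
    return [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(len(v))]


def top_components(A, k, iters=200, seed=0):
    """Explicitly compute k leading eigenvectors of A^T A (naive baseline).

    Uses power iteration, removing previously found directions at every step.
    """
    rng = random.Random(seed)
    d = len(A[0])
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = gram_apply(A, v)
            for u in comps:  # deflate against earlier components
                c = dot(u, w)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        comps.append(v)
    return comps


def project_onto(comps, x):
    """Orthogonal projection of x onto the span of orthonormal vectors `comps`."""
    out = [0.0] * len(x)
    for u in comps:
        c = dot(u, x)
        out = [o + c * ui for o, ui in zip(out, u)]
    return out


if __name__ == "__main__":
    A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
    comps = top_components(A, k=2)
    print(project_onto(comps, [1.0, 1.0, 1.0]))
```

This baseline costs one power-iteration run per component and stores every component, which is the explicit construction the abstract says its method avoids.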

- Zeyuan Allen-Zhu
- 2017

Given a nonconvex function f(x) that is an average of n smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue −σ of the Hessian. This parameter σ captures how strongly nonconvex f(x) is, and is analogous to the strong convexity…

- Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, Yining Wang
- ICML
- 2017

We consider computationally tractable methods for the experimental design problem, where k out of n design points of dimension p are selected so that certain optimality criteria are approximately satisfied. Our algorithm finds a (1 + ε)-approximate optimal design when k is a linear function of p; in contrast, existing results require k to be super-linear in…

- Zeyuan Allen-Zhu
- ArXiv
- 2017

We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best result was essentially O(ε^{-4}) by SGD. More broadly, it finds ε-approximate local minima of any smooth nonconvex function in rate O(ε^{-3.25}), with only oracle access to stochastic gradients and Hessian-vector products.…