- Alexander Rakhlin, Ohad Shamir, Karthik Sridharan
- ICML
- 2012

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can…
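As a rough illustration of the baseline this abstract describes (run SGD for T iterations and return the average point), here is a minimal Python sketch. It is not the paper's algorithm. The step size 1/(λt), the toy objective F(w) = ½‖w − target‖², the Gaussian gradient noise, and the names `averaged_sgd`, `noisy_gradient`, and `TARGET` are all assumptions made for the example.

```python
import random


def averaged_sgd(grad_oracle, w0, strong_convexity, num_iters):
    """Run plain SGD and return (last iterate, average of all iterates).

    Assumption: step size 1 / (strong_convexity * t), a common choice for
    strongly convex objectives; the abstract does not specify one.
    """
    w = list(w0)
    avg = [0.0] * len(w)
    for t in range(1, num_iters + 1):
        g = grad_oracle(w)
        step = 1.0 / (strong_convexity * t)
        w = [wi - step * gi for wi, gi in zip(w, g)]
        # Running mean of the iterates w_1, ..., w_t.
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]
    return w, avg


# Hypothetical toy problem: F(w) = 0.5 * ||w - TARGET||^2, which is
# 1-strongly convex, observed through gradients with Gaussian noise.
TARGET = [1.0, -2.0]
rng = random.Random(0)


def noisy_gradient(w):
    return [wi - ti + rng.gauss(0.0, 1.0) for wi, ti in zip(w, TARGET)]


last, avg = averaged_sgd(noisy_gradient, [0.0, 0.0], 1.0, 20000)
print("last iterate:", last)
print("averaged iterate:", avg)
```

On this toy problem both returned points should land close to `TARGET`. The sketch only shows the mechanics of averaging and says nothing about which output achieves the better rate in general.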

- Shie Mannor, Ohad Shamir
- NIPS
- 2011

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where…

- Ohad Shamir
- ICML
- 2016

We study the convergence properties of the VR-PCA algorithm introduced by [19] for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with…

- Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, Karthik Sridharan
- Journal of Machine Learning Research
- 2010

The problem of characterizing learnability is the most basic question of statistical learning theory. A fundamental and long-standing answer, at least for the case of supervised classification and regression, is that learnability is equivalent to uniform convergence of the empirical risk to the population risk, and that if a problem is learnable, it is…

- Ohad Shamir, Tong Zhang
- ICML
- 2013

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector…

For supervised classification problems, it is well known that learnability is equivalent to uniform convergence of the empirical risks and thus to learnability by empirical minimization. Inspired by recent regret bounds for online convex optimization, we study stochastic convex optimization, and uncover a surprisingly different situation in the more…

- Shai Shalev-Shwartz, Alon Gonen, Ohad Shamir
- ICML
- 2011

We address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding…

- Roi Livni, Shai Shalev-Shwartz, Ohad Shamir
- NIPS
- 2014

It is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern day neural networks are trained efficiently using SGD and a variety of tricks that include different activation functions (e.g. ReLU), over-specification (i.e., train networks which are larger than needed), and regularization. In this paper we…

- Ohad Shamir, Nathan Srebro, Tong Zhang
- ICML
- 2014

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably improves with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We…

- Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao
- Journal of Machine Learning Research
- 2012

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a…