We present a new method for regularized convex optimization and analyze it under both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analyses and algorithms. We also derive specific …
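As a minimal sketch (not the paper's general method), the special case with a Euclidean distance-generating function reduces a composite-objective update to a proximal gradient step; the example below uses an ℓ1 regularizer, where the proximal step is soft-thresholding. The names `composite_step` and `soft_threshold` are illustrative, not from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def composite_step(w, grad_f, lam, eta):
    """One composite-objective update with a Euclidean proximal term:
    minimize over u:  <grad_f, u> + lam*||u||_1 + (1/(2*eta))*||u - w||^2,
    which has the closed-form soft-thresholding solution below."""
    return soft_threshold(w - eta * grad_f, eta * lam)

# Toy usage: a few steps on f(w) = 0.5*||Xw - y||^2 with an l1 penalty.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 10)), rng.standard_normal(50)
w, eta, lam = np.zeros(10), 0.01, 0.1
for _ in range(100):
    grad = X.T @ (X @ w - y)
    w = composite_step(w, grad, lam, eta)
```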
In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, the label noise is class-conditional: the flip probability depends on the class. We provide two …
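A minimal sketch of the noise model the abstract describes (not of the paper's estimators, which the truncated text does not specify); `rho_pos` and `rho_neg` are hypothetical names for the class-dependent flip probabilities.

```python
import numpy as np

def flip_labels(y, rho_pos, rho_neg, rng):
    """Simulate class-conditional label noise on labels y in {-1, +1}:
    a +1 label is flipped with probability rho_pos, a -1 label with rho_neg."""
    y = np.asarray(y)
    flip_prob = np.where(y == 1, rho_pos, rho_neg)
    flips = rng.random(y.shape) < flip_prob
    return np.where(flips, -y, y)

rng = np.random.default_rng(0)
y_true = rng.choice([-1, 1], size=1000)
y_noisy = flip_labels(y_true, rho_pos=0.2, rho_neg=0.4, rng=rng)
print("observed flip rate on positives:",
      np.mean(y_noisy[y_true == 1] != 1))
```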
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “subset selection” problem finds application in a variety of areas. In the authors’ previous work (Kalyanakrishnan & Stone, 2010), this problem is framed …
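For orientation only, here is a naive baseline for the subset-selection task, not the paper's algorithm: pull every arm the same number of times and return the m arms with the highest empirical means. Adaptive algorithms improve on this by allocating samples unevenly.

```python
import numpy as np

def select_top_m_uniform(pull, n, m, samples_per_arm):
    """Uniform-sampling baseline for subset selection: estimate each arm's
    mean from the same number of pulls, then keep the m best estimates."""
    means = np.array([np.mean([pull(i) for _ in range(samples_per_arm)])
                      for i in range(n)])
    return np.argsort(means)[-m:]

# Toy usage with Bernoulli arms; the true best subset is the last 3 arms.
rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
pull = lambda i: rng.random() < p[i]
print(select_top_m_uniform(pull, n=len(p), m=3, samples_per_arm=200))
```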
We describe and analyze two stochastic methods for ℓ1 regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration while the second method updates the entire weight vector but only uses a single training example at each iteration. In both methods, the choice of …
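A sketch of a single-feature update of the kind the first method describes: pick one coordinate at random and update only that weight. The squared loss, the exact partial derivative, and the step size below are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_coordinate_step(w, X, y, lam, eta, rng):
    """Update a single randomly chosen coordinate j of w, keeping the rest
    fixed, using the partial derivative of (1/2n)||Xw - y||^2 and an l1
    proximal (soft-thresholding) step for that coordinate."""
    j = rng.integers(X.shape[1])
    residual = X @ w - y
    grad_j = X[:, j] @ residual / X.shape[0]
    w[j] = soft_threshold(w[j] - eta * grad_j, eta * lam)
    return w

rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 20)), rng.standard_normal(200)
w = np.zeros(20)
for _ in range(2000):
    w = stochastic_coordinate_step(w, X, y, lam=0.1, eta=0.5, rng=rng)
```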
This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications where the learner only receives partial feedback (referred to as "bandit" feedback, in the spirit of multi-armed bandit models) with respect to …
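A sketch of a Banditron-style learner: per-class Perceptron weights, randomized exploration, and an update built from the single bit of bandit feedback (was the predicted label correct?). The exploration scheme and update form below are a reconstruction for illustration, not necessarily the paper's exact specification.

```python
import numpy as np

class BanditronSketch:
    def __init__(self, n_classes, n_features, gamma=0.1, seed=0):
        self.W = np.zeros((n_classes, n_features))
        self.gamma = gamma            # exploration rate (assumed name)
        self.k = n_classes
        self.rng = np.random.default_rng(seed)

    def step(self, x, is_correct_oracle):
        scores = self.W @ x
        y_hat = int(np.argmax(scores))                 # greedy prediction
        probs = np.full(self.k, self.gamma / self.k)   # explore uniformly...
        probs[y_hat] += 1.0 - self.gamma               # ...otherwise exploit
        y_tilde = int(self.rng.choice(self.k, p=probs))
        correct = is_correct_oracle(y_tilde)           # bandit feedback: one bit
        # Importance-weighted surrogate for the full-information Perceptron update.
        update = np.zeros_like(self.W)
        if correct:
            update[y_tilde] += x / probs[y_tilde]
        update[y_hat] -= x
        self.W += update
        return y_tilde
```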
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, …
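For reference, the span of a bias vector h over the state space S, the quantity the regularization above is based on, is the standard span seminorm:

```latex
\mathrm{sp}(h) \;=\; \max_{s \in \mathcal{S}} h(s) \;-\; \min_{s \in \mathcal{S}} h(s)
```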
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes, which directly lead to a number of generalization bounds. This derivation provides simplified proofs of a number of corollaries …
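As an illustration of the kind of bound involved (a standard result, stated here for the ℓ2-constrained linear class and not necessarily in the paper's exact form): if the data satisfy a norm bound, the empirical Rademacher complexity scales as XW/√n.

```latex
\mathcal{F} = \{\, x \mapsto \langle w, x \rangle : \|w\|_2 \le W \,\}, \quad \|x_i\|_2 \le X
\;\;\Longrightarrow\;\;
\hat{\mathcal{R}}_n(\mathcal{F}) \;\le\; \frac{XW}{\sqrt{n}} .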
We establish an excess risk bound of Õ(H·R_n² + √(H·L*)·R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √ …
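Substituting R_n = √(R/n) into the stated bound makes the rates explicit; this is a direct algebraic consequence of the formula above, shown here as a worked step.

```latex
R_n = \sqrt{R/n} \;\Longrightarrow\;
\tilde{O}\!\left(HR_n^2 + \sqrt{HL^*}\,R_n\right)
= \tilde{O}\!\left(\frac{HR}{n} + \sqrt{\frac{L^* H R}{n}}\right),
\qquad \text{which is } \tilde{O}\!\left(\frac{HR}{n}\right) \text{ when } L^* = 0 .
```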
There is a growing body of learning problems for which it is natural to organize the parameters into a matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose more sophisticated prior knowledge). This work describes and analyzes a systematic method for constructing such matrix-based regularization methods. In …
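The abstract, as truncated here, does not name specific norms; as one concrete instance of a matrix-norm regularizer, the sketch below shows the group (ℓ2,1) norm and its proximal operator, which shrinks whole rows of the parameter matrix toward zero. The function names are illustrative.

```python
import numpy as np

def group_l21_norm(W):
    """||W||_{2,1}: sum of the Euclidean norms of the rows of W.
    Penalizing it encourages entire rows (feature groups) to be zero."""
    return np.sum(np.linalg.norm(W, axis=1))

def prox_group_l21(W, tau):
    """Proximal operator of tau * ||.||_{2,1}: row-wise soft-thresholding,
    zeroing any row whose norm falls below tau."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(row_norms, 1e-12), 0.0)
    return W * scale

W = np.arange(12, dtype=float).reshape(4, 3)
print(group_l21_norm(W))
print(prox_group_l21(W, tau=5.0))
```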