We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values, the SAG method achieves a faster convergence rate …
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine …
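For readers who want to see the idea behind the two SAG abstracts above, here is a minimal, hedged sketch of a SAG-style loop. The function name `sag` and the `grad_i(i, x)` oracle are illustrative assumptions, not the papers' code; the papers tie the step size to the Lipschitz constant of the gradients.

```python
import numpy as np

def sag(grad_i, n, x0, alpha, n_iters, rng=None):
    """Minimal sketch of a SAG-style update for (1/n) * sum_i f_i(x).

    grad_i(i, x) is assumed to return the gradient of the i-th term f_i at x;
    alpha is a fixed step size.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    y = np.zeros((n, x.size))   # memory of the last gradient seen for each f_i
    d = np.zeros_like(x)        # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)
        g = grad_i(i, x)
        d += g - y[i]           # swap the old stored gradient for the new one
        y[i] = g
        x -= (alpha / n) * d    # step along the average of stored gradients
    return x
```

The point matching the abstracts: each iteration evaluates only a single f_i, so its cost does not grow with n, yet the step uses an average over all stored gradients rather than a single noisy one.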
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated …
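As a concrete instance of the setup in the abstract above, here is a minimal sketch of the basic proximal-gradient iteration with an ℓ1 non-smooth term. The names `soft_threshold`, `proximal_gradient`, and the `grad_smooth` oracle are illustrative assumptions; the paper's analysis concerns what happens when this gradient or the prox step is computed inexactly.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximity operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_smooth, x0, step, lam, n_iters):
    """Basic proximal-gradient iteration for g(x) + lam * ||x||_1.

    grad_smooth(x) returns (possibly an approximation of) the gradient
    of the smooth term g.
    """
    x = x0.copy()
    for _ in range(n_iters):
        x = soft_threshold(x - step * grad_smooth(x), step * lam)
    return x
```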
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine …
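A hedged sketch of a randomized block-coordinate Frank-Wolfe loop of the kind the abstract describes. The `grad_block` and `lmo_block` oracles and the 2n/(k + 2n) step-size schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def block_coordinate_frank_wolfe(grad_block, lmo_block, x_blocks, n_iters, rng=None):
    """Sketch of a randomized block-coordinate Frank-Wolfe loop.

    grad_block(i, x_blocks) is assumed to return the partial gradient for
    block i, and lmo_block(i, g) to solve the linear minimization oracle
    min_{s in M_i} <s, g> over that block's constraint set M_i.
    """
    rng = rng or np.random.default_rng(0)
    n = len(x_blocks)
    for k in range(n_iters):
        i = rng.integers(n)
        g = grad_block(i, x_blocks)
        s = lmo_block(i, g)                  # cheap: only one block's constraint set
        gamma = 2.0 * n / (k + 2.0 * n)      # illustrative step-size schedule
        x_blocks[i] = (1.0 - gamma) * x_blocks[i] + gamma * s
    return x_blocks
```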
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We …
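To illustrate what "gain vector adaptation" means, here is a simplified sketch in the spirit of SMD. It is not the exact SMD update, which also propagates curvature information via Hessian-vector products; all names and constants are illustrative.

```python
import numpy as np

def gain_adapted_sgd(grad, theta0, n_steps, eta0=0.1, mu=0.01, lam=0.9):
    """Simplified per-parameter gain adaptation, SMD-flavoured.

    grad(theta, t) is assumed to return a stochastic gradient at step t.
    """
    theta = theta0.copy()
    eta = np.full_like(theta, eta0)   # per-parameter gains (the "gain vector")
    v = np.zeros_like(theta)          # decayed trace of recent parameter changes
    for t in range(n_steps):
        g = grad(theta, t)
        # Grow a gain when the new gradient points the way recent updates moved,
        # shrink it when they disagree (bounded below to stay positive).
        eta *= np.maximum(0.5, 1.0 - mu * g * v)
        step = eta * g
        theta -= step
        v = lam * v - step
    return theta
```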
Previous work has examined structure learning in log-linear models with ℓ1-regularization, largely focusing on the case of pairwise potentials. In this work we consider the case of models with potentials of arbitrary order, but that satisfy a hierarchical constraint. We enforce the hierarchical constraint using group ℓ1-regularization with overlapping groups. …
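A hedged sketch of how an overlapping group ℓ1 penalty can encode such a hierarchical constraint; the notation is illustrative, with θ_A the parameters of the potential on variable subset A, ℓ(θ) the log-likelihood, and λ_A group weights.

```latex
% One group per candidate potential A, containing A's parameters together with
% those of every superset B; zeroing the group for A then forces the potentials
% on all supersets of A to be zero as well, which is the hierarchical property.
\min_{\theta}\; -\ell(\theta)
  \;+\; \sum_{A} \lambda_A \Big\| \big(\theta_B\big)_{B \supseteq A} \Big\|_2
```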
Sparsity-promoting L1-regularization has recently been successfully used to learn the structure of undirected graphical models. In this paper, we apply this technique to learn the structure of directed graphical models. Specifically, we make three contributions. First, we show how the decomposability of the MDL score, plus the ability to quickly compute …
Supervised learning from multiple labeling sources is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy) and their expertise also varies depending on the data they observe (annotators may have knowledge about different parts …
Coronary heart disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret, leading to high intra-observer variability. Previous work indicates that in order to approach this problem, the interactions between the different heart regions …
On iteration k, Algorithm S has an error of 1/k. On iteration k, Algorithm D has an error of 1/2^k. Stochastic vs. deterministic: the stochastic method makes great progress initially, but slows down; the deterministic method makes steady progress, but each iteration is expensive.
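To make the comparison concrete, a small numeric illustration of the two error rates quoted above:

```python
# Worked comparison of the two rates (illustrative numbers only).
for k in (1, 10, 20, 50):
    sublinear = 1.0 / k        # stochastic-style O(1/k) error
    linear = 1.0 / 2**k        # deterministic-style O(1/2^k) error
    print(f"k={k:3d}  1/k={sublinear:.3e}  1/2^k={linear:.3e}")
```

By k = 20, 1/k is still 0.05 while 1/2^k is already below 10^-6; the catch is that each deterministic iteration must process every term in the sum, which is what makes it expensive.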