Corpus ID: 14475218

From Dual to Primal Sub-optimality for Regularized Empirical Risk Minimization

Ching-pei Lee
Regularized empirical risk minimization problems are fundamental tasks in machine learning and data analysis. Many successful approaches for solving these problems are based on a dual formulation, which often admits more efficient algorithms. In many applications, however, the primal solution is needed. In the case of regularized empirical risk minimization, there is a convenient formula for reconstructing an approximate primal solution from an approximate dual solution. However, the question of quantifying… 
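The reconstruction mentioned in the abstract is, in the common setup for L2-regularized ERM, a simple linear map from the dual variables back to a primal candidate. The sketch below assumes the standard SDCA-style convention w(α) = (1/(λn)) Xᵀα; the notation and function names are illustrative assumptions, not necessarily those of the paper.

```python
import numpy as np

def primal_from_dual(X, alpha, lam):
    """Map dual variables alpha to a primal candidate for
    min_w (1/n) * sum_i phi_i(x_i @ w) + (lam/2) * ||w||^2,
    using the standard relation w(alpha) = (1/(lam*n)) * X.T @ alpha.
    """
    n = X.shape[0]
    return X.T @ alpha / (lam * n)
```

An approximate dual solution thus immediately yields an approximate primal solution; the paper's subject is how the sub-optimality of one translates into the sub-optimality of the other.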
1 Citation
Large-scale logistic regression and linear support vector machines using spark
This work considers a distributed Newton method for solving logistic regression as well as linear SVM, implements it on Spark, and releases an efficient and easy-to-use tool for the Spark community.

References
QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines
Operational conditions under which the Simon and composite algorithms possess an O(n) upper bound on the number of iterations are described, along with general conditions for which a matching lower bound exists for any decomposition algorithm that uses working sets of size 2.
Approximate Duality
The Lagrangian duality theory is extended to incorporate approximate solutions of convex optimization problems and can be used for convex quadratic programming and then applied to support vector machines from learning theory.
Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM
This paper proposes an efficient box-constrained quadratic optimization algorithm for distributedly training linear support vector machines (SVMs) on large data, using a method that requires only O(1) communication cost to ensure fast convergence.
Iteration complexity of feasible descent methods for convex optimization
Global linear convergence is proved for a wide range of algorithms when they are applied to certain non-strongly convex problems, giving the first proof of O(log(1/ε)) iteration complexity for cyclic coordinate descent methods on the dual problems of support vector classification and regression.
Dual coordinate descent methods for logistic regression and maximum entropy models
This paper applies coordinate descent methods to solve the dual form of logistic regression and maximum entropy, and shows that many details are different from the situation in linear SVM.
Stochastic dual coordinate ascent methods for regularized loss
A new analysis of Stochastic Dual Coordinate Ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
This work examines exponentiated gradient (EG) algorithms for training log-linear and maximum-margin models, and describes how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be applied efficiently to problems such as sequence learning or natural language parsing.
A dual coordinate descent method for large-scale linear SVM
A novel dual coordinate descent method for linear SVM with L1- and L2-loss functions that reaches an ε-accurate solution in O(log(1/ε)) iterations is presented.
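The dual coordinate descent scheme this entry describes can be sketched for the L1-loss case as follows: maintain w = Σᵢ αᵢ yᵢ xᵢ alongside the dual variables, and project each single-coordinate Newton step onto [0, C]. Variable and function names below are illustrative, not the paper's.

```python
import numpy as np

def dual_cd_linear_svm(X, y, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent sketch for the L1-loss linear SVM dual
    min_a (1/2) a^T Q a - e^T a,  0 <= a_i <= C,  Q_ij = y_i y_j x_i^T x_j.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                      # maintained as sum_i alpha_i * y_i * x_i
    Qii = np.einsum('ij,ij->i', X, X)    # diagonal of Q (squared row norms)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if Qii[i] == 0.0:
                continue
            G = y[i] * (w @ X[i]) - 1.0  # partial derivative of dual wrt alpha_i
            new_ai = min(max(alpha[i] - G / Qii[i], 0.0), C)
            if new_ai != alpha[i]:
                w += (new_ai - alpha[i]) * y[i] * X[i]  # O(nnz) incremental update
                alpha[i] = new_ai
    return w, alpha
```

The incremental update of w is what makes each coordinate step cost only O(nnz(xᵢ)), which is the key to the method's efficiency on large sparse data.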
PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent
This paper proposes a family of parallel asynchronous stochastic dual coordinate descent algorithms (PASSCoDe), showing that the converged solution is the exact solution of a primal problem with a perturbed regularizer in a multi-core environment.
Introductory Lectures on Convex Optimization - A Basic Course
It was in the middle of the 1980s, when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, that it became more and more common for new methods to be provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.