# Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

@inproceedings{Feldman2017StatisticalQA,
title={Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization},
author={Vitaly Feldman and Crist{\'o}bal Guzm{\'a}n and Santosh S. Vempala},
booktitle={SODA},
year={2017}
}
• Published in SODA 30 December 2015
• Computer Science
Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases…
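To make the idea of running a first-order method with only statistical query access concrete, here is a minimal Python sketch. It is not the paper's construction: it uses the naive approach of estimating each gradient coordinate with one query, and the oracle is simulated from a finite sample with bounded adversarial-style noise. All names (`make_sq_oracle`, `sq_gradient`, `sq_gradient_descent`, `grad_f`) and the tolerance, step size, and objective $f(x, w) = \tfrac{1}{2}\|x - w\|^2$ are assumptions chosen for illustration.

```python
import random


def make_sq_oracle(samples, tau, rng):
    """Simulated SQ oracle (an assumption, not the paper's model verbatim).

    Given a query phi mapping a data point to a real number, it returns the
    average of phi over `samples`, perturbed by at most `tau`. Query values
    are clipped to [-1, 1] to mimic the usual boundedness requirement.
    """
    def oracle(phi):
        total = sum(min(1.0, max(-1.0, phi(w))) for w in samples)
        return total / len(samples) + rng.uniform(-tau, tau)
    return oracle


def sq_gradient(oracle, grad_f, x):
    """Estimate the gradient of F(x) = E[f(x, w)] with one query per coordinate."""
    return [oracle(lambda w, i=i: grad_f(x, w)[i]) for i in range(len(x))]


def sq_gradient_descent(oracle, grad_f, x0, step, iters):
    """Plain gradient descent that touches the data only through the oracle."""
    x = list(x0)
    for _ in range(iters):
        g = sq_gradient(oracle, grad_f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x, w):
    """Gradient in x of the illustrative loss f(x, w) = 0.5 * ||x - w||^2."""
    return [xi - wi for xi, wi in zip(x, w)]


# Illustrative run: the minimizer of F is the mean of the data points.
rng = random.Random(0)
samples = [[rng.uniform(-0.4, 0.4) for _ in range(3)] for _ in range(200)]
oracle = make_sq_oracle(samples, tau=0.01, rng=rng)
x_hat = sq_gradient_descent(oracle, grad_f, [0.0] * 3, step=0.5, iters=50)
```

In this toy setting the iterate contracts toward the sample mean, and the per-query error `tau` bounds how far the final point can sit from it in each coordinate. The coordinate-wise scheme spends one query per dimension per step; it is shown only as a baseline for what "SQ access" means operationally.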

## Tables from this paper

Lower Bounds for Parallel and Randomized Convex Optimization
• Computer Science, Mathematics
COLT
• 2019
We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation. We show that the answer
Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent
• Computer Science
COLT
• 2021
This paper studies two of the most popular restricted computational models, the statistical query framework and low-degree polynomials, in the context of high-dimensional hypothesis testing, and finds that under mild conditions on the testing problem, the two classes of algorithms are essentially equivalent in power.
Statistical Query Lower Bounds for List-Decodable Linear Regression
• Computer Science, Mathematics
NeurIPS
• 2021
The main result is a Statistical Query (SQ) lower bound of d, which qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.
A General Characterization of the Statistical Query Complexity
This work demonstrates that the complexity of solving general problems over distributions using SQ algorithms can be captured by a relatively simple notion of statistical dimension that is introduced, and is also the first to precisely characterize the necessary tolerance of queries.
On the Power of Learning from k-Wise Queries
• Computer Science
ITCS
• 2017
For every k, the complexity of distribution-independent PAC learning with k-wise queries can be exponentially larger than learning with (k+1)-wise queries, and the picture is substantially richer for more general problems over distributions.
Private Stochastic Convex Optimization with Optimal Rates
• Computer Science
NeurIPS
• 2019
The approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization and implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice.
The estimation error of general first order methods
• Computer Science
COLT
• 2020
A class of 'general first order methods' for efficiently estimating the underlying parameters of high-dimensional models is introduced; the class is broad enough to include classical first order optimization (for convex and non-convex objectives), but also other types of algorithms.
Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings
• Computer Science, Mathematics
NeurIPS
• 2021
This work provides the first method for non-smooth weakly convex stochastic optimization with rate $\tilde{O}\left(\frac{1}{n^{1/4}} + \frac{d^{1/6}}{(n\varepsilon)^{1/3}}\right)$, which matches the best existing non-private algorithm when $d = O(\sqrt{n})$.
The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals
• Computer Science, Mathematics
COLT
• 2021
It is shown that the $L^1$-polynomial regression algorithm is essentially best possible among SQ algorithms, and therefore that the SQ complexity of agnostic learning is closely related to the polynomial degree required to approximate any function from the concept class in $L^1$-norm.
Optimal SQ Lower Bounds for Robustly Learning Discrete Product Distributions and Ising Models
• Computer Science
ArXiv
• 2022
The optimal Statistical Query lower bounds for robustly learning certain families of discrete high-dimensional distributions are established and a generic SQ lower bound is developed starting from low-dimensional moment matching constructions that are believed to find other applications.

## References

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
• Computer Science
2014 IEEE 55th Annual Symposium on Foundations of Computer Science
• 2014
This work provides new algorithms and matching lower bounds for differentially private convex empirical risk minimization assuming only that each data point's contribution to the loss function is Lipschitz and that the domain of optimization is bounded.
Statistical Algorithms and a Lower Bound for Detecting Planted Cliques
• Computer Science, Mathematics
J. ACM
• 2017
The main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions when the planted clique has size $O(n^{1/2 - \delta})$ for any constant $\delta > 0$.
Information-Based Complexity, Feedback and Dynamics in Convex Programming
• Computer Science
IEEE Transactions on Information Theory
• 2011
The present work connects the intuitive notions of “information” in optimization, experimental design, estimation, and active learning to the quantitative notion of Shannon information and shows that optimization algorithms often obey the law of diminishing returns.
Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
• Computer Science
IEEE Transactions on Information Theory
• 2012
A new notion of discrepancy between functions is introduced, and used to reduce problems of stochastic convex optimization to statistical parameter estimation, which can be lower bounded using information-theoretic methods.
Private Multiplicative Weights Beyond Linear Queries
This work shows how to give accurate and differentially private solutions to exponentially many convex minimization problems on a sensitive dataset.
Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions
• Computer Science
COLT
• 2015
We consider the problem of optimizing an approximately convex function over a bounded convex set in $\mathbb{R}^n$ using only function evaluations. The problem is reduced to sampling from an
Stochastic Convex Optimization
• Computer Science
COLT
• 2009
Stochastic convex optimization is studied, and it is shown that the key ingredient is strong convexity and regularization, which is only a sufficient, but not necessary, condition for meaningful non-trivial learnability.
Simulated Annealing for Convex Optimization
• Computer Science, Mathematics
Math. Oper. Res.
• 2006
One of the advantages of simulated annealing, in addition to avoiding poor local minima, is that in these problems it converges faster to the minima that it finds, and it is concluded that under certain general conditions, the Boltzmann-Gibbs distributions are optimal on these convex problems.
Weakly learning DNF and characterizing statistical query learning using Fourier analysis
• Computer Science
STOC '94
• 1994
It is proved that an algorithm due to Kushilevitz and Mansour can be used to weakly learn DNF using membership queries in polynomial time, with respect to the uniform distribution on the inputs, and it is shown that DNF expressions and decision trees are not even weakly learnable, without relying on any unproven assumptions.
Interactive fingerprinting codes and the hardness of preventing false discovery
• Computer Science
2016 Information Theory and Applications Workshop (ITA)
• 2016
It is shown that, under a standard hardness assumption, there is no computationally efficient algorithm that, given $n$ samples from an unknown distribution, can give valid answers to $O(n^2)$ adaptively chosen statistical queries.