Data Sampling Affects the Complexity of Online SGD over Dependent Data

@article{Ma2022DataSA,
  title={Data Sampling Affects the Complexity of Online SGD over Dependent Data},
  author={Shaocong Ma and Ziyi Chen and Yi Zhou and Kaiyi Ji and Yingbin Liang},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.00006}
}
Conventional machine learning applications typically assume that data samples are independently and identically distributed (i.i.d.). However, practical scenarios often involve a data-generating process that produces highly dependent data samples, which are known to heavily bias the stochastic optimization process and slow down the convergence of learning. In this paper, we conduct a fundamental study on how different stochastic data sampling schemes affect the sample complexity of online…

References

The Generalization Ability of Online Algorithms for Dependent Data

It is shown that the generalization error of any stable online algorithm concentrates around its regret (an easily computable statistic of the online performance of the algorithm) when the underlying ergodic process is β- or φ-mixing.

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data

This work provides a non-asymptotic analysis of the convergence of various SG-based methods; this includes the famous SG descent, constant and time-varying mini-batch SG methods, and their averaged estimates (a.k.a. Polyak-Ruppert averaging).

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

An algorithm based on experience replay, a popular reinforcement learning technique, is proposed that achieves a significantly better error rate. It is one of the first results where an algorithm outperforms SGD-DD on an interesting Markov chain, and it provides the first theoretical analysis supporting the use of experience replay in practice.

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

This is the first theoretical study establishing that AC and NAC attain order-wise performance improvement over PG and NPG under infinite horizon, due to the incorporation of the critic.

Large-Scale Machine Learning with Stochastic Gradient Descent

A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.

Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

This work sharpens the sample complexity of synchronous Q-learning to the order of $\frac{|S||A|}{(1-\gamma)^4 \varepsilon^2}$ (up to some logarithmic factor) for any $0 < \varepsilon < 1$, leading to an order-wise improvement in $\frac{1}{1-\gamma}$.

Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

  • Lin Xiao
  • Computer Science
  • J. Mach. Learn. Res.
  • 2009
A new online algorithm is developed, the regularized dual averaging (RDA) method, that can explicitly exploit the regularization structure in an online setting and can be very effective for sparse online learning with l1-regularization.

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

This work provides the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. sample path and linear function approximation, and proposes a TDC algorithm with blockwise diminishing stepsize that converges as fast as TDC under constant stepsize while still enjoying accuracy comparable to TDC under diminishing stepsize.

Finite-Sample Analysis for SARSA with Linear Function Approximation

A novel technique is developed to explicitly characterize the stochastic bias of a type of stochastic approximation procedure with time-varying Markov transition kernels. It enables non-asymptotic convergence analyses of this type of stochastic approximation algorithm and may be of independent interest.

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

This paper proves that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations.