Effective sketching methods for value function approximation
@article{Pan2017EffectiveSM,
  title   = {Effective sketching methods for value function approximation},
  author  = {Yangchen Pan and Erfan Sadeqi Azer and Martha White},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1708.01298}
}
High-dimensional representations, such as radial basis function networks or tile coding, are common choices for policy evaluation in reinforcement learning. Learning with such high-dimensional representations, however, can be expensive, particularly for matrix methods, such as least-squares temporal difference learning or quasi-Newton methods that approximate matrix step-sizes. In this work, we explore the utility of sketching for these two classes of algorithms. We highlight issues with…
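As a rough illustration of the idea in the abstract, the sketch below applies least-squares temporal difference learning (LSTD) to features compressed with a random Gaussian projection, so the linear system solved is k x k instead of d x d. The dimensions, regularizer, and synthetic transitions are all hypothetical; this is a minimal sketch of the general technique, not the paper's exact algorithm.

    import numpy as np

    # Illustrative sketched LSTD (not the paper's exact method): project the
    # d-dimensional features through a random matrix S before accumulating
    # the LSTD system A w = b, so the solve is k x k rather than d x d.
    rng = np.random.default_rng(0)
    d, k, gamma = 1024, 64, 0.99                  # ambient dim, sketch dim, discount

    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))  # Gaussian sketch matrix

    A = np.zeros((k, k))                          # A = sum z (z - gamma z')^T in sketch space
    b = np.zeros(k)
    for _ in range(5000):                         # synthetic transitions, for demonstration only
        phi, phi_next = rng.normal(size=d), rng.normal(size=d)
        r = rng.normal()
        z, z_next = S @ phi, S @ phi_next
        A += np.outer(z, z - gamma * z_next)
        b += z * r

    w = np.linalg.solve(A + 1e-3 * np.eye(k), b)  # small ridge term for stability

    def value(phi_s):                             # approximate value of a state
        return (S @ phi_s) @ w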
10 Citations
Efficient policy evaluation by matrix sketching
- Computer Science · Frontiers of Computer Science
- 2022
A variant of incremental SVD that obtains better theoretical guarantees by periodically shrinking the singular values is proposed and employed to accelerate least-squares TD and quasi-Newton TD algorithms.
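The shrinking idea this summary refers to resembles the frequent-directions sketch, where the singular values of a small buffer matrix are periodically shrunk to make room for new rows. Below is a minimal sketch of that general technique under assumed parameters, not the cited paper's exact variant.

    import numpy as np

    def frequent_directions(stream, ell):
        # Keep an ell x d matrix B whose Gram matrix B^T B approximates
        # A^T A for the streamed rows of A (frequent-directions style).
        B = None
        for a in stream:
            if B is None:
                B = np.zeros((ell, len(a)))
            zero_rows = np.where(~B.any(axis=1))[0]
            if len(zero_rows) == 0:                      # buffer full: shrink
                U, s, Vt = np.linalg.svd(B, full_matrices=False)
                delta = s[ell // 2] ** 2                 # shrink by median singular value
                s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
                B = s[:, None] * Vt                      # bottom rows become zero again
                zero_rows = np.where(~B.any(axis=1))[0]
            B[zero_rows[0]] = a
        return B

    # Usage on hypothetical random data:
    rng = np.random.default_rng(1)
    A = rng.normal(size=(500, 40))
    B = frequent_directions(A, ell=10)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)           # spectral-norm Gram error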
Two-Timescale Networks for Nonlinear Value Function Approximation
- Computer Science · ICLR
- 2019
This work provides a two-timescale network (TTN) architecture that enables linear methods to be used to learn values, with a nonlinear representation learned at a slower timescale, and proves convergence for TTNs.
Vector Step-size Adaptation for Continual, Online Prediction
- Computer Science
- 2019
An instance of AdaGain that combines meta-descent with RMSProp is introduced; it is particularly robust across several prediction problems and is competitive with the state-of-the-art method on a large-scale time-series prediction problem using real data from a mobile robot.
Context-Dependent Upper-Confidence Bounds for Directed Exploration
- Computer Science · NeurIPS
- 2018
This work provides a novel, computationally efficient, incremental exploration strategy that leverages properties of least-squares temporal difference learning (LSTD), deriving upper confidence bounds on the action-values learned by LSTD with context-dependent noise variance.
Supervised autoencoders: Improving generalization performance with unsupervised regularizers
- Computer Science · NeurIPS
- 2018
This work theoretically and empirically analyzes supervised autoencoders and provides a novel generalization result for linear autoencoders, proving uniform stability based on including the reconstruction error in a neural network that predicts both inputs (reconstruction error) and targets jointly.
Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces
- Computer Science · IJCAI
- 2018
This work proposes a new algorithm, LSTD(λ)-RP, which leverages random projection techniques and takes eligibility traces into consideration to tackle policy evaluation with linear function approximation, and shows it can achieve better performance than prior LSTD-RP and LSTD(λ) algorithms.
Learning Macroscopic Brain Connectomes via Group-Sparse Factorization
- Computer Science · NeurIPS
- 2019
This work develops an efficient optimization strategy for this extremely high-dimensional sparse problem by reducing the number of parameters with a greedy algorithm designed specifically for the problem, and shows that this greedy algorithm significantly improves on a standard greedy algorithm, Orthogonal Matching Pursuit.
Target Position and Safety Margin Effects on Path Planning in Obstacle Avoidance
- Psychology
- 2021
It is found that the right and left safety margins combined account for 26% of the variability in path-planning decision making; gaze analysis showed that participants directed their gaze to minimize the uncertainty involved in successful task performance.
Target position and avoidance margin effects on path planning in obstacle avoidance
- Psychology · Scientific Reports
- 2021
Gaze analysis showed that participants directed their gaze to minimize the uncertainty involved in successful task performance and that gaze sequence changed with obstacle location; an integrated explanation for path selection is provided.
Meta-descent for Online, Continual Prediction
- Computer Science · AAAI
- 2019
A general, incremental meta-descent algorithm, AdaGain, is derived; it is designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or accelerations such as RMSProp.
References
Showing 1-10 of 35 references
Sketch-Based Linear Value Function Approximation
- Computer Science · NIPS
- 2012
This work investigates the application of the tug-of-war sketch, an unbiased estimator for approximating inner products, to linear value function approximation in reinforcement learning and provides empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained.
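For intuition, a tug-of-war (AMS-style) sketch projects a vector with random ±1 signs, and the dot product of two sketches is an unbiased estimate of the original inner product. The snippet below uses fully random signs for simplicity (the actual sketch uses 4-wise independent hashing and medians of independent estimates); the dimensions and test vectors are arbitrary.

    import numpy as np

    # Tug-of-war inner-product estimation with random +/-1 signs.
    rng = np.random.default_rng(2)
    d, k = 10_000, 256                       # original and sketch dimensions

    signs = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

    x = rng.normal(size=d)
    y = x + 0.1 * rng.normal(size=d)         # correlated vectors, large inner product
    sx, sy = signs @ x, signs @ y

    # sx @ sy is an unbiased estimate of x @ y; variance shrinks as 1/k.
    print(x @ y, sx @ sy)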
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- Computer Science · NIPS
- 1995
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
Sketching as a Tool for Numerical Linear Algebra
- Mathematics, Computer Science · Found. Trends Theor. Comput. Sci.
- 2014
This survey highlights the recent advances in algorithms for numericallinear algebra that have come from the technique of linear sketching, and considers least squares as well as robust regression problems, low rank approximation, and graph sparsification.
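A canonical example from this line of work is sketch-and-solve least squares: replace argmin_x ||Ax - b|| with the much smaller problem argmin_x ||SAx - Sb|| for a random sketching matrix S. A minimal sketch under assumed problem sizes:

    import numpy as np

    # Sketch-and-solve least squares: solve a compressed s x n problem
    # instead of the full m x n one; the residual is provably within a
    # small factor of optimal for suitable sketch sizes.
    rng = np.random.default_rng(3)
    m, n, s = 20_000, 50, 500                 # rows, cols, sketch rows

    A = rng.normal(size=(m, n))
    b = rng.normal(size=m)

    S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, m))   # dense Gaussian sketch
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

    # The two residuals should be close.
    print(np.linalg.norm(A @ x_sketch - b), np.linalg.norm(A @ x_exact - b))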
Efficient Second Order Online Learning by Sketching
- Computer Science · NIPS
- 2016
Sketched Online Newton is an enhanced version of the Online Newton Step, an online second-order learning algorithm, that enjoys substantially improved regret guarantees for ill-conditioned data; sparse forms of the sketching methods are further developed, making the computation linear in the sparsity of features.
Improved Practical Matrix Sketching with Guarantees
- Computer Science · IEEE Transactions on Knowledge and Data Engineering
- 2016
This paper attempts to categorize and compare the best-known methods under row-wise streaming updates with provable guarantees, and then to tweak some of these methods to gain practical improvements while retaining guarantees.
Generalization in Reinforcement Learning: Safely Approximating the Value Function
- Computer Science · NIPS
- 1994
It is shown that straightforward value function approximation is not robust and, even in very benign cases, may produce an entirely wrong policy; Grow-Support is introduced, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
Compressed Least-Squares Regression on Sparse Spaces
- Computer Science · AAAI
- 2012
This paper develops the bias-variance analysis of a least-squares regression estimator in compressed spaces when random projections are applied to sparse input signals, and shows how the choice of the projection size affects the performance of regression on compressed spaces.
Compressed Least-Squares Regression
- Computer Science, Mathematics · NIPS
- 2009
It is shown that solving the problem in the compressed domain instead of the initial domain reduces the estimation error at the price of an increased (but controlled) approximation error.
Least-Squares Temporal Difference Learning
- Computer Science · ICML
- 1999
This paper presents a simpler derivation of the LSTD algorithm, which generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression.
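For reference, a minimal LSTD(λ) accumulation loop in the spirit of this derivation: an eligibility trace z gathers discounted feature vectors, the system A w = b is built from transitions and solved once. The transitions here are synthetic placeholders, and the small ridge term is an assumption for numerical stability.

    import numpy as np

    # LSTD(lambda): eligibility traces weight each feature vector into all
    # later temporal-difference terms; lam = 0 recovers plain LSTD, and
    # lam = 1 yields a formulation equivalent to regression on returns.
    rng = np.random.default_rng(4)
    d, gamma, lam = 8, 0.95, 0.7

    A, b = np.zeros((d, d)), np.zeros(d)
    z = np.zeros(d)
    phi = rng.normal(size=d)                       # initial state features (toy data)
    for t in range(1000):
        phi_next = rng.normal(size=d)
        r = rng.normal()
        z = gamma * lam * z + phi                  # eligibility trace update
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
        phi = phi_next

    w = np.linalg.solve(A + 1e-3 * np.eye(d), b)   # value weights: V(s) ~ phi(s) @ w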