• Corpus ID: 239769026

Online estimation and control with optimal pathlength regret

@inproceedings{Goel2021OnlineEA,
  title={Online estimation and control with optimal pathlength regret},
  author={Gautam Goel and Babak Hassibi},
  booktitle={L4DC},
  year={2021}
}
A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent “pathlength” regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits. We… 

Measurement-Feedback Control with Optimal Data-Dependent Regret

It is shown that no measurement-feedback controller can have a bounded competitive ratio, or regret that is bounded by the pathlength of the measurement disturbance, and a controller is derived whose regret has optimal dependence on the joint energy of the driving and measurement disturbances.

References

Regret-optimal Estimation and Control

This work shows that the regret-optimal estimators and controllers can be derived in state-space form using operator-theoretic techniques from robust control and presents tight, data-dependent bounds on the regret incurred by the algorithms in terms of the energy of the disturbances.

Regret-optimal measurement-feedback control

It is shown that in the measurement-feedback setting, unlike in the full-information setting, there is no single offline controller which outperforms every other offline controller on every disturbance, and a new $H_2$-optimal offline controller is proposed as a benchmark for the online controller to compete against.

Adaptive Regret for Control of Time-Varying Dynamics

An efficient algorithm is given that attains first-order adaptive regret guarantees for the setting of online convex optimization with memory, and it is shown that these first-order bounds are nearly tight.

An Online Algorithm for Smoothed Regression and LQR Control

The generality of the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.

Regret-Optimal Filtering

The regret-optimal estimator is the causal estimator that minimizes the worst-case regret across all bounded-energy noise sequences and is represented as a finite-dimensional state-space whose parameters can be computed by solving three Riccati equations and a single Lyapunov equation.

Non-stationary Online Learning with Memory and Non-stochastic Control

This paper derives a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller provably competitive to a sequence of changing policies for online non-stochastic control.

Regret-Optimal Controller for the Full-Information Problem

The regret-optimal control problem can be reduced to a Nehari extension problem, i.e., to approximate an anticausal operator with a causal one in the operator norm, and simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the H2 and the H∞ optimal controllers, and generally has H2 and H∞ costs that are simultaneously close to their optimal values.

Regret-Optimal Full-Information Control

The regret-optimal control problem can be reduced to a Nehari extension problem, i.e., to approximate an anticausal operator with a causal one in the operator norm, and the regret-optimal controller generally has H2 and H∞ costs that are simultaneously close to their optimal values.

Online Optimization with Gradual Variations

It is shown that for linear and general smooth convex loss functions, an online algorithm modified from the gradient descent algorithm can achieve a regret which only scales as the square root of the deviation, and as an application, the approach also yields a logarithmic regret for the portfolio management problem.

Competitive Control

This work designs an online controller which competes against a clairvoyant offline optimal controller, extends competitive control to nonlinear systems using Model Predictive Control (MPC), and presents numerical experiments which show that the competitive controller can significantly outperform standard H2 and H∞ controllers in the MPC setting.