Corpus ID: 239885763

Online Action Learning in High Dimensions: A Conservative Perspective

Claudio C. Flores and Marcelo C. Medeiros
Sequential learning problems are common in several fields of research and practical applications. Examples include dynamic pricing and assortment and the design of auctions and incentives, and such problems permeate a large number of sequential treatment experiments. In this paper, we extend one of the most popular learning solutions, the ε_t-greedy heuristic, to high-dimensional contexts considering a conservative directive. We do this by allocating part of the time the original rule uses to adopt completely new…


Doubly-Robust Lasso Bandit

This work considers the stochastic linear contextual bandit problem and proposes a novel algorithm, the Doubly-Robust Lasso Bandit, which exploits the sparse structure of the regression parameter as in Lasso while blending in the doubly-robust technique used in the missing data literature.

The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

An algorithm for multi-armed bandits with observable side information that requires no knowledge of a time horizon; the regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class.

Online Decision-Making with High-Dimensional Covariates

This work formulates the problem as a multi-armed bandit with high-dimensional covariates and presents a new efficient bandit algorithm based on the LASSO estimator that outperforms existing bandit methods, as well as physicians, at correctly dosing a majority of patients.

Finite-time Analysis of the Multiarmed Bandit Problem

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

Regret Minimization for Reserve Prices in Second-Price Auctions

A regret minimization algorithm for setting the reserve price in a sequence of second-price auctions, under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution, achieves a regret of Õ(√T) in a sequence of T auctions.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.

A Simple Unified Framework for High Dimensional Bandit Problems

This work proposes a simple unified algorithm for stochastic high-dimensional bandit problems with low-dimensional structures and presents a general analysis framework for the regret upper bound of the algorithm, which achieves comparable regret bounds in the LASSO bandit as a sanity check.

Survey Bandits with Regret Guarantees

This work proposes algorithms that avoid needless feature collection while maintaining strong regret guarantees in a variant of the contextual bandit problem.

A Linear Response Bandit Problem

We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of