Corpus ID: 59842954

Bandit Principal Component Analysis

@article{Kotlowski2019BanditPC,
  title={Bandit Principal Component Analysis},
  author={Wojciech Kotlowski and Gergely Neu},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.03035}
}
We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices. We focus on a natural notion of bandit feedback where the learner only observes the loss associated with its own prediction. Based on the classical observation that this decision-making problem can be lifted to the space of density matrices, we… 
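The lifting mentioned above rests on the identity $\mathbb{E}[(w^\top x)^2] = x^\top W x = \mathrm{tr}(W x x^\top)$ for $W = \mathbb{E}[w w^\top]$: once the learner's randomized prediction is summarized by the density matrix $W$, the quadratic loss becomes linear in $W$. A minimal numpy sketch of this identity (illustrative only, not the paper's code; all names are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5

    # A density matrix: positive semi-definite with unit trace.
    A = rng.standard_normal((d, d))
    W = A @ A.T
    W /= np.trace(W)

    # Drawing w as eigenvector v_i of W with probability lambda_i gives
    # E[w w^T] = W, so the expected quadratic gain is linear in W:
    # E[(w^T x)^2] = x^T W x = tr(W x x^T).
    lam, V = np.linalg.eigh(W)
    x = rng.standard_normal(d)
    idx = rng.choice(d, size=100_000, p=lam / lam.sum())
    ws = V[:, idx]                    # one sampled unit vector per column
    print(np.mean((x @ ws) ** 2))     # Monte-Carlo estimate of E[(w^T x)^2]
    print(x @ W @ x)                  # exact value tr(W x x^T)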
Bandit Phase Retrieval
TLDR
The analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading, and that uniform bounds on the information ratio for information-directed sampling [Russo and Van Roy, 2018] are not sufficient for optimal regret.
Improved Regret Bounds of Bilinear Bandits using Action Space Analysis
TLDR
This paper refutes a previously posed conjecture by presenting algorithms that achieve regret $\tilde{O}(\sqrt{d_1 d_2 (d_1 + d_2) T})$, exploiting the fact that the action space dimension $O(d_1 + d_2)$ is significantly lower than the matrix parameter dimension $O(d_1 d_2)$, and devises an algorithm with better empirical performance than previous algorithms.
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
TLDR
This work develops the first algorithm with a "best-of-both-worlds" guarantee: it achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with $\tilde{\mathcal{O}}(\sqrt{T})$ regret even when the losses are adversarial, where $T$ is the number of episodes.
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
TLDR
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problem and the two-layer neural network bandit problem with polynomial activation, and provides algorithms that are minimax-optimal in the dimension.
Improved Regret Bounds for Bandit Combinatorial Optimization
TLDR
The bound obtained for bandit ranking in the present study addresses an open problem raised in [Cohen et al., 2017], and it is demonstrated that the problem becomes easier when correlations among entries of loss vectors are not considered.
Residual Based Sampling for Online Low Rank Approximation
TLDR
The core of the approach is an adaptive sampling technique that gives a practical and efficient algorithm for both Column Subset Selection and Principal Component Analysis; it is proved that sampling columns by their "residual norm" (i.e. their norm orthogonal to the directions sampled so far) yields a significantly better dependence between the number of columns sampled and the desired approximation error.
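As a rough illustration of the residual-norm idea in this summary, a simplified numpy sketch follows (function and variable names are mine, and this offline version ignores the online aspects of the paper): each column is drawn with probability proportional to the squared norm of its component orthogonal to the span of the columns chosen so far.

    import numpy as np

    def residual_sample_columns(X, k, rng=None):
        # Illustrative residual-norm sampling for column subset selection;
        # not the paper's implementation.
        if rng is None:
            rng = np.random.default_rng(0)
        n = X.shape[1]
        R = X.copy()                  # residual of X w.r.t. chosen columns
        chosen = []
        for _ in range(k):
            norms = np.sum(R ** 2, axis=0)
            total = norms.sum()
            if total <= 1e-12:        # the chosen columns already span X
                break
            j = int(rng.choice(n, p=norms / total))
            chosen.append(j)
            # Project the residual away from the newly chosen direction.
            q = R[:, j] / np.linalg.norm(R[:, j])
            R -= np.outer(q, q @ R)
        return chosen

    X = np.random.default_rng(1).standard_normal((50, 30))
    print(residual_sample_columns(X, 5))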
Low-rank Tensor Bandits
TLDR
This work introduces stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor, and proposes two learning algorithms, tensor epoch-greedy and tensor elimination, and develops finite-time regret bounds for them.
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
TLDR
An asymptotically optimal bound on the frequentist regret of UTS is proved, supported by simulations showing a significant improvement of the method over the state of the art.
On the equivalence of Oja's algorithm and GROUSE
TLDR
It is shown that the Grassmannian Rank-One Update Subspace Estimation (GROUSE) algorithm is indeed equivalent to Oja's algorithm in the sense that, at each iteration, given a step size for one of the algorithms, one may construct a step size for the other algorithm that results in an identical update.
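For context, a compact numpy rendering of a standard variant of Oja's subspace update is shown below (the precise variant and step-size convention analyzed in the paper may differ):

    import numpy as np

    def oja_step(U, x, eta):
        # One Oja update of an orthonormal basis U (d x k) on a sample x (d,):
        # a gradient step on tr(U^T x x^T U) followed by re-orthonormalization.
        U = U + eta * np.outer(x, x @ U)
        Q, _ = np.linalg.qr(U)
        return Q

    rng = np.random.default_rng(0)
    d, k = 20, 3
    U = np.linalg.qr(rng.standard_normal((d, k)))[0]
    # Stream samples whose covariance has a dominant k-dimensional subspace.
    for t in range(2000):
        x = rng.standard_normal(d)
        x[:k] *= 5.0                  # inflate variance along e_1, ..., e_k
        U = oja_step(U, x, eta=1.0 / (t + 10))
    # U should now roughly span the top-k eigenspace (first k coordinates).
    print(np.round(np.abs(U[:5]), 2))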
...

References

SHOWING 1-10 OF 76 REFERENCES
Online PCA with Optimal Regret
TLDR
This paper generalizes PCA to arbitrary positive definite instance matrices $X_t$ with the linear loss $\mathrm{tr}(W_t X_t)$ and focuses on two popular online algorithms for generalized PCA: the Gradient Descent and Matrix Exponentiated Gradient algorithms.
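A minimal sketch of the Matrix Exponentiated Gradient update for this linear loss, in its standard trace-normalized form $W_{t+1} \propto \exp(\log W_t - \eta X_t)$, is given below (refinements such as eigenvalue capping used by the actual algorithms are omitted):

    import numpy as np

    def meg_step(W, X, eta):
        # exp(log W - eta * X) via symmetric eigendecompositions,
        # renormalized to unit trace (a sketch, not the paper's code).
        lw, Vw = np.linalg.eigh(W)
        logW = Vw @ np.diag(np.log(lw)) @ Vw.T
        le, Ve = np.linalg.eigh(logW - eta * X)
        M = Ve @ np.diag(np.exp(le)) @ Ve.T
        return M / np.trace(M)

    d = 4
    W = np.eye(d) / d                 # start from the maximally mixed matrix
    rng = np.random.default_rng(0)
    for _ in range(100):
        v = rng.standard_normal(d)
        X = np.outer(v, v)            # a positive semi-definite instance
        W = meg_step(W, X, eta=0.1)
    print(np.round(W, 3))             # remains PSD with unit trace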
Sparsity, variance and curvature in multi-armed bandits
TLDR
A key new insight is to use regularizers satisfying more refined conditions than general self-concordance, which yields results related to sparsity, variance and curvature.
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
TLDR
The methodology is developed in the expert setting of online learning, by giving an algorithm for learning as well as the best subset of experts of a certain size, and is then lifted to the matrix setting, where subsets of experts correspond to subspaces.
Stochastic Rank-1 Bandits
TLDR
This work presents the first bandit algorithm that finds the maximum entry of a rank-$1$ matrix, with regret linear in $K + L$, $1/\Delta$ and $\log n$, and derives a nearly matching lower bound.
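To illustrate why rank-$1$ structure allows regret to scale with $K + L$ rather than $K L$, a simplified explore-then-commit sketch follows (the paper's algorithm relies on a more careful elimination scheme; every name and constant here is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    K, L, n_explore = 8, 10, 200
    u = rng.uniform(0.2, 0.8, K)      # hidden rank-1 Bernoulli means u v^T
    v = rng.uniform(0.2, 0.8, L)

    def pull(i, j):
        # Bernoulli reward with mean u[i] * v[j].
        return float(rng.random() < u[i] * v[j])

    # Because the mean matrix factors, the average reward of row i over random
    # columns is u[i] * mean(v), so O(K + L) estimates locate the best entry.
    row_means = [np.mean([pull(i, rng.integers(L)) for _ in range(n_explore)])
                 for i in range(K)]
    col_means = [np.mean([pull(rng.integers(K), j) for _ in range(n_explore)])
                 for j in range(L)]
    i_star, j_star = int(np.argmax(row_means)), int(np.argmax(col_means))
    print((i_star, j_star), (int(np.argmax(u)), int(np.argmax(v))))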
Corralling a Band of Bandit Algorithms
TLDR
This work designs a version of Online Mirror Descent with a special mirror map together with a sophisticated learning rate scheme, and shows that this approach achieves a more delicate balance between exploiting and exploring base algorithms than previous works, yielding superior regret bounds.
Bilinear Bandits with Low-rank Structure
TLDR
A new two-stage algorithm called "Explore-Subspace-Then-Refine" (ESTR) is proposed that exploits and further refines the estimated subspace via a regularization technique, and it is conjectured that the regret bound of ESTR is unimprovable up to polylogarithmic factors.
Online local learning via semidefinite programming
TLDR
It is shown that a simple algorithm based on semidefinite programming can achieve asymptotically optimal regret in the case where the number of possible labels is constant, resolving an open problem posed by Hazan, Kale, and Shalev-Shwartz.
On the Complexity of Bandit Linear Optimization
TLDR
It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.
The Price of Bandit Information for Online Optimization
TLDR
This paper presents an algorithm which achieves $O^*(n^{3/2} \sqrt{T})$ regret and presents lower bounds showing that this gap is at least $\sqrt{n}$, which is conjectured to be the correct order.
Beating the adaptive bandit with high probability
We provide a principled way of proving $\tilde{O}(\sqrt{T})$ high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…
...