# Bandit Principal Component Analysis

@article{Kotlowski2019BanditPC, title={Bandit Principal Component Analysis}, author={Wojciech Kotlowski and Gergely Neu}, journal={ArXiv}, year={2019}, volume={abs/1902.03035} }

We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices. We focus on a natural notion of bandit feedback where the learner only observes the loss associated with its own prediction. Based on the classical observation that this decision-making problem can be lifted to the space of density matrices, we…
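The lifting the abstract refers to can be checked numerically in a few lines (a sketch with made-up data, not the authors' algorithm): the quadratic compression loss of predicting with a unit vector $u$ is a linear function of the rank-one density matrix $uu^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)            # environment's vector
u = rng.normal(size=d)
u /= np.linalg.norm(u)            # learner's unit-vector prediction

# Quadratic compression loss: squared distance from x to its projection onto u.
loss = np.linalg.norm(x - np.outer(u, u) @ x) ** 2

# Lifted view: with U = u u^T a rank-one density matrix and X = x x^T,
# ||x - U x||^2 = ||x||^2 - tr(U X), i.e. the loss is linear in U.
U = np.outer(u, u)
lifted = x @ x - np.trace(U @ np.outer(x, x))

assert np.isclose(loss, lifted)
print("quadratic loss equals lifted linear loss")
```

This linearity is what allows the decision-making problem to be treated as linear optimization over the space of density matrices.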

## 13 Citations

Bandit Phase Retrieval

- Computer Science, NeurIPS
- 2021

The analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading, and that uniform bounds on the information ratio for information-directed sampling [Russo and Van Roy, 2018] are not sufficient for optimal regret.

Improved Regret Bounds of Bilinear Bandits using Action Space Analysis

- Computer Science, ICML
- 2021

This paper refutes the above conjecture by proposing algorithms that achieve regret $\tilde{O}(\sqrt{d_1 d_2 (d_1 + d_2) T})$, using the fact that the action space dimension $O(d_1 + d_2)$ is significantly lower than the matrix parameter dimension $O(d_1 d_2)$, and devises an algorithm with better empirical performance than previous algorithms.

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

- Computer Science, NeurIPS
- 2020

This work develops the first algorithm with a "best-of-both-worlds" guarantee: it achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with $\tilde{\mathcal{O}}(\sqrt{T})$ regret even when the losses are adversarial, where $T$ is the number of episodes.

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

- Computer Science, NeurIPS
- 2021

This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including low-rank generalized linear bandit problems and the two-layer neural network bandit problem with polynomial activation, and provides an algorithm that is minimax-optimal in the dimension.

Improved Regret Bounds for Bandit Combinatorial Optimization

- Computer Science, NeurIPS
- 2019

The bound obtained for bandit ranking in the present study addresses an open problem raised by Cohen et al. [2017], and it is demonstrated that the problem becomes easier without considering correlations among entries of the loss vectors.

Residual Based Sampling for Online Low Rank Approximation

- Computer Science, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
- 2019

The core of the approach is an adaptive sampling technique that gives a practical and efficient algorithm for both Column Subset Selection and Principal Component Analysis, and it is proved that by sampling columns using their "residual norm" (i.e. their norm orthogonal to directions sampled so far), one ends up with a significantly better dependence between the number of columns sampled and the desired error in the approximation.
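The residual-norm sampling idea can be sketched as follows (an illustrative NumPy implementation; the function name and the plain proportional-sampling rule are simplifications of the paper's scheme):

```python
import numpy as np

def residual_norm_sampling(A, k, rng=None):
    """Pick k columns of A, sampling each proportionally to its squared
    norm orthogonal to the span of the columns chosen so far."""
    rng = rng or np.random.default_rng(0)
    d, n = A.shape
    R = A.copy()                      # residuals of all columns
    chosen = []
    for _ in range(k):
        p = (R ** 2).sum(axis=0)      # squared residual norms
        p = p / p.sum()
        j = rng.choice(n, p=p)
        chosen.append(j)
        v = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(v, v @ R)       # project residuals off the new direction
    return chosen

A = np.random.default_rng(1).normal(size=(5, 8))
cols = residual_norm_sampling(A, 3)
print(cols)
```

Because a chosen column's residual drops to (numerically) zero after the projection step, the same column is essentially never selected twice.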

Residual Based Sampling for Online Low Rank Approximation

- Computer Science, 2020 Information Theory and Applications Workshop (ITA)
- 2020

The core of the approach is an adaptive sampling technique that gives a practical and efficient algorithm for both Column Subset Selection and Principal Component Analysis, and it is proved that by sampling columns using their "residual norm" (i.e. their norm orthogonal to directions sampled so far), one ends up with a significantly better dependence between the number of columns sampled and the desired error in the approximation.

Low-rank Tensor Bandits

- Computer Science, ArXiv
- 2020

This work introduces stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor, and proposes two learning algorithms, tensor epoch-greedy and tensor elimination, and develops finite-time regret bounds for them.

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

- Computer Science, ALT
- 2020

An asymptotically optimal bound on the frequentist regret of UTS is proved, supported by simulations showing a significant improvement of the method over the state of the art.

On the equivalence of Oja's algorithm and GROUSE

- Computer Science, AISTATS
- 2022

It is shown that the Grassmannian Rank-One Subspace Estimation (GROUSE) algorithm is indeed equivalent to Oja's algorithm in the sense that, at each iteration, given a step size for one of the algorithms, one may construct a step size for the other algorithm that results in an identical update.
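Oja's update referred to here can be sketched in NumPy (step size, data, and the QR-based re-orthonormalization are illustrative choices; the paper's equivalence statement concerns matched step sizes, which this sketch does not compute):

```python
import numpy as np

def oja_step(U, x, eta):
    """One iteration of Oja's algorithm for streaming subspace estimation:
    a rank-one gradient step toward the new sample, then re-orthonormalization."""
    U = U + eta * np.outer(x, x @ U)      # rank-one gradient step
    Q, _ = np.linalg.qr(U)                # restore orthonormal columns
    return Q

rng = np.random.default_rng(0)
d, k = 6, 2
# Samples drawn from a fixed 2-dimensional subspace plus small noise.
B = np.linalg.qr(rng.normal(size=(d, k)))[0]   # true subspace basis
U = np.linalg.qr(rng.normal(size=(d, k)))[0]   # random initialization
for _ in range(2000):
    x = B @ rng.normal(size=k) + 0.01 * rng.normal(size=d)
    U = oja_step(U, x, eta=0.05)

# Alignment with the true subspace: ||B^T U||_F approaches sqrt(k).
print(np.linalg.norm(B.T @ U))
```

The alignment measure $\|B^\top U\|_F$ is invariant to the basis chosen within each subspace, so it is a convenient convergence check.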

## References

Showing 1–10 of 76 references.

Online PCA with Optimal Regret

- Computer Science, J. Mach. Learn. Res.
- 2016

This paper generalizes PCA to arbitrary positive definite instance matrices $X_t$ with the linear loss $\operatorname{tr}(W_t X_t)$, and focuses on two popular online algorithms for generalized PCA: the Gradient Descent and Matrix Exponentiated Gradient algorithms.
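The Matrix Exponentiated Gradient update for this linear loss can be sketched as follows (a toy version that omits the eigenvalue capping used in the full algorithm; all numbers are illustrative):

```python
import numpy as np

def meg_update(W, X, eta):
    """Matrix Exponentiated Gradient step for generalized online PCA with
    linear loss tr(W X): W <- exp(log W - eta X) / Z, keeping W a density
    matrix (symmetric, PSD, unit trace)."""
    # Matrix log of symmetric PSD W via eigendecomposition (floored for safety).
    vals, vecs = np.linalg.eigh(W)
    logW = vecs @ np.diag(np.log(np.maximum(vals, 1e-12))) @ vecs.T
    # Matrix exp of the updated symmetric matrix.
    vals2, vecs2 = np.linalg.eigh(logW - eta * X)
    W_new = vecs2 @ np.diag(np.exp(vals2)) @ vecs2.T
    return W_new / np.trace(W_new)        # renormalize to unit trace

d = 3
W = np.eye(d) / d                          # uniform density matrix
X = np.diag([1.0, 0.5, 0.0])               # instance matrix, loss tr(W X)
for _ in range(50):
    W = meg_update(W, X, eta=0.5)

# The density matrix concentrates on the lowest-loss direction.
print(np.round(np.diag(W), 3))
```

The multiplicative update shifts eigenvalue mass toward directions with small loss while preserving the density-matrix constraints exactly.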

Sparsity, variance and curvature in multi-armed bandits

- Computer Science, ALT
- 2018

A key new insight is to use regularizers satisfying more refined conditions than general self-concordance, obtaining results related to sparsity, variance and curvature.

Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

- Computer Science
- 2008

The methodology is developed in the expert setting of online learning, by giving an algorithm that learns as well as the best subset of experts of a certain size, and is then lifted to the matrix setting, where subsets of experts correspond to subspaces.

Stochastic Rank-1 Bandits

- Computer Science, AISTATS
- 2017

This work presents the first bandit algorithm that finds the maximum entry of a rank-$1$ matrix whose regret is linear in $K + L$, $1 / \Delta$ and $\log n$, and derives a nearly matching lower bound.

Corralling a Band of Bandit Algorithms

- Computer Science, COLT
- 2017

This work designs a version of Online Mirror Descent with a special mirror map together with a sophisticated learning rate scheme, and shows that this approach achieves a more delicate balance between exploiting and exploring base algorithms than previous works, yielding superior regret bounds.

Bilinear Bandits with Low-rank Structure

- Computer Science, ICML
- 2019

A new two-stage algorithm called "Explore-Subspace-Then-Refine" (ESTR) is proposed that exploits and further refines the estimated subspace via a regularization technique, and it is conjectured that the regret bound of ESTR is unimprovable up to polylogarithmic factors.

Online local learning via semidefinite programming

- Computer Science, STOC
- 2014

It is shown that a simple algorithm based on semidefinite programming can achieve asymptotically optimal regret in the case where the number of possible labels is constant, resolving an open problem posed by Hazan, Kale, and Shalev-Shwartz.

On the Complexity of Bandit Linear Optimization

- Computer Science, COLT
- 2015

It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.

The Price of Bandit Information for Online Optimization

- Computer Science, NIPS
- 2007

This paper presents an algorithm which achieves $O^*(n^{3/2} \sqrt{T})$ regret, and presents lower bounds showing that the gap is at least $\sqrt{n}$, which is conjectured to be the correct order.

Beating the adaptive bandit with high probability

- Computer Science, Mathematics, 2009 Information Theory and Applications Workshop
- 2009

We provide a principled way of proving $\tilde{O}(\sqrt{T})$ high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…