• Corpus ID: 59842954

# Bandit Principal Component Analysis

@article{Kotlowski2019BanditPC,
title={Bandit Principal Component Analysis},
author={Wojciech Kotlowski and Gergely Neu},
journal={ArXiv},
year={2019},
volume={abs/1902.03035}
}
• Published 8 February 2019
• Computer Science
• ArXiv
We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices. We focus on a natural notion of bandit feedback where the learner only observes the loss associated with its own prediction. Based on the classical observation that this decision-making problem can be lifted to the space of density matrices, we…
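The lifting to density matrices mentioned in the abstract can be illustrated with a minimal sketch. It assumes the quadratic loss takes the form $w^\top L w$ for a unit vector $w$ and a symmetric loss matrix $L$; that form, and the names `lift`, `quadratic_loss` and `expected_loss`, are illustrative assumptions rather than the authors' notation. The sketch only checks the standard identity $w^\top L w = \mathrm{tr}(w w^\top L)$ and shows how a density matrix encodes a randomized prediction.

```python
import numpy as np

def lift(w):
    """Lift a unit vector w to the rank-one matrix w w^T (trace 1, PSD)."""
    return np.outer(w, w)

def quadratic_loss(w, L):
    """Assumed loss form: w^T L w for a symmetric matrix L."""
    return float(w @ L @ w)

def expected_loss(W, L):
    """Expected loss when a unit eigenvector of the density matrix W
    is drawn with probability equal to its eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(W)
    probs = np.clip(eigvals, 0.0, None)
    probs = probs / probs.sum()
    losses = [quadratic_loss(eigvecs[:, i], L) for i in range(len(probs))]
    return float(np.dot(probs, losses))

rng = np.random.default_rng(0)
d = 4

# Hypothetical unit-norm prediction and symmetric loss matrix.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
A = rng.normal(size=(d, d))
L = (A + A.T) / 2

W = lift(w)                  # rank-one density matrix
W_mix = np.eye(d) / d        # maximally mixed density matrix

loss_vector = quadratic_loss(w, L)
loss_lifted = float(np.trace(W @ L))   # linear in W
loss_mixed = expected_loss(W_mix, L)   # equals tr(W_mix L)
```

The point of the lift is that the loss, quadratic in $w$, becomes linear in $W$, so a randomized prediction can be summarized by its density matrix. Under bandit feedback the learner would observe only the scalar loss of its own prediction, never $L$ itself.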
Bandit Phase Retrieval
• Computer Science
NeurIPS
• 2021
The analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling Russo and Roy [2018] are not sufficient for optimal regret.
Improved Regret Bounds of Bilinear Bandits using Action Space Analysis
• Computer Science
ICML
• 2021
This paper rejects the conjecture above by proposing algorithms that achieve the regret $\tilde{O}(\sqrt{d_1 d_2 (d_1 + d_2) T})$ using the fact that the action space dimension $O(d_1 + d_2)$ is significantly lower than the matrix parameter dimension $O(d_1 d_2)$, and devises an algorithm with better empirical performance than previous algorithms.
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
• Computer Science
NeurIPS
• 2020
This work develops the first algorithm with a "best-of-both-worlds" guarantee: it achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with $\tilde{\mathcal{O}}(\sqrt{T})$ regret even when the losses are adversarial, where $T$ is the number of episodes.
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
• Computer Science
NeurIPS
• 2021
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and the two-layer neural network with polynomial activation bandit problem, and provides an algorithm that is minimax-optimal in the dimension.
Improved Regret Bounds for Bandit Combinatorial Optimization
• Computer Science
NeurIPS
• 2019
The bound obtained for the bandit ranking in the present study addresses an open problem raised in \citep{cohen2017tight}, and it is demonstrated that the problem becomes easier without considering correlations among entries of loss vectors.
Residual Based Sampling for Online Low Rank Approximation
• Computer Science
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
• 2019
The core of the approach is an adaptive sampling technique that gives a practical and efficient algorithm for both Column Subset Selection and Principal Component Analysis, and proves that sampling columns by their "residual norm" (i.e., their norm orthogonal to the directions sampled so far) yields a significantly better dependence between the number of columns sampled and the desired error in the approximation.
Residual Based Sampling for Online Low Rank Approximation
• Computer Science
2020 Information Theory and Applications Workshop (ITA)
• 2020
The core of the approach is an adaptive sampling technique that gives a practical and efficient algorithm for both Column Subset Selection and Principal Component Analysis, and it is proved that by sampling columns using their "residual norm" (i.e. their norm orthogonal to directions sampled so far), they end up with a significantly better dependence between the number of columns sampled, and the desired error in the approximation.
Low-rank Tensor Bandits
• Computer Science
ArXiv
• 2020
This work introduces stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor, proposes two learning algorithms, tensor epoch-greedy and tensor elimination, and develops finite-time regret bounds for them.
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
• Computer Science
ALT
• 2020
An asymptotically optimal bound on the frequentist regret of UTS is proved, supported by simulations showing a significant improvement of the method over the state of the art.
On the equivalence of Oja's algorithm and GROUSE
It is shown that the Grassmannian Rank-One Subspace Estimation (GROUSE) algorithm is indeed equivalent to Oja's algorithm in the sense that, at each iteration, given a step size for one of the algorithms, one may construct a step size for the other algorithm that results in an identical update.

## References

Online PCA with Optimal Regret
• Computer Science
J. Mach. Learn. Res.
• 2016
This paper generalizes PCA to arbitrary positive definite instance matrices $X_t$ with the linear loss $\mathrm{tr}(W_t X_t)$ and focuses on two popular online algorithms for generalized PCA: the Gradient Descent and Matrix Exponentiated Gradient algorithms.
Sparsity, variance and curvature in multi-armed bandits
• Computer Science
ALT
• 2018
A key new insight is to use regularizers satisfying more refined conditions than general self-concordance to obtain results related to sparsity, variance and curvature.
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
• Computer Science
• 2008
The methodology in the expert setting of online learning is developed by giving an algorithm for learning as well as the best subset of experts of a certain size and then lifted to the matrix setting where the subsets of experts correspond to subspaces.
Stochastic Rank-1 Bandits
• Computer Science
AISTATS
• 2017
This work presents the first bandit algorithm that finds the maximum entry of a rank-$1$ matrix whose regret is linear in $K + L$, $1 / \Delta$ and $\log n$, and derives a nearly matching lower bound.
Corralling a Band of Bandit Algorithms
• Computer Science
COLT
• 2017
This work designs a version of Online Mirror Descent with a special mirror map together with a sophisticated learning rate scheme and shows that this approach manages to achieve a more delicate balance between exploiting and exploring base algorithms than previous works, yielding superior regret bounds.
Bilinear Bandits with Low-rank Structure
• Computer Science
ICML
• 2019
A new two-stage algorithm called "Explore-Subspace-Then-Refine" (ESTR) is proposed that exploits and further refines the estimated subspace via a regularization technique, and it is conjectured that the regret bound of ESTR is unimprovable up to polylogarithmic factors.
Online local learning via semidefinite programming
It is shown that a simple algorithm based on semidefinite programming can achieve asymptotically optimal regret in the case where the number of possible labels is constant, resolving an open problem posed by Hazan, Kale, and Shalev-Schwartz.
On the Complexity of Bandit Linear Optimization
It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.
The Price of Bandit Information for Online Optimization
• Computer Science
NIPS
• 2007
This paper presents an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret and presents lower bounds showing that this gap is at least $\sqrt{n}$, which is conjectured to be the correct order.
Beating the adaptive bandit with high probability
• Computer Science, Mathematics
2009 Information Theory and Applications Workshop
• 2009
We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the