Corpus ID: 7349422

Online Sparse Linear Regression

@inproceedings{Foster2016OnlineSL,
  title={Online Sparse Linear Regression},
  author={Dean P. Foster and Satyen Kale and Howard J. Karloff},
  booktitle={COLT},
  year={2016}
}
We consider the online sparse linear regression problem, which is the problem of sequentially making predictions while observing only a limited number of features in each round, so as to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss. We give an inefficient algorithm that obtains regret bounded by $\tilde{O}(\sqrt{T})$ after $T$ prediction rounds. We complement this result by showing that no algorithm running in polynomial time per… 
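To make the setting concrete, here is a minimal simulation of the interaction protocol described above: in each round the learner selects a small subset of the feature coordinates, sees only those values, predicts, and then suffers the square loss against the revealed label. The subset strategy below (a fixed random subset with online gradient descent) is a placeholder for illustration and is not the paper's $\tilde{O}(\sqrt{T})$-regret algorithm; the function name `online_sparse_regression_protocol` and all constants are assumptions.

```python
import numpy as np

def square_loss(y_hat, y):
    return (y_hat - y) ** 2

def online_sparse_regression_protocol(X, y, k_obs, rng):
    """Illustrative interaction loop (not the paper's algorithm).

    In round t the learner chooses at most k_obs of the d feature
    coordinates to observe, predicts from those values alone, then
    sees the label y[t] and suffers the square loss.
    """
    T, d = X.shape
    # Toy strategy: fix one random subset S and run online gradient
    # descent on the coordinates in S.
    S = rng.choice(d, size=k_obs, replace=False)
    w = np.zeros(k_obs)
    total_loss = 0.0
    for t in range(T):
        x_obs = X[t, S]                  # only the chosen coordinates are revealed
        y_hat = float(w @ x_obs)         # predict using the observed features
        total_loss += square_loss(y_hat, y[t])
        grad = 2.0 * (y_hat - y[t]) * x_obs
        w -= grad / np.sqrt(t + 1)       # step size ~ 1/sqrt(t)
        w = np.clip(w, -1.0, 1.0)        # keep the predictor bounded (hypothetical choice)
    return total_loss

rng = np.random.default_rng(0)
T, d, k = 1000, 20, 2
w_star = np.zeros(d); w_star[:k] = 0.5      # k-sparse comparator
X = rng.uniform(-1, 1, size=(T, d))
y = X @ w_star + 0.1 * rng.standard_normal(T)
print(online_sparse_regression_protocol(X, y, k_obs=k, rng=rng))
```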
Efficient Sublinear-Regret Algorithms for Online Sparse Linear Regression with Limited Observation
TLDR
Under mild assumptions, polynomial-time sublinear-regret algorithms for online sparse linear regression are presented, and thorough experiments demonstrate that these algorithms outperform other known algorithms.
Sparse Regression via Range Counting
TLDR
This work describes an $O(n^{k-1} \log^{d-k+2} n)$-time randomized $(1+\varepsilon)$-approximation algorithm for the sparse regression problem, and provides a simple $O_\delta(n^{k-1+\delta})$-time deterministic exact algorithm, for any $\delta > 0$.
Efficient online algorithms for fast-rate regret bounds under sparsity
TLDR
New risk bounds are established that are adaptive to the sparsity of the problem and to the regularity of the risk (ranging from a rate $1/\sqrt{T}$ for general convex risk to $1/T$ for strongly convex risk) and generalize previous works on sparse online learning.
Online Regression with Partial Information: Generalization and Linear Projection
TLDR
This paper proposes a general setting for the limitation of the available information, where the observed information is determined by a function chosen from a given set of observation functions, and proposes efficient algorithms for the special case in which the observation functions are linear projections.
Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP
TLDR
This paper makes the assumption that the data matrix satisfies the restricted isometry property, and shows that this assumption leads to computationally efficient algorithms with sublinear regret for two variants of the problem.
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation
We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of attributes per example.
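As background for the method named in this entry, the following is a minimal sketch of plain (full-gradient) iterative hard thresholding for sparse least squares. The stochastic-gradient and limited-attribute-observation aspects of the paper are not reproduced; the function names, step-size choice, and data are assumptions.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht_least_squares(X, y, k, step=None, n_iter=200):
    """Iterative hard thresholding for min ||Xw - y||^2 s.t. ||w||_0 <= k.

    Full-gradient version for illustration; the step size defaults to
    1 / ||X||_2^2 (inverse Lipschitz constant of the gradient).
    """
    n, d = X.shape
    if step is None:
        step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = hard_threshold(w - step * grad, k)
    return w

rng = np.random.default_rng(1)
n, d, k = 200, 50, 3
w_star = np.zeros(d); w_star[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.01 * rng.standard_normal(n)
w_hat = iht_least_squares(X, y, k)
print(np.nonzero(w_hat)[0], np.round(w_hat[np.nonzero(w_hat)], 3))
```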
On Distributed Exact Sparse Linear Regression over Networks
TLDR
This work shows theoretically and empirically that, under appropriate assumptions, where each agent solves smaller, local integer programming problems, all agents will eventually reach a consensus on the same sparse optimal regressor.
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
TLDR
This work considers the stochastic contextual bandit problem under a high-dimensional linear model and proposes doubly growing epochs with parameter estimation via best subset selection, a method that is easy to implement in practice and achieves a nearly dimension-independent regret bound with high probability.
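The best-subset-selection step mentioned here can be illustrated by a brute-force sketch that fits least squares on every support of size $k$ and keeps the best fit; this is only to show why the estimator is straightforward to implement when the number of candidate features is small, and all names and data below are hypothetical.

```python
import itertools
import numpy as np

def best_subset_selection(X, y, k):
    """Exhaustive best subset selection: least-squares fit on every
    size-k support, keep the one with the smallest residual sum of
    squares. Exponential in d, so only sensible for small d."""
    n, d = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in itertools.combinations(range(d), k):
        Xs = X[:, support]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    return best_support, best_coef

rng = np.random.default_rng(4)
n, d, k = 100, 8, 2
w_star = np.zeros(d); w_star[[1, 5]] = [1.0, -0.7]
X = rng.standard_normal((n, d))
y = X @ w_star + 0.05 * rng.standard_normal(n)
print(best_subset_selection(X, y, k))
```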
Online Linear Optimization with Sparsity Constraints
TLDR
This work provides two efficient algorithms that achieve sublinear regret bounds for online linear optimization with sparsity constraints in the semi-bandit setting, and extends the results to two generalized settings.
Linear Optimization with Sparsity Constraints
TLDR
This work provides an efficient algorithm that achieves a sublinear regret bound for online linear optimization with sparsity constraints in the semi-bandit setting, and extends the result to two generalized settings.
...

References

SHOWING 1-10 OF 15 REFERENCES
Logarithmic regret algorithms for online convex optimization
TLDR
Several algorithms achieving logarithmic regret are proposed, which, besides being more general, are also much more efficient to implement, and which give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
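One of the simplest schemes in this line of work is online gradient descent with step size $1/(Ht)$ for $H$-strongly convex losses, which attains $O(\log T)$ regret; the sketch below illustrates that step-size rule on toy quadratic losses. It is not the Newton-based method mentioned above, and the names `ogd_strongly_convex`, `H`, and `project` are assumptions.

```python
import numpy as np

def ogd_strongly_convex(losses_grad, x0, H, project):
    """Online gradient descent with step 1/(H*t), one of the schemes
    attaining O(log T) regret when every loss is H-strongly convex.

    losses_grad: list of functions g_t(x) returning the gradient of loss t at x.
    project: projection back onto the feasible set.
    """
    x = np.array(x0, dtype=float)
    iterates = []
    for t, g in enumerate(losses_grad, start=1):
        iterates.append(x.copy())
        x = project(x - g(x) / (H * t))
    return iterates

# Toy example: quadratic losses f_t(x) = (x - c_t)^2 on [-1, 1], so H = 2.
rng = np.random.default_rng(2)
cs = rng.uniform(-0.5, 0.5, size=100)
grads = [lambda x, c=c: 2.0 * (x - c) for c in cs]
proj = lambda x: np.clip(x, -1.0, 1.0)
xs = ogd_strongly_convex(grads, x0=np.array([0.9]), H=2.0, project=proj)
print(float(xs[-1][0]), float(cs.mean()))  # last iterate tracks the running mean of c_t
```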
Variable Selection is Hard
TLDR
These are the first hardness results for sparse regression that apply when the algorithm simultaneously has $k' > k$ and $h(m,p) > 0$, and a similar result is given for a statistical version of the problem in which the data are corrupted by noise.
Sparse Approximate Solutions to Linear Systems
The following problem is considered: given a matrix $A$ in ${\bf R}^{m \times n}$ ($m$ rows and $n$ columns), a vector $b$ in ${\bf R}^m$, and $\epsilon > 0$, compute a vector $x$ satisfying $\|Ax - b\|_2 \le \epsilon$, if such exists, having as few nonzero entries as possible.
Linear Regression with Limited Observation
We consider the most common variants of linear regression, including Ridge, Lasso and Support-vector regression, in a setting where the learner is allowed to observe only a fixed number of attributes of each example.
Open Problem: Efficient Online Sparse Regression
TLDR
This work provides one natural formulation as an online sparse regression problem with squared loss, and asks whether it is possible to achieve sublinear regret with efficient algorithms (i.e., polynomial running time in the natural parameters of the problem).
Attribute Efficient Linear Regression with Distribution-Dependent Sampling
TLDR
This work develops efficient algorithms for Ridge and Lasso linear regression, which utilize the geometry of the data via a novel distribution-dependent sampling scheme and have excess risk bounds that are better by a factor of up to O(√d/k) than the state of the art.
Online Learning with Costly Features and Labels
TLDR
This paper provides algorithms and upper and lower bounds on the regret for both variants of the online probing problem and shows that a positive cost for observing the label significantly increases the regret of the problem.
Online convex optimization in the bandit setting: gradient descent without a gradient
TLDR
It is possible to use gradient descent without seeing anything more than the value of each function at a single point, and the guarantees hold even in the most general case: online against an adaptive adversary.
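The single-evaluation idea behind this result can be sketched as follows: a gradient of a smoothed version of the loss is estimated from one function value as $(d/\delta)\, f(x+\delta u)\, u$ with $u$ uniform on the unit sphere, and ordinary gradient descent is run on these estimates. The code below is a toy illustration under that assumption; step sizes and helper names are placeholders.

```python
import numpy as np

def one_point_gradient_estimate(f, x, delta, rng):
    """Single-evaluation gradient estimator: (d/delta) * f(x + delta*u) * u,
    with u drawn uniformly from the unit sphere."""
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / delta) * f(x + delta * u) * u

def bandit_gradient_descent(f_seq, x0, eta, delta, rng):
    """Gradient descent that only sees function values, one point per round."""
    x = np.array(x0, dtype=float)
    for f in f_seq:
        g_hat = one_point_gradient_estimate(f, x, delta, rng)
        x = np.clip(x - eta * g_hat, -1.0, 1.0)  # projection onto [-1, 1]^d
    return x

rng = np.random.default_rng(3)
target = np.array([0.3, -0.2])
fs = [lambda z: float(np.sum((z - target) ** 2)) for _ in range(2000)]
print(bandit_gradient_descent(fs, x0=np.zeros(2), eta=0.01, delta=0.1, rng=rng))
```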
On restricted-focus-of-attention learnability of Boolean functions
TLDR
An information-theoretic characterization of k-RFA learnability is developed, upon which a general tool for proving hardness results is built, and it is shown that, unlike the PAC model, weak learning does not imply strong learning in the k-RFA model.
Analytical approach to parallel repetition
TLDR
Improved bounds for few parallel repetitions of projection games are shown, establishing that Raz's counterexample to strong parallel repetition is tight even for a small number of repetitions, along with a short proof of the NP-hardness of Label Cover$(1, \delta)$ for all $\delta > 0$, starting from the basic PCP theorem.
...