signSGD: compressed optimisation for non-convex problems
- Jeremy Bernstein, Yu-Xiang Wang, K. Azizzadenesheli, Anima Anandkumar
- Computer Science, ICML
- 13 February 2018
SignSGD achieves the best of both worlds: compressed gradients and an SGD-level convergence rate. Its momentum counterpart matches the accuracy and convergence speed of Adam on deep ImageNet models.
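The core update is simple: each step moves every parameter by a fixed amount in the direction of the sign of its gradient, so a distributed worker only needs to transmit one bit per coordinate. A minimal sketch (the toy quadratic objective is for illustration only):

```python
import numpy as np

def signsgd_step(w, grad, lr=0.01):
    """One signSGD update: step each coordinate by lr in the
    direction of the gradient's sign (1 bit per coordinate)."""
    return w - lr * np.sign(grad)

# Minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([3.0, -2.0, 0.5])
for _ in range(500):
    w = signsgd_step(w, 2 * w, lr=0.01)
# w oscillates within one step size of the minimum at 0
```

With a fixed step size the iterates settle into a band of width `lr` around the optimum, which is why practical variants decay the learning rate.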
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
First, convolutional self-attention is proposed, producing queries and keys with causal convolution so that local context is better incorporated into the attention mechanism; second, the LogSparse Transformer is proposed, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget.
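The key ingredient of convolutional self-attention is a causal 1-D convolution: queries and keys at step t are computed from a window of inputs ending at t, never from the future. A minimal numpy sketch (the 3-tap kernel and shapes are hypothetical, not the paper's configuration):

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution over a (T, d) series: the output at
    step t depends only on inputs at steps <= t (left zero-padding)."""
    k = len(kernel)
    xp = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    # each output row is a kernel-weighted sum of the k most recent inputs
    return np.stack([kernel @ xp[t:t + k] for t in range(x.shape[0])])

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))          # toy series: 10 steps, 4 features
kernel = np.array([0.2, 0.3, 0.5])    # hypothetical 3-tap filter
q = causal_conv1d(x, kernel)          # context-aware queries; keys analogous
```

Causality is what makes this usable for forecasting: perturbing a future input cannot change any earlier query or key.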
Detecting and Correcting for Label Shift with Black Box Predictors
Black Box Shift Estimation (BBSE) is proposed to estimate the test-set label distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
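The mechanics reduce to one linear system: under label shift, the marginal distribution of the black-box predictor's outputs on test data equals the confusion matrix times the unknown test label distribution. A toy sketch with hypothetical numbers (in practice the confusion matrix and output marginal are empirical estimates):

```python
import numpy as np

# Hypothetical 3-class setting. C[i, j] = P(predict i | true label j),
# estimated on held-out labelled data from the training distribution.
C = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
p_train = np.array([1/3, 1/3, 1/3])
p_test = np.array([0.6, 0.3, 0.1])   # unknown in practice

# Under label shift, the predictor's output marginal on test data is
# mu = C @ p_test, so p_test is recovered by solving the linear
# system; this works exactly when C is invertible.
mu_hat = C @ p_test                  # stand-in for the empirical estimate
p_test_est = np.linalg.solve(C, mu_hat)
weights = p_test_est / p_train       # importance weights for retraining
```

Note the predictor itself can be arbitrarily inaccurate; only invertibility of C is needed, which is the condition stated above.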
Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo
It is shown that under standard assumptions, drawing one sample from a posterior distribution is differentially private "for free"; that this sample, as a statistical estimator, is often consistent, near optimal, and computationally tractable; and that these observations lead to an "anytime" algorithm for Bayesian learning under privacy constraints.
Subsampled Rényi Differential Privacy and Analytical Moments Accountant
A tight upper bound is provided on the Rényi Differential Privacy (RDP) parameters for algorithms that subsample the dataset and then apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability parameter.
Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
An optimal Gaussian mechanism is developed whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail-bound approximation, and which is equipped with a post-processing step based on adaptive estimation techniques, leveraging the fact that the distribution of the perturbation is known.
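The calibration idea can be sketched numerically: the exact (eps, delta) guarantee of Gaussian noise is expressed through the Gaussian CDF, and since the achieved delta decreases monotonically in sigma, the smallest valid sigma can be found by binary search. A sketch assuming sensitivity 1 and using the exact Gaussian-mechanism condition in terms of the standard normal CDF:

```python
from math import erf, exp, sqrt

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def delta_for_sigma(sigma, eps, sens=1.0):
    """Exact delta achieved by N(0, sigma^2) noise at level eps,
    written via the Gaussian CDF rather than a tail bound."""
    a = sens / (2 * sigma) - eps * sigma / sens
    b = -sens / (2 * sigma) - eps * sigma / sens
    return phi(a) - exp(eps) * phi(b)

def calibrate_sigma(eps, delta, sens=1.0, lo=1e-3, hi=1e3):
    """Binary-search the smallest sigma meeting (eps, delta)-DP;
    delta_for_sigma is decreasing in sigma, so bisection applies."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if delta_for_sigma(mid, eps, sens) > delta:
            lo = mid
        else:
            hi = mid
    return hi
```

For eps = 1, delta = 1e-5 this yields sigma around 3.7, noticeably below the classical tail-bound calibration sqrt(2 ln(1.25/delta)) of about 4.84.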
Trend Filtering on Graphs
- Yu-Xiang Wang, J. Sharpnack, Alex Smola, R. Tibshirani
- Mathematics, Computer Science, J. Mach. Learn. Res.
- 28 October 2014
A family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences, is introduced; this generalizes the idea of trend filtering, used for univariate nonparametric regression, to graphs.
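The penalized objective is easy to write down even though minimizing it requires a convex solver (e.g. ADMM, omitted here). A sketch of the $k=0$ case, where the discrete difference operator is the graph's edge incidence matrix (toy graph and signal are illustrative):

```python
import numpy as np

def incidence_matrix(edges, n):
    """Edge incidence (discrete graph difference) operator D:
    (D @ beta)[e] = beta[i] - beta[j] for edge e = (i, j)."""
    D = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    return D

def gtf_objective(beta, y, D, lam):
    """Graph trend filtering objective (k = 0): squared loss plus
    an l1 penalty on differences across edges."""
    return 0.5 * np.sum((y - beta) ** 2) + lam * np.sum(np.abs(D @ beta))

# Toy path graph 0-1-2-3 with a noisy piecewise-constant signal.
edges = [(0, 1), (1, 2), (2, 3)]
y = np.array([1.0, 1.1, 3.0, 2.9])
D = incidence_matrix(edges, len(y))
```

The $\ell_1$ penalty drives differences across edges exactly to zero, so for large `lam` the fitted signal becomes constant on connected components, which is the source of the estimator's adaptivity.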
Provable Subspace Clustering: When LRR Meets SSC
- Yu-Xiang Wang, Huan Xu, Chenlei Leng
- Computer Science, IEEE Transactions on Information Theory
- 5 December 2013
A new algorithm is proposed, termed Low-Rank Sparse Subspace Clustering (LRSSC), combining SSC and LRR, and theoretical guarantees of the algorithm's success are developed, revealing interesting insights into the strengths and weaknesses of both methods.
Block-Sparse RPCA for Salient Motion Detection
- Zhi Gao, L. Cheong, Yu-Xiang Wang
- Computer Science, IEEE Transactions on Pattern Analysis and Machine…
- 1 April 2014
This work addresses the challenges of representative background subtraction techniques in a unified framework that makes few specific assumptions about the background, obtains crisply defined foreground regions, and handles large dynamic background motion much better.
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
The SWITCH estimator is proposed, which can use an existing reward model to achieve a better bias-variance tradeoff than IPS and DR; an upper bound on its MSE is proved, and its benefits are demonstrated empirically on a diverse collection of data sets, often outperforming prior work by orders of magnitude.
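The bias-variance tradeoff above comes from switching on the importance weight: inverse propensity scoring (IPS) is used where the weight is small, and the reward model takes over where the weight (hence the IPS variance) exceeds a threshold. A simplified sketch for discrete contexts and actions (shapes and names are hypothetical, not the paper's API):

```python
import numpy as np

def switch_estimate(x, a, r, pi_t, pi_b, q_hat, tau):
    """SWITCH-style off-policy value sketch. Logged contexts x,
    actions a, rewards r; pi_t / pi_b are (n_contexts, n_actions)
    target and logging policies; q_hat is a reward model on the
    same grid; tau is the switching threshold on the weights."""
    w = pi_t[x, a] / pi_b[x, a]                    # importance weights
    ips_term = np.where(w <= tau, w * r, 0.0).mean()
    w_all = pi_t[x] / pi_b[x]                      # weights for every action
    # reward model covers the actions whose weight exceeds tau
    dm_term = (pi_t[x] * q_hat[x] * (w_all > tau)).sum(axis=1).mean()
    return ips_term + dm_term

# Toy check: 2 contexts, 2 actions, logging policy equals target policy.
pi = np.full((2, 2), 0.5)
q_hat = np.array([[1.0, 0.0], [0.0, 1.0]])
x, a, r = np.array([0, 1]), np.array([0, 1]), np.array([1.0, 0.0])
```

Large `tau` recovers plain IPS and `tau = 0` recovers the direct-method (pure reward model) estimate, so `tau` interpolates between the two, which is what the MSE analysis exploits.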