Large-scale Multi-label Learning with Missing Labels
This paper studies the multi-label problem in a generic empirical risk minimization (ERM) framework and develops techniques that exploit the structure of specific loss functions, such as the squared loss, to obtain efficient algorithms.
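The squared-loss structure mentioned above admits closed-form alternating updates over the observed labels. A minimal sketch of that special case (the function name, rank, and regularization values below are illustrative choices of ours, not the paper's algorithm):

```python
import numpy as np

def als_missing_labels(Y, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares on the observed entries of a label
    matrix Y (mask==True where a label is known). The squared loss
    gives each row/column factor a closed-form ridge update."""
    rng = np.random.default_rng(seed)
    n, L = Y.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(L, k))
    I = lam * np.eye(k)
    for _ in range(iters):
        for i in range(n):                  # update each example factor
            obs = mask[i]
            if obs.any():
                Vo = V[obs]
                U[i] = np.linalg.solve(Vo.T @ Vo + I, Vo.T @ Y[i, obs])
        for j in range(L):                  # update each label factor
            obs = mask[:, j]
            if obs.any():
                Uo = U[obs]
                V[j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ Y[obs, j])
    return U, V
```

The point of the sketch is that missing labels simply drop out of each least-squares subproblem, so no imputation is needed.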
Collaborative Filtering with Graph Information: Consistency and Scalable Methods
This work formulates and derives a highly efficient, conjugate-gradient-based alternating minimization scheme that solves problems with over 55 million observations up to 2 orders of magnitude faster than state-of-the-art (stochastic) gradient-descent based methods.
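The objective here is matrix factorization with a graph-Laplacian penalty tying together the factors of connected rows. A toy sketch of that objective, fit with plain gradient descent rather than the paper's conjugate-gradient alternating minimization (function name and all hyperparameters are illustrative):

```python
import numpy as np

def graph_mf_sketch(Y, mask, Lrow, k=2, lam=0.1, lamg=0.1,
                    lr=0.01, iters=500, seed=0):
    """Squared loss on observed entries of Y plus a graph penalty
    lamg * tr(U.T @ Lrow @ U), where Lrow is a graph Laplacian over
    the rows. Plain GD keeps the toy self-contained; the paper solves
    each alternating subproblem with conjugate gradient for scale."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(iters):
        R = mask * (Y - U @ V.T)          # residual on observed entries
        gU = -2 * R @ V + 2 * lamg * Lrow @ U + 2 * lam * U
        gV = -2 * R.T @ U + 2 * lam * V
        U -= lr * gU
        V -= lr * gV
    return U, V
```

The Laplacian term is what injects the side graph: neighboring rows are pulled toward similar latent factors.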
Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction
This paper develops novel regularization schemes and scalable matrix factorization methods that are well suited to high-dimensional time series data with many missing values, and makes interesting connections to graph regularization methods in the context of learning dependencies in an autoregressive framework.
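The temporal regularizer described above penalizes latent time series that deviate from an autoregressive model. A gradient-descent sketch with a fixed lag-1 coefficient (the paper learns the AR weights and uses alternating solvers; the name and all values below are our illustrative choices):

```python
import numpy as np

def trmf_sketch(Y, mask, k=2, a=0.9, lam=0.5, lr=0.01, iters=500, seed=0):
    """Temporally regularized factorization Y (series x time) ~ F @ X,
    with an AR(1) penalty lam * sum_t ||x_t - a*x_{t-1}||^2 on the
    latent time series X. Everything here is plain gradient descent."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    F = rng.normal(scale=0.1, size=(n, k))
    X = rng.normal(scale=0.1, size=(k, T))
    for _ in range(iters):
        R = mask * (Y - F @ X)            # residual on observed entries
        D = X[:, 1:] - a * X[:, :-1]      # lag-1 autoregressive residual
        gF = -2 * R @ X.T
        gX = -2 * F.T @ R
        gX[:, 1:] += 2 * lam * D          # d/dx_t of the AR penalty
        gX[:, :-1] -= 2 * lam * a * D     # d/dx_{t-1} of the AR penalty
        F -= lr * gF
        X -= lr * gX
    return F, X
```

Because the penalty encodes an AR model, the fitted X can also be extrapolated forward in time, which is what makes the factorization usable for forecasting.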
Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems
- Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, I. Dhillon
- Computer Science, IEEE 12th International Conference on Data Mining
- 10 December 2012
It is shown that coordinate descent based methods have a more efficient update rule than ALS and faster, more stable convergence than SGD; empirically, CCD++ is much faster than both ALS and SGD in both settings.
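The efficient update rule in question is a closed-form one-variable solve, applied one rank-one factor at a time. A dense-toy sketch of that CCD++-style scheme (the real algorithm works on sparse data with inner iterations and parallelism; names and values here are ours):

```python
import numpy as np

def ccd_pp(Y, mask, k=2, lam=0.1, outer=10, seed=0):
    """Rank-one coordinate descent for matrix factorization on the
    observed entries of Y. Each scalar u_i or v_j has a closed-form
    minimizer given the residual with factor f removed."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    R = mask * (Y - U @ V.T)              # maintained masked residual
    for _ in range(outer):
        for f in range(k):                # one rank-one factor at a time
            u, v = U[:, f].copy(), V[:, f].copy()
            Rf = R + mask * np.outer(u, v)   # add factor f back in
            for i in range(m):            # closed-form 1-D updates
                w = mask[i] * v
                u[i] = (Rf[i] @ w) / (lam + w @ w)
            for j in range(n):
                w = mask[:, j] * u
                v[j] = (Rf[:, j] @ w) / (lam + w @ w)
            R = Rf - mask * np.outer(u, v)
            U[:, f], V[:, f] = u, v
    return U, V
```

Maintaining the residual R incrementally is what makes each scalar update cheap compared to re-solving full least-squares systems as ALS does.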
PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent
This paper proposes a family of parallel asynchronous stochastic dual coordinate descent algorithms (PASSCoDe) and shows that, in a multi-core environment, the converged solution is the exact solution of a primal problem with a perturbed regularizer.
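The building block being parallelized is the classic dual coordinate descent update for a linear SVM, where each coordinate has a closed-form projected update and the primal vector w is maintained incrementally. A serial sketch of that update (PASSCoDe runs these asynchronously across threads; this toy is single-threaded and illustrative):

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, epochs=50, seed=0):
    """Serial dual coordinate descent for an L1-loss linear SVM,
    maintaining w = sum_i alpha_i * y_i * x_i so each coordinate
    update costs one dot product and one axpy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    sqnorm = (X * X).sum(axis=1)          # Q_ii = ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):      # random coordinate order
            g = y[i] * (w @ X[i]) - 1.0   # dual gradient at coordinate i
            new = min(max(alpha[i] - g / sqnorm[i], 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w
```

The shared state is just w, which is why asynchronous threads updating it without locks can still converge, to the perturbed-regularizer solution the paper characterizes.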
NOMAD: Nonlocking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion
- Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, S. Vishwanathan, I. Dhillon
- Computer Science, Proc. VLDB Endow.
- 1 December 2013
NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion) is an efficient parallel distributed algorithm for matrix completion that outperforms synchronous algorithms requiring explicit bulk synchronization after every iteration.
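The unit of work NOMAD circulates among workers is the per-rating stochastic gradient update, which touches only one user row and one item row at a time. A single-threaded sketch of that update (the lock-free passing of item variables between machines is not modeled; names and values are illustrative):

```python
import numpy as np

def sgd_mf(ratings, m, n, k=2, lr=0.05, lam=0.01, epochs=300, seed=0):
    """Per-rating SGD for matrix completion: each observed (i, j, r)
    triple updates U[i] and V[j] only, so disjoint item columns can be
    owned and updated by different workers without locks."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    R = np.asarray(ratings, dtype=float)
    for _ in range(epochs):
        rng.shuffle(R)                    # visit ratings in random order
        for row in R:
            i, j, r = int(row[0]), int(row[1]), row[2]
            e = r - U[i] @ V[j]           # prediction error on this rating
            gu = e * V[j] - lam * U[i]
            gv = e * U[i] - lam * V[j]
            U[i] += lr * gu
            V[j] += lr * gv
    return U, V
```

Because each update's footprint is one (user, item) pair, ownership of item blocks can migrate between workers mid-stream, which is the decentralization NOMAD exploits.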
Large Linear Classification When Data Cannot Fit in Memory
This work proposes and analyzes a block minimization framework for data larger than the memory size, and investigates two implementations of the proposed framework for primal and dual SVMs, respectively.
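The block-minimization idea is that only one block of examples is resident in memory at a time while the model persists across blocks. A toy sketch using in-memory arrays to stand in for compressed blocks on disk, and a hinge-loss SGD pass per block in place of the paper's primal/dual SVM subproblem solvers (all names and step-size choices are ours):

```python
import numpy as np

def block_minimization(blocks, d, lam=0.01, lr=0.1, passes=5):
    """Cycle through data blocks, 'loading' one at a time and updating
    a persistent weight vector w on it. Each block update here is a
    hinge-loss SGD sweep with a slowly decaying step size."""
    w = np.zeros(d)
    t = 0
    for _ in range(passes):
        for Xb, yb in blocks:             # one block in memory at a time
            for xi, yi in zip(Xb, yb):
                t += 1
                eta = lr / (1.0 + lam * lr * t)
                if yi * (w @ xi) < 1.0:   # hinge-loss subgradient step
                    w += eta * (yi * xi - lam * w)
                else:
                    w -= eta * lam * w
    return w
```

The key property being sketched is that nothing but w (and, in the dual variant, the block's alpha values) must survive between block loads, so memory usage stays bounded by one block.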
Dual coordinate descent methods for logistic regression and maximum entropy models
This paper applies coordinate descent methods to solve the dual form of logistic regression and maximum entropy models, and shows that many details differ from the linear SVM case.
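One of those differing details: the logistic-regression dual has entropy-like terms alpha*log(alpha) + (C-alpha)*log(C-alpha), so each coordinate subproblem has no closed form and needs an inner 1-D solver. A toy sketch using a few clamped Newton steps per coordinate (the published algorithm is more careful near the boundary; names and constants here are illustrative):

```python
import numpy as np

def dual_cd_logreg(X, y, C=1.0, epochs=50, newton=5, eps=1e-8, seed=0):
    """Dual coordinate descent for L2-regularized logistic regression.
    Gradient of coordinate i:  y_i * (w @ x_i) + log(a / (C - a)),
    with w = sum_i alpha_i * y_i * x_i maintained incrementally."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.full(n, C / 2.0)           # start strictly inside (0, C)
    w = X.T @ (alpha * y)
    sqnorm = (X * X).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):
            for _ in range(newton):       # 1-D subproblem: Newton steps
                a = alpha[i]
                g = y[i] * (w @ X[i]) + np.log(a / (C - a))
                h = sqnorm[i] + C / (a * (C - a))
                new = min(max(a - g / h, eps), C - eps)
                w += (new - a) * y[i] * X[i]
                alpha[i] = new
    return w
```

Contrast with the SVM dual above: there the coordinate minimizer is a single clipped formula, while here the log barrier keeps alpha strictly inside (0, C) and forces iterative 1-D minimization.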
Taming Pretrained Transformers for Extreme Multi-label Text Classification
X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem, is proposed; it achieves new state-of-the-art results on four XMC benchmark datasets.
Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting
DeepGLO is a hybrid model that combines a global matrix factorization model, regularized by a temporal convolution network, with another temporal network that captures local properties of each time series and associated covariates; it can outperform state-of-the-art approaches.