XGBoost: A Scalable Tree Boosting System
- Tianqi Chen, Carlos Guestrin
- Computer Science, Knowledge Discovery and Data Mining
- 9 March 2016
This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression, and sharding to build XGBoost, a scalable tree boosting system.
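As a quick illustration of the system in practice, here is a minimal sketch using the xgboost Python package; the synthetic data and hyperparameter values are illustrative assumptions, not the paper's settings:

```python
# Minimal XGBoost training sketch; data and parameters are illustrative.
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

dtrain = xgb.DMatrix(X, label=y)  # DMatrix also accepts sparse input natively
params = {"objective": "binary:logistic", "max_depth": 6, "eta": 0.3}
booster = xgb.train(params, dtrain, num_boost_round=50)
preds = booster.predict(dtrain)
```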
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
- Tianqi Chen, Mu Li, Zheng Zhang
- Computer Science, ArXiv
- 3 December 2015
The paper describes the API design and the system implementation of MXNet, and explains how the embedding of both symbolic expressions and tensor operations is handled in a unified fashion.
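A minimal sketch of the two programming styles the summary refers to, using the MXNet 1.x Python API (shapes and values here are illustrative):

```python
import mxnet as mx

# Imperative tensor operations via NDArray, executed eagerly like NumPy.
a = mx.nd.ones((2, 3))
b = a * 2 + 1

# Declarative symbolic graph, bound to concrete shapes and executed later.
x = mx.sym.Variable("x")
y = x * 2 + 1
ex = y.simple_bind(mx.cpu(), x=(2, 3))  # allocate an executor for fixed shapes
out = ex.forward(is_train=False, x=mx.nd.ones((2, 3)))[0]
```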
Empirical Evaluation of Rectified Activations in Convolutional Network
- Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li
- Computer Science, ArXiv
- 5 May 2015
The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units consistently improves results, and cast doubt on the common belief that sparsity is the key to good performance with ReLU.
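The activation family studied (leaky, parametric, and randomized ReLU) differs from plain ReLU only in the slope applied to negative inputs; a minimal NumPy sketch:

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.01) -> np.ndarray:
    """ReLU with a non-zero slope on the negative part.

    slope=0 recovers plain ReLU; a learned per-channel slope gives PReLU,
    and a slope sampled per activation during training gives RReLU.
    """
    return np.where(x > 0, x, slope * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [-0.02 -0.005 0. 1.5]
```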
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
- Tianqi Chen, T. Moreau, A. Krishnamurthy
- Computer Science, USENIX Symposium on Operating Systems Design and Implementation
- 12 February 2018
TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. It automates the optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations.
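A minimal sketch of the separation between operator definition and schedule that TVM exposes, using the tensor expression (te) API; the API has evolved across TVM versions, so treat this as illustrative:

```python
import tvm
from tvm import te

# Declare *what* to compute: elementwise vector addition.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# The schedule decides *how* to run it; here, split the loop and parallelize.
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=64)
s[C].parallel(outer)

f = tvm.build(s, [A, B, C], target="llvm")  # compile for CPU
```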
Stochastic Gradient Hamiltonian Monte Carlo
- Tianqi Chen, E. Fox, Carlos Guestrin
- Computer Science, International Conference on Machine Learning
- 17 February 2014
The paper introduces a variant of stochastic gradient HMC that uses second-order Langevin dynamics with a friction term to counteract the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
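A minimal sketch of one discretized SGHMC update; the noisy-gradient function is a placeholder, and the paper's estimate of the gradient-noise variance is omitted for brevity:

```python
import numpy as np

def sghmc_step(theta, v, stoch_grad_U, eps=1e-3, alpha=0.1,
               rng=np.random.default_rng()):
    """One discretized SGHMC update (second-order Langevin dynamics).

    v is the momentum and alpha the per-step friction. The injected noise
    with variance 2*alpha*eps (minus an estimate of the gradient-noise
    variance, omitted here) compensates for the stochastic gradient.
    """
    v = (v - eps * stoch_grad_U(theta) - alpha * v
         + rng.normal(scale=np.sqrt(2 * alpha * eps), size=np.shape(v)))
    theta = theta + v
    return theta, v
```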
Net2Net: Accelerating Learning via Knowledge Transfer
- Tianqi Chen, Ian J. Goodfellow, Jonathon Shlens
- Computer Science, International Conference on Learning Representations
- 18 November 2015
The Net2Net technique accelerates experimentation by instantaneously transferring knowledge from a previous network to each new, deeper or wider network, and demonstrates a new state-of-the-art accuracy on the ImageNet dataset.
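A minimal NumPy sketch of the function-preserving Net2WiderNet idea for one fully connected layer pair; variable names are illustrative, and the paper also covers convolutions and the deeper-network case:

```python
import numpy as np

def net2wider(W1, b1, W2, new_width, rng=np.random.default_rng(0)):
    """Widen a hidden layer from W1.shape[1] to new_width units
    (new_width >= old width), preserving the block's function.

    New units are copies of randomly chosen existing units; the outgoing
    weights of each replicated unit are divided by its replication count,
    so the layer's output is unchanged.
    """
    old_width = W1.shape[1]
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)

    W1_new = W1[:, mapping]                              # duplicate incoming weights
    b1_new = b1[mapping]
    W2_new = W2[mapping, :] / counts[mapping][:, None]   # rescale outgoing weights
    return W1_new, b1_new, W2_new
```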
A Complete Recipe for Stochastic Gradient MCMC
- Yi-An Ma, Tianqi Chen, E. Fox
- Computer Science, Mathematics, NIPS
- 15 June 2015
This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
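The two-matrix parameterization the summary refers to can be stated compactly: with Hamiltonian H(z) defining the target, a positive semidefinite diffusion matrix D(z), and a skew-symmetric curl matrix Q(z), the paper's framework is the SDE

```latex
\mathrm{d}z = -\bigl(D(z) + Q(z)\bigr)\nabla H(z)\,\mathrm{d}t
            + \Gamma(z)\,\mathrm{d}t + \sqrt{2\,D(z)}\,\mathrm{d}W(t),
\qquad
\Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\bigl(D_{ij}(z) + Q_{ij}(z)\bigr),
```

and particular choices of D and Q recover SGLD, SGHMC, and the new SGRHMC sampler.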
Learning to Optimize Tensor Programs
- Tianqi Chen, Lianmin Zheng, A. Krishnamurthy
- Computer Science, Neural Information Processing Systems
- 21 May 2018
A learning-based framework to optimize tensor programs for deep learning workloads. It learns domain-specific statistical cost models to guide the search for tensor operator implementations over billions of possible program variants, and accelerates the search through effective model transfer across workloads.
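A schematic of the cost-model-guided search loop the summary describes; every function and object name here is an illustrative placeholder, not the paper's API:

```python
import random

def guided_search(candidates, cost_model, measure, rounds=10, batch=8):
    """Placeholder search loop: a learned cost model ranks candidate
    programs, only the top few are measured on real hardware, and the
    measurements are fed back to retrain the model."""
    history = []
    for _ in range(rounds):
        # Rank a random sample of program variants by predicted cost.
        sample = random.sample(candidates, min(64, len(candidates)))
        sample.sort(key=cost_model.predict)
        for program in sample[:batch]:
            history.append((program, measure(program)))  # hardware measurement
        cost_model.update(history)  # retrain on measured data
    return min(history, key=lambda t: t[1])[0]
```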
Training Deep Nets with Sublinear Memory Cost
- Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
- Computer Science, ArXiv
- 21 April 2016
This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm at a small extra computation cost.
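The idea is now widely known as gradient checkpointing: only segment-boundary activations are stored, and the rest are recomputed during the backward pass. A minimal sketch with PyTorch's built-in utility (PyTorch is used here for illustration; the paper's implementation was in MXNet):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of layers; storing every activation would cost O(n) memory.
layers = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU())
                         for _ in range(64)])

x = torch.randn(32, 256, requires_grad=True)
# Split into ~sqrt(n) segments: only segment boundaries are stored, and
# activations inside each segment are recomputed in the backward pass.
out = checkpoint_sequential(layers, 8, x)
out.sum().backward()
```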
TVM: End-to-End Optimization Stack for Deep Learning
- Tianqi Chen, T. Moreau, A. Krishnamurthy
- Computer Science, ArXiv
- 12 February 2018
This paper proposes TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends, and discusses the optimization challenges specific to deep learning that TVM solves.