• Publications
XGBoost: A Scalable Tree Boosting System
TLDR
This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression, and sharding for building a scalable tree boosting system called XGBoost.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
TLDR
This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Max-Margin Markov Networks
TLDR
Maximum margin Markov (M3) networks combine kernels, which efficiently deal with high-dimensional features, with the ability to capture correlations in structured data. The paper also provides a new theoretical bound for generalization in structured domains.
Cost-effective outbreak detection in networks
TLDR
This work exploits submodularity to develop an efficient algorithm that scales to large problems, achieving near optimal placements, while being 700 times faster than a simple greedy algorithm and achieving speedups and savings in storage of several orders of magnitude.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
TLDR
This paper develops graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency, and introduces fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm.
GraphChi: Large-Scale Graph Computation on Just a PC
TLDR
This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. Building on Parallel Sliding Windows, it proposes a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TLDR
TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends and automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations.
Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
TLDR
It is proved that the problem of finding the configuration that maximizes mutual information is NP-complete, and a polynomial-time approximation is described that is within (1-1/e) of the optimum by exploiting the submodularity of mutual information.
Stochastic Gradient Hamiltonian Monte Carlo
TLDR
This paper introduces a variant of Hamiltonian Monte Carlo that uses second-order Langevin dynamics with a friction term. The friction term counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
...