Publications
XGBoost: A Scalable Tree Boosting System
TLDR
We propose a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges.
  • 7,062 citations (1,201 highly influential)
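A minimal usage sketch, assuming the xgboost Python package and its scikit-learn wrapper; the data and hyperparameter values are illustrative, not the paper's experimental setup.

    # Illustrative XGBoost classification on synthetic data.
    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = xgb.XGBClassifier(
        n_estimators=200,   # number of boosted trees
        max_depth=4,        # depth of each regression tree
        learning_rate=0.1,  # shrinkage on each tree's contribution
    )
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))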
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
We propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
  • 4,013 citations (654 highly influential)
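A stripped-down sketch of the recipe behind that sentence: sample perturbations near the instance, weight them by proximity, and fit a weighted linear surrogate whose coefficients act as the local explanation. The function name and kernel choices here are hypothetical, not the lime package's API.

    import numpy as np
    from sklearn.linear_model import Ridge

    def explain_locally(predict_fn, x, n_samples=500, scale=0.5, width=1.0, seed=0):
        # predict_fn maps an (n, d) array to scalar scores,
        # e.g. the predicted probability of one class.
        rng = np.random.default_rng(seed)
        Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))  # perturb x
        preds = predict_fn(Z)                       # query the black box
        # Proximity weights: perturbations near x count more.
        weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
        surrogate = Ridge(alpha=1.0)                # interpretable local model
        surrogate.fit(Z, preds, sample_weight=weights)
        return surrogate.coef_                      # per-feature local importance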
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
TLDR
We introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address the challenges of computation on natural graphs in the context of existing graph-parallel abstractions.
  • 1,456 citations (342 highly influential)
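A toy, single-machine rendering of the Gather-Apply-Scatter (GAS) decomposition PowerGraph is built around, with PageRank as the vertex program; the names are hypothetical, and the real system's scatter phase and vertex cuts (splitting high-degree vertices across machines) are omitted.

    class PageRankProgram:
        def gather(self, src_rank, src_out_degree):
            # Contribution pulled along one in-edge.
            return src_rank / src_out_degree

        def apply(self, gathered_sum):
            # Combine the gathered values into a new vertex value.
            return 0.15 + 0.85 * gathered_sum

    def run(in_nbrs, out_degree, ranks, program, iters=20):
        # in_nbrs: vertex -> list of in-neighbors; every vertex is
        # assumed to have at least one out-edge.
        for _ in range(iters):
            ranks = {
                v: program.apply(
                    sum(program.gather(ranks[u], out_degree[u]) for u in nbrs)
                )
                for v, nbrs in in_nbrs.items()
            }
        return ranks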
Max-Margin Markov Networks
TLDR
Maximum margin Markov (M3) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data.
  • 1,458 citations (212 highly influential)
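For orientation, the training problem behind that sentence is a structured max-margin quadratic program; one standard way to write it, with notation assumed from the M3N literature (Δf_x(y) the feature difference between the true labeling and a candidate y, Δt_x(y) the number of labels on which they disagree), is:

    \min_{w,\,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{x} \xi_{x}
    \quad \text{s.t.} \quad
    w^{\top} \Delta f_{x}(y) \;\ge\; \Delta t_{x}(y) - \xi_{x}
    \qquad \forall x,\; \forall y.

The constraint set is exponential in the number of labels; the paper's contribution is a compact reformulation that exploits the Markov network structure to make the program polynomial in size.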
Cost-effective outbreak detection in networks
TLDR
We exploit submodularity to develop an efficient algorithm that scales to large problems, achieving near optimal placements, while being 700 times faster than a simple greedy algorithm.
  • 1,961 citations (210 highly influential)
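The 700-fold speedup comes from evaluating marginal gains lazily, which submodularity makes safe: a node's gain can only shrink as the selected set grows, so stale scores are upper bounds. A minimal sketch of that lazy-greedy loop, where gain(S, v) stands in for whatever marginal-gain oracle the outbreak objective provides, and node ids are assumed comparable:

    import heapq

    def lazy_greedy(candidates, gain, k):
        S = []
        # Max-heap via negated gains; each entry remembers the round
        # in which its gain was last computed.
        heap = [(-gain([], v), v, 0) for v in candidates]
        heapq.heapify(heap)
        while len(S) < k and heap:
            neg_g, v, scored_at = heapq.heappop(heap)
            if scored_at == len(S):
                S.append(v)       # score is current: take the node
            else:
                # Stale: re-score against the current S and push back.
                heapq.heappush(heap, (-gain(S, v), v, len(S)))
        return S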
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms.
  • 1,248 citations (196 highly influential)
GraphChi: Large-Scale Graph Computation on Just a PC
TLDR
We present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges.
  • 906 citations (178 highly influential)
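A toy illustration of the out-of-core principle in that sentence (vertex state in RAM, edges streamed from disk each pass); GraphChi's actual Parallel Sliding Windows method also keeps edge values on disk in sorted shards so updates can be written back sequentially, which this sketch omits. The file format is assumed to be one "src dst" pair per line.

    import numpy as np

    def pagerank_from_disk(edge_file, n_vertices, out_degree, iters=10):
        ranks = np.full(n_vertices, 1.0 / n_vertices)
        for _ in range(iters):
            incoming = np.zeros(n_vertices)
            with open(edge_file) as f:   # stream edges; never load the graph
                for line in f:
                    src, dst = map(int, line.split())
                    incoming[dst] += ranks[src] / out_degree[src]
            ranks = 0.15 / n_vertices + 0.85 * incoming
        return ranks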
Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
TLDR
We solve the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected by exploiting the submodularity of mutual information.
  • 1,223 citations (136 highly influential)
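A sketch of the greedy rule the paper analyzes: repeatedly add the candidate location whose marginal MI gain is largest, which for Gaussians reduces to a ratio of conditional variances. K is assumed to be a well-conditioned GP covariance matrix over all candidate locations; the cubic-cost linear solves are fine for a sketch but not for the paper's large-scale setting.

    import numpy as np

    def cond_var(K, y, S):
        # Var(y | observations at the locations in S) under the GP.
        if not S:
            return K[y, y]
        K_SS = K[np.ix_(S, S)]
        k_yS = K[y, S]
        return K[y, y] - k_yS @ np.linalg.solve(K_SS, k_yS)

    def greedy_mi_placement(K, k):
        V = list(range(K.shape[0]))
        A = []
        for _ in range(k):
            rest = lambda y: [v for v in V if v not in A and v != y]
            best = max(
                (y for y in V if y not in A),
                key=lambda y: cond_var(K, y, A) / cond_var(K, y, rest(y)),
            )
            A.append(best)
        return A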
Stochastic Gradient Hamiltonian Monte Carlo
TLDR
Hamiltonian Monte Carlo sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals.
  • 480 citations (106 highly influential)
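A sketch of the update the paper proposes on top of that HMC background: drive the dynamics with a noisy minibatch gradient and add a friction term that counteracts the gradient noise so the chain still targets the posterior. grad_U_hat is a stochastic estimate of the gradient of the negative log posterior; alpha (friction) and beta_hat (noise estimate, often set to zero) play the roles they do in the paper, but this particular function is illustrative.

    import numpy as np

    def sghmc(grad_U_hat, theta0, n_steps, eps=1e-3, alpha=0.01, beta_hat=0.0, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.array(theta0, dtype=float)
        v = np.zeros_like(theta)
        samples = []
        for _ in range(n_steps):
            theta = theta + v                       # position update
            noise = rng.normal(0.0, np.sqrt(2.0 * (alpha - beta_hat) * eps), theta.shape)
            v = v - eps * grad_U_hat(theta) - alpha * v + noise  # momentum with friction
            samples.append(theta.copy())
        return np.asarray(samples)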
GraphLab: A New Framework For Parallel Machine Learning
TLDR
We developed GraphLab, a new parallel abstraction which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies.
  • 781 citations (89 highly influential)
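A single-threaded caricature of the asynchronous, dynamically scheduled execution that sentence describes, again using PageRank: an update reads one vertex's neighborhood and reschedules only the vertices it actually changed, instead of running bulk-synchronous rounds over the whole graph.

    from collections import deque

    def async_pagerank(in_nbrs, out_degree, tol=1e-4):
        # Dependents of v: vertices whose update reads v's value.
        dependents = {v: [] for v in in_nbrs}
        for w, nbrs in in_nbrs.items():
            for v in nbrs:
                dependents[v].append(w)
        ranks = {v: 1.0 for v in in_nbrs}
        queue = deque(in_nbrs)            # the scheduler's work queue
        scheduled = set(in_nbrs)
        while queue:
            v = queue.popleft()
            scheduled.discard(v)
            new = 0.15 + 0.85 * sum(ranks[u] / out_degree[u] for u in in_nbrs[v])
            if abs(new - ranks[v]) > tol:
                ranks[v] = new
                for w in dependents[v]:   # wake only affected vertices
                    if w not in scheduled:
                        queue.append(w)
                        scheduled.add(w)
        return ranks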