• Publications
  • Influence
From Word Embeddings To Document Distances
We present the Word Mover's Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representationsExpand
  • 969
  • 248
Grammar Variational Autoencoder
Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such asExpand
  • 235
  • 51
Counterfactual Fairness
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of theseExpand
  • 347
  • 46
Bayesian Optimization with Inequality Constraints
Bayesian optimization is a powerful framework for minimizing expensive objective functions while using very few function evaluations. It has been successfully applied to a variety of problems,Expand
  • 158
  • 13
Supervised Word Mover's Distance
Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddingsExpand
  • 83
  • 11
GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution
Generative Adversarial Networks (GAN) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such asExpand
  • 160
  • 10
Cost-Sensitive Tree of Classifiers
Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test-time mustExpand
  • 88
  • 9
Stochastic Neighbor Compression
We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smallerExpand
  • 52
  • 9
Classifier cascades and trees for minimizing feature evaluation cost
Machine learning algorithms have successfully entered industry through many real-world applications (e.g., search engines and product recommendations). In these applications, the test-time CPU costExpand
  • 71
  • 7
Feature-Cost Sensitive Learning with Submodular Trees of Classifiers
During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machineExpand
  • 41
  • 6