Publications
Densely Connected Convolutional Networks
TLDR
The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
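A minimal PyTorch sketch of the dense connectivity pattern is given below; the channel counts, growth rate, and layer composition are illustrative choices, not the paper's exact configuration.

    # Sketch of one DenseNet-style dense block (sizes are illustrative).
    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        def __init__(self, in_channels, growth_rate, num_layers):
            super().__init__()
            self.layers = nn.ModuleList()
            for i in range(num_layers):
                # each layer sees the concatenation of all preceding feature maps
                self.layers.append(nn.Sequential(
                    nn.BatchNorm2d(in_channels + i * growth_rate),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                              kernel_size=3, padding=1, bias=False),
                ))

        def forward(self, x):
            features = [x]
            for layer in self.layers:
                out = layer(torch.cat(features, dim=1))  # dense connectivity
                features.append(out)
            return torch.cat(features, dim=1)

    block = DenseBlock(in_channels=16, growth_rate=12, num_layers=4)
    print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])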
Distance Metric Learning for Large Margin Nearest Neighbor Classification
TLDR
This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples in a globally integrated manner, and finds that metrics trained in this way lead to significant improvements in kNN classification.
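The sketch below spells out the large-margin objective behind this style of metric learning in plain numpy: the Mahalanobis metric is parameterized as M = L^T L, a pull term acts on target neighbors, and a hinge push term acts on impostors. The toy data, variable names, and weighting are illustrative assumptions, not the paper's solver.

    # Sketch of the large-margin nearest neighbor loss (illustrative toy data).
    import numpy as np

    def mahalanobis_sq(L, x, y):
        d = L @ (x - y)            # Euclidean distance after the linear map L
        return float(d @ d)

    def lmnn_loss(L, X, target_neighbors, impostors, mu=0.5, margin=1.0):
        # pull target neighbors close
        pull = sum(mahalanobis_sq(L, X[i], X[j]) for i, j in target_neighbors)
        # push differently labeled impostors at least a margin farther away
        push = sum(max(0.0, margin + mahalanobis_sq(L, X[i], X[j])
                            - mahalanobis_sq(L, X[i], X[k]))
                   for (i, j) in target_neighbors
                   for k in impostors.get(i, []))
        return (1 - mu) * pull + mu * push

    X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.0]])
    print(lmnn_loss(np.eye(2), X, target_neighbors=[(0, 1)], impostors={0: [2]}))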
From Word Embeddings To Document Distances
TLDR
It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest neighbor document classification error rates.
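A small sketch of the underlying idea, assuming the Word Mover's Distance is computed as a minimum-cost transport problem between the two documents' word embeddings (numpy plus scipy's linprog; the toy embeddings and word weights are made up for illustration).

    # Sketch of Word Mover's Distance as a small transport linear program.
    import numpy as np
    from scipy.optimize import linprog

    def wmd(emb_a, emb_b, w_a, w_b):
        """Cheapest way to move word mass w_a (over emb_a) onto w_b (over emb_b)."""
        n, m = len(w_a), len(w_b)
        # cost of moving one unit of mass from word i of doc A to word j of doc B
        C = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=2)
        # equality constraints: outgoing flow matches w_a, incoming flow matches w_b
        A_eq = np.zeros((n + m, n * m))
        for i in range(n):
            A_eq[i, i * m:(i + 1) * m] = 1.0
        for j in range(m):
            A_eq[n + j, j::m] = 1.0
        b_eq = np.concatenate([w_a, w_b])
        res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq)  # flows are >= 0 by default
        return res.fun

    emb_a = np.array([[0.0, 1.0], [1.0, 0.0]])   # "doc A" word vectors
    emb_b = np.array([[0.1, 0.9], [0.9, 0.1]])   # "doc B" word vectors
    print(wmd(emb_a, emb_b, np.array([0.5, 0.5]), np.array([0.5, 0.5])))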
On Calibration of Modern Neural Networks
TLDR
It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and that on most datasets temperature scaling, a single-parameter variant of Platt Scaling, is surprisingly effective at calibrating predictions.
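A minimal sketch of temperature scaling, assuming the standard setup of fitting one scalar T on held-out logits by minimizing the negative log-likelihood; the random "validation" data exists only to make the snippet runnable.

    # Sketch: fit a single temperature T so softmax(logits / T) minimizes NLL.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def nll(T, logits, labels):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)     # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10)) * 5.0   # toy, overconfident logits
    labels = rng.integers(0, 10, size=1000)
    T = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                        method="bounded").x
    print(f"fitted temperature T = {T:.2f}")     # T > 1 softens the predictions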
Simplifying Graph Convolutional Networks
TLDR
This paper successively removes nonlinearities and collapses the weight matrices between consecutive layers, then theoretically analyzes the resulting linear model and shows that it corresponds to a fixed low-pass filter followed by a linear classifier.
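A compact sketch of this simplification, assuming the usual D^-1/2 (A + I) D^-1/2 normalization: propagate the features K times with the fixed filter, then fit an ordinary linear classifier. The random graph, feature matrix, and choice of scikit-learn classifier are illustrative.

    # Sketch of a simplified graph convolution: S^K X followed by logistic regression.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def sgc_features(adj, X, K=2):
        A = adj + np.eye(adj.shape[0])                        # add self-loops
        d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
        S = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
        for _ in range(K):                                    # fixed low-pass filtering
            X = S @ X
        return X

    rng = np.random.default_rng(0)
    adj = (rng.random((50, 50)) < 0.1).astype(float)
    adj = np.maximum(adj, adj.T)                              # symmetrize the toy graph
    X = rng.normal(size=(50, 16))
    y = rng.integers(0, 3, size=50)
    clf = LogisticRegression(max_iter=1000).fit(sgc_features(adj, X), y)
    print(clf.score(sgc_features(adj, X), y))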
BERTScore: Evaluating Text Generation with BERT
TLDR
This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
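As a rough illustration of the token-matching idea behind such a metric, the sketch below greedily matches contextual token embeddings by cosine similarity and combines precision and recall into an F1 score; the random embeddings are stand-ins for real BERT activations, and idf weighting is omitted.

    # Sketch of greedy token matching over precomputed contextual embeddings.
    import numpy as np

    def bertscore_f1(cand_emb, ref_emb):
        # cosine similarity between every candidate token and every reference token
        c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
        r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
        sim = c @ r.T
        precision = sim.max(axis=1).mean()   # best match for each candidate token
        recall = sim.max(axis=0).mean()      # best match for each reference token
        return 2 * precision * recall / (precision + recall)

    rng = np.random.default_rng(0)
    print(bertscore_f1(rng.normal(size=(7, 768)), rng.normal(size=(9, 768))))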
Deep Networks with Stochastic Depth
TLDR
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time; it reduces training time substantially and significantly improves the test error on almost all data sets used for evaluation.
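A minimal PyTorch sketch of one residual block with stochastic depth, assuming a per-block survival probability: the residual branch is randomly dropped during training and kept but scaled by its survival probability at test time. Layer sizes are illustrative.

    # Sketch of a residual block whose branch is dropped stochastically in training.
    import torch
    import torch.nn as nn

    class StochasticDepthBlock(nn.Module):
        def __init__(self, channels, p_survive=0.8):
            super().__init__()
            self.p_survive = p_survive
            self.branch = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )

        def forward(self, x):
            if self.training:
                if torch.rand(1).item() < self.p_survive:
                    return torch.relu(x + self.branch(x))   # branch survives
                return x                                     # branch dropped: identity
            # test time: keep the branch, scaled by its survival probability
            return torch.relu(x + self.p_survive * self.branch(x))

    block = StochasticDepthBlock(16)
    print(block(torch.randn(2, 16, 8, 8)).shape)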
Marginalized Denoising Autoencoders for Domain Adaptation
TLDR
The mSDA approach marginalizes out the noise and thus does not require stochastic gradient descent or other optimization algorithms to learn parameters; in fact, they are computed in closed form, which speeds up SDAs by two orders of magnitude.
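A small numpy sketch of that closed-form computation, assuming a single mSDA layer with feature-wise corruption probability p; the small ridge term and the toy data are illustrative additions for numerical stability and runnability.

    # Sketch of a marginalized denoising autoencoder layer solved in closed form.
    import numpy as np

    def msda_layer(X, p=0.5):
        """X: d x n matrix (features by examples); returns the hidden representation."""
        d, n = X.shape
        Xb = np.vstack([X, np.ones((1, n))])              # append a constant bias feature
        q = np.concatenate([np.full(d, 1.0 - p), [1.0]])  # feature survival probabilities
        S = Xb @ Xb.T                                     # scatter matrix
        Q = S * np.outer(q, q)                            # expected corrupted scatter
        np.fill_diagonal(Q, q * np.diag(S))               # diagonal keeps a single q_i
        Q += 1e-5 * np.eye(d + 1)                         # small ridge (illustrative)
        P = S[:d, :] * q[None, :]                         # expected clean-corrupted cross term
        W = np.linalg.solve(Q.T, P.T).T                   # closed form: W Q = P
        return np.tanh(W @ Xb)

    X = np.random.default_rng(0).normal(size=(20, 100))
    print(msda_layer(X).shape)                            # (20, 100)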
Feature hashing for large scale multitask learning
TLDR
This paper provides exponential tail bounds for feature hashing, shows that the interaction between random subspaces is negligible with high probability, and demonstrates the feasibility of this approach with experimental results for a new use case: multitask learning.
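A bare-bones sketch of the hashing trick itself, assuming a signed hash into a fixed number of buckets (pure Python with hashlib; the bucket count, hash choice, and toy features are illustrative).

    # Sketch: hash arbitrarily many named features into a fixed-size signed vector.
    import hashlib
    import numpy as np

    def hash_features(feature_dict, m=2**10):
        x = np.zeros(m)
        for name, value in feature_dict.items():
            h = int(hashlib.md5(name.encode()).hexdigest(), 16)
            index = h % m                                 # bucket index
            sign = 1.0 if (h >> 64) % 2 == 0 else -1.0    # sign bit from other hash bits
            x[index] += sign * value                      # collisions cancel in expectation
        return x

    # task-prefixed feature names share one hashed space, as in multitask learning
    doc = {"word:the": 2.0, "word:hashing": 1.0, "task1_word:the": 2.0}
    print(np.nonzero(hash_features(doc))[0])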
Snapshot Ensembles: Train 1, get M for free
TLDR
This paper proposes a method to achieve the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost: a single neural network is trained so that it converges to several local minima along its optimization path, and the model parameters are saved at each.
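A schematic PyTorch sketch of the idea, assuming a cyclic cosine learning-rate schedule with one snapshot saved per cycle and test-time averaging of the snapshots' softmax outputs; the model, data loader, and hyperparameters below are placeholders.

    # Sketch of snapshot ensembling with a restarted cosine learning-rate schedule.
    import copy
    import math
    import torch
    import torch.nn as nn

    def snapshot_train(model, loader, epochs=30, cycles=6, lr0=0.1):
        opt = torch.optim.SGD(model.parameters(), lr=lr0)
        epochs_per_cycle = epochs // cycles
        snapshots = []
        for epoch in range(epochs):
            # cosine annealing, restarted at the start of every cycle
            t = (epoch % epochs_per_cycle) / epochs_per_cycle
            for g in opt.param_groups:
                g["lr"] = lr0 / 2 * (math.cos(math.pi * t) + 1)
            for x, y in loader:
                opt.zero_grad()
                nn.functional.cross_entropy(model(x), y).backward()
                opt.step()
            if (epoch + 1) % epochs_per_cycle == 0:
                snapshots.append(copy.deepcopy(model.state_dict()))  # one ensemble member
        return snapshots

    def ensemble_predict(model, snapshots, x):
        probs = []
        for state in snapshots:
            model.load_state_dict(state)
            model.eval()
            with torch.no_grad():
                probs.append(torch.softmax(model(x), dim=1))
        return torch.stack(probs).mean(dim=0)     # average the snapshot predictions

    model = nn.Sequential(nn.Linear(4, 3))
    loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
    snaps = snapshot_train(model, loader, epochs=6, cycles=3)
    print(ensemble_predict(model, snaps, torch.randn(2, 4)))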