Publications
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification, trained with stochastic gradient methods, easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
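A minimal sketch of the core experiment described above, assuming a generic PyTorch image-classification setup (the dataset, model, and hyperparameters here are illustrative placeholders, not the paper's exact configuration): replace every training label with a random class and observe that a standard network still drives training error toward zero.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative setup: CIFAR-10 with every label replaced by a random class.
train_set = torchvision.datasets.CIFAR10(root="data", train=True,
                                         download=True, transform=T.ToTensor())
train_set.targets = torch.randint(0, 10, (len(train_set),)).tolist()
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):                  # enough epochs for SGD to memorize
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
# Training accuracy approaches 100% even though the labels carry no signal,
# so the fit reflects memorization capacity rather than learned structure.
```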
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
TLDR
The API design and the system implementation of MXNet are described, and it is explained how the embedding of both symbolic expressions and tensor operations is handled in a unified fashion.
Unsupervised feature selection for multi-cluster data
TLDR
Inspired by recent developments in manifold learning and L1-regularized models for subset selection, a new approach called Multi-Cluster Feature Selection (MCFS) is proposed for unsupervised feature selection; it selects features such that the multi-cluster structure of the data is best preserved.
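A rough sketch of the two-stage idea summarized above, using scikit-learn components as stand-ins (SpectralEmbedding for the manifold-learning step, Lasso for the L1-regularized regression); the actual MCFS algorithm differs in details such as graph construction and solver, so treat this only as an illustration.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.linear_model import Lasso

def mcfs_scores(X, n_clusters=5, n_neighbors=5, alpha=0.01):
    """Score features by how well they reconstruct the spectral embedding
    of the data graph under an L1 penalty (MCFS-style sketch)."""
    # Step 1 (manifold learning): a low-dimensional embedding that
    # captures the multi-cluster structure of the data.
    Y = SpectralEmbedding(n_components=n_clusters,
                          n_neighbors=n_neighbors).fit_transform(X)
    # Step 2 (L1-regularized subset selection): regress each embedding
    # dimension on the original features; sparse coefficients indicate
    # which features preserve that structure.
    coefs = np.zeros((n_clusters, X.shape[1]))
    for k in range(n_clusters):
        coefs[k] = Lasso(alpha=alpha).fit(X, Y[:, k]).coef_
    # A feature's score is its largest absolute coefficient across dimensions.
    return np.abs(coefs).max(axis=0)

# Usage: keep the top-m features by score, e.g.
# scores = mcfs_scores(X); selected = np.argsort(scores)[::-1][:m]
```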
Training Deep Nets with Sublinear Memory Cost
TLDR
This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, showing that computation can be traded for memory to obtain a more memory-efficient training algorithm at a small extra computation cost.
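The trade described here, recomputing activations during the backward pass instead of storing them all, is what gradient (activation) checkpointing does. A minimal PyTorch sketch, not the paper's original MXNet implementation; the layer sizes and segment count are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep sequential stack whose intermediate activations would normally
# all be kept in memory for the backward pass.
layers = [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(64)]
model = nn.Sequential(*layers)

x = torch.randn(32, 1024, requires_grad=True)

# Split the network into a handful of segments; only segment boundaries are
# stored, and activations inside a segment are recomputed during backward.
# With ~sqrt(n) segments, memory drops to roughly O(sqrt(n)) at the cost of
# one extra forward pass.
segments = 8
out = checkpoint_sequential(model, segments, x, use_reentrant=False)
out.sum().backward()
```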
Learning with a Wasserstein Loss
TLDR
An efficient learning algorithm based on this regularization is proposed, along with a novel extension of the Wasserstein distance from probability measures to unnormalized measures; the resulting loss can encourage smoothness of the predictions with respect to a chosen metric on the output space.
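A hedged sketch of an entropically regularized (Sinkhorn-style) Wasserstein loss between a predicted distribution and a target distribution over labels; the ground cost matrix, regularization strength, and iteration count below are placeholders rather than the paper's exact setup.

```python
import torch

def sinkhorn_wasserstein(p, q, C, eps=0.1, n_iters=100):
    """Entropically regularized Wasserstein distance between histograms
    p and q (each summing to 1) under ground cost matrix C. Illustrative
    sketch only, not the paper's reference implementation."""
    K = torch.exp(-C / eps)                 # Gibbs kernel
    u = torch.ones_like(p)
    v = torch.ones_like(q)
    for _ in range(n_iters):                # Sinkhorn fixed-point iterations
        u = p / (K @ v)
        v = q / (K.T @ u)
    T = torch.diag(u) @ K @ torch.diag(v)   # approximate transport plan
    return (T * C).sum()                    # transport cost = loss value

# Usage: penalize predictions by how far probability mass must move,
# according to a metric on the output space encoded in C (placeholder here).
num_classes = 5
C = torch.rand(num_classes, num_classes)
C = (C + C.T) / 2
C.fill_diagonal_(0)
pred = torch.softmax(torch.randn(num_classes), dim=0)
target = torch.full((num_classes,), 1.0 / num_classes)
loss = sinkhorn_wasserstein(pred, target, C)
```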
Transfusion: Understanding Transfer Learning for Medical Imaging
TLDR
Investigating the learned representations and features shows that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse; the analysis isolates where useful feature reuse occurs and outlines the implications for more efficient model exploration.
A Study on Overfitting in Deep Reinforcement Learning
TLDR
This paper conducts a systematic study of standard RL agents, finds that they can overfit in various ways, and calls for more principled and careful evaluation protocols in RL.
Machine Theory of Mind
TLDR
It is argued that this system, which autonomously learns how to model other agents in its world, is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing progress on interpretable AI.
Are All Layers Created Equal?
TLDR
This study provides further evidence that mere parameter counting or norm accounting is too coarse for studying generalization of deep models, and that flatness or robustness analysis of the models needs to respect the network architecture.
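One way to make the robustness analysis alluded to above concrete is to rewind one layer at a time to its value at initialization and measure the accuracy drop. A minimal sketch assuming a trained model, a stored initial state_dict, and an `eval_fn` accuracy evaluator, all of which are hypothetical placeholders:

```python
import copy
import torch

def layer_criticality(trained_model, init_state, eval_fn):
    """For each parameter tensor, reset it to its initialization value and
    record how much evaluation accuracy drops (layer-robustness probe).
    `eval_fn(model) -> accuracy` is a placeholder evaluator."""
    baseline = eval_fn(trained_model)
    drops = {}
    for name, _ in trained_model.named_parameters():
        probe = copy.deepcopy(trained_model)
        with torch.no_grad():
            # Rewind just this one layer's weights to initialization.
            dict(probe.named_parameters())[name].copy_(init_state[name])
        drops[name] = baseline - eval_fn(probe)
    # Layers whose rewinding barely hurts accuracy behave very differently
    # from layers whose rewinding destroys it, which is the kind of
    # architecture-dependent structure parameter counts cannot see.
    return drops
```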
What is being transferred in transfer learning?
TLDR
Through a series of analyses on transferring to block-shuffled images, the effect of feature reuse is separated from that of learning low-level statistics of the data, and it is shown that some of the benefit of transfer learning comes from the latter.
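A small sketch of the block-shuffling probe mentioned above: partition an image into a grid of patches and permute them, which destroys high-level structure while roughly preserving low-level statistics. The block size and image shape are illustrative.

```python
import numpy as np

def block_shuffle(img, block=8, rng=None):
    """Randomly permute non-overlapping block x block patches of an
    H x W x C image (H and W must be divisible by `block`)."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, c = img.shape
    # Reshape into a grid of blocks, shuffle the grid, then reassemble.
    grid = img.reshape(h // block, block, w // block, block, c)
    grid = grid.transpose(0, 2, 1, 3, 4).reshape(-1, block, block, c)
    grid = grid[rng.permutation(len(grid))]
    grid = grid.reshape(h // block, w // block, block, block, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

# Usage: compare pretrained vs. randomly initialized models on
# block-shuffled inputs to separate reuse of high-level features
# from reuse of low-level statistics.
# shuffled = block_shuffle(image, block=16)
```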