PyTorch: An Imperative Style, High-Performance Deep Learning Library
This paper details the principles that drove the implementation of PyTorch, shows how they are reflected in its architecture, and explains how the careful, pragmatic implementation of key runtime components enables them to work together to achieve compelling performance.
Automatic differentiation in PyTorch
The automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, which differentiates purely imperative programs with an emphasis on extensibility and low overhead.
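The core idea of differentiating purely imperative programs is a tape: operations record themselves as they execute, and a backward pass replays the tape applying the chain rule. Below is a toy sketch of that reverse-mode mechanism; the `Var` class and its methods are illustrative inventions, not PyTorch's actual API.

```python
# Toy sketch of tape-based reverse-mode automatic differentiation,
# the idea behind PyTorch's autograd. Illustrative only: real
# autograd handles arbitrary DAGs, tensors, and many more ops.

class Var:
    def __init__(self, value, parents=()):
        self.value = value        # forward value
        self.parents = parents    # (parent Var, local gradient) pairs
        self.grad = 0.0           # accumulated gradient

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        # Walk the recorded graph, accumulating gradients (chain rule).
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x      # z = x*y + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

The program itself is the graph: no separate graph-construction step is needed, which is what "purely imperative" buys in flexibility.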
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
Deep Counterfactual Regret Minimization
Deep Counterfactual Regret Minimization is introduced, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.
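At the heart of CFR is the regret-matching rule: at each information set, play each action with probability proportional to its accumulated positive regret. Deep CFR's contribution is approximating these accumulated regrets with a deep network instead of a table; the sketch below shows only the tabular rule, with illustrative numbers.

```python
# Minimal sketch of the regret-matching rule used by CFR. In Deep
# CFR the cumulative regrets come from a neural network that
# approximates CFR's behavior in the full game; here they are just
# a hand-written list for illustration.

def regret_matching(cumulative_regrets):
    """Map accumulated regrets to a strategy (probability vector)."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total == 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        n = len(cumulative_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]

# Only the positive regrets (4 and 1) receive probability mass.
strategy = regret_matching([4.0, -2.0, 1.0])
```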
Learning Physical Intuition of Block Towers by Example
This paper creates small towers of wooden blocks whose stability is randomized and renders them collapsing (or remaining upright) to train large convolutional network models that can accurately predict the outcome as well as estimate the block trajectories.
A MultiPath Network for Object Detection
Three modifications to the standard Fast R-CNN object detector are tested: skip connections that give the detector access to features at multiple network layers, a foveal structure that exploits object context at multiple resolutions, and an integral loss function with a corresponding network adjustment that improves localization.
Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
A simple semi-supervised learning approach for images based on in-painting with an adversarial loss is introduced, able to directly train large VGG-style networks in a semi-supervised fashion.
Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
- S. Gross, Marc'Aurelio Ranzato, Arthur D. Szlam
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 20 April 2017
This work shows that a simple hard mixture of experts model can be efficiently trained to good effect on large-scale hashtag (multilabel) prediction tasks. It demonstrates that it is feasible, and in fact relatively painless, to train far larger models than could practically be trained with standard CNN architectures, and that the extra capacity can be put to good use on current datasets.
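What makes a *hard* mixture of experts scalable is that a gater assigns each example to exactly one expert, so each expert trains only on its own shard of the data and experts can be trained independently. The nearest-centroid gater and function names below are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch of hard expert assignment: each example is
# routed to exactly one expert, partitioning the dataset so that
# experts can be trained separately (and in parallel). The
# clustering-style gater here is an assumption for illustration.

def hard_assign(example, centroids):
    """Route an example to the expert whose centroid is nearest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)),
               key=lambda k: sq_dist(example, centroids[k]))

def partition(dataset, centroids):
    """Split the dataset into per-expert training sets."""
    shards = [[] for _ in centroids]
    for ex in dataset:
        shards[hard_assign(ex, centroids)].append(ex)
    return shards

centroids = [(0.0, 0.0), (10.0, 10.0)]
data = [(0.5, 0.1), (9.0, 11.0), (1.0, -1.0)]
shards = partition(data, centroids)
```

Because no expert ever sees another expert's shard, total model capacity scales with the number of experts at roughly constant per-example compute.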
Real or Fake? Learning to Discriminate Machine from Human Generated Text
- A. Bakhtin, S. Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur D. Szlam
- Computer Science, ArXiv
- 7 June 2019
Overall, it is observed that EBMs can generalize remarkably well to changes in the architecture of the generators producing negatives; however, EBMs exhibit more sensitivity to the training set used by such generators.
Residual Energy-Based Models for Text
- A. Bakhtin, Yuntian Deng, S. Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur D. Szlam
- Computer Science, J. Mach. Learn. Res.
This work finds experimentally that the answer is affirmative when one has access to the training data for the model, and guardedly affirmative even if one does not, suggesting that autoregressive models can be improved by incorporating the (globally normalized) discriminators into the generative process.
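One way a residual energy-based model can improve an autoregressive generator is by scoring a candidate sequence as the generator's log-likelihood minus a discriminator's energy, and preferring candidates with the higher combined score. The function name and the numeric scores below are made-up illustrations of that reweighting idea, not model outputs.

```python
# Sketch of the residual energy-based scoring idea: the joint model
# rates a sequence by log p_LM(x) - E(x), so a globally normalized
# discriminator's energy E reweights candidates drawn from the
# autoregressive model. All names and numbers are illustrative.

def residual_rerank(candidates):
    """candidates: list of (text, log_p_lm, energy) triples.
    Returns them sorted by the joint score log p_LM(x) - E(x),
    best first."""
    return sorted(candidates,
                  key=lambda c: c[1] - c[2],
                  reverse=True)

ranked = residual_rerank([
    ("sample A", -10.0, 0.5),   # joint score -10.5
    ("sample B", -11.0, -1.0),  # joint score -10.0 (energy favors it)
    ("sample C", -9.0, 3.0),    # joint score -12.0
])
```

Note how the discriminator's energy can overturn the language model's own ranking: a candidate the LM alone prefers can lose to one the energy function deems more human-like.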