Publications
PyTorch: An Imperative Style, High-Performance Deep Learning Library
TLDR
This paper details the principles that drove the implementation of PyTorch, shows how they are reflected in its architecture, and explains how the careful, pragmatic implementation of the key components of its runtime lets them work together to achieve compelling performance.
Automatic differentiation in PyTorch
TLDR
The automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, built to differentiate purely imperative programs with an emphasis on extensibility and low overhead.
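To make the imperative style concrete, here is a minimal sketch using PyTorch's public autograd API (the tensor values are illustrative):

```python
import torch

# A leaf tensor that records the operations applied to it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Ordinary imperative code; autograd builds the graph on the fly
y = (x ** 2).sum()

# Reverse pass: populates x.grad with dy/dx = 2x
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```

Because the graph is rebuilt on every run, arbitrary Python control flow can sit between the operations being differentiated.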
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
TLDR
A novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low-latency operation, which is up to 18 times faster, requires 75% fewer FLOPs, has 79% fewer parameters, and provides similar or better accuracy than existing models.
An Analysis of Deep Neural Network Models for Practical Applications
TLDR
This work presents a comprehensive analysis of important metrics in practical applications: accuracy, memory footprint, parameters, operations count, inference time, and power consumption, and argues that this provides a compelling set of information for designing and engineering efficient DNNs.
Evaluation of neural network architectures for embedded systems
TLDR
This work presents a comprehensive analysis of important metrics in practical applications: accuracy, memory footprint, parameters, operations count, inference time, and power consumption, and argues that this provides a compelling set of information for designing and engineering efficient DNNs.
PyTorch distributed
TLDR
Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.
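A minimal sketch of the module in question, torch.nn.parallel.DistributedDataParallel, run here as a single CPU process with the gloo backend for brevity (real jobs launch one process per GPU, e.g. via torchrun, with the appropriate rank and world_size):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group; address and port are illustrative
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 1))  # gradients are all-reduced across ranks
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()  # DDP overlaps the gradient all-reduce with the backward pass
opt.step()

dist.destroy_process_group()
```

The near-linear scaling reported in the paper comes largely from bucketing gradients and overlapping that communication with the backward computation.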
Getting to the point: index sets and parallelism-preserving autodiff for pointful array programming
TLDR
A novel programming language design is presented that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages, and an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism.
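The paper's language (Dex) is not shown here, but a rough JAX analogue conveys the point about accumulation: a scatter-add is an associative, order-independent accumulation, and its reverse-mode derivative is a gather, so parallelism survives differentiation (array sizes and names below are illustrative):

```python
import jax
import jax.numpy as jnp

# Functional scatter-add: contributions may be accumulated in any order
def histogram(vals, idx):
    return jnp.zeros(4).at[idx].add(vals)

idx = jnp.array([0, 1, 1, 3])
vals = jnp.array([1.0, 2.0, 3.0, 4.0])

# Reverse-mode AD turns the scatter-add into a gather, itself parallel
out, vjp = jax.vjp(lambda v: histogram(v, idx), vals)
(grad,) = vjp(jnp.ones(4))
print(out)   # [1. 5. 0. 4.]
print(grad)  # [1. 1. 1. 1.], i.e. ones gathered at idx
```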
Decomposing reverse-mode automatic differentiation
We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD.
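JAX exposes matching primitives, jax.linearize and jax.linear_transpose, so the idea can be sketched directly (the function f below is illustrative):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

x = 2.0
y, f_jvp = jax.linearize(f, x)          # forward mode: linearize f at x
f_vjp = jax.linear_transpose(f_jvp, x)  # transpose the linear JVP map
(grad,) = f_vjp(1.0)                    # pull the cotangent 1.0 back to x

print(grad, jax.grad(f)(x))  # both give cos(2)*2 + sin(2)
```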
Tensors Fitting Perfectly
TLDR
Tensors Fitting Perfectly is a static analysis tool that reasons about NDArray shapes in Swift for TensorFlow programs by synthesizing a set of shape constraints from an abstract interpretation of the program.
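The tool itself analyzes Swift for TensorFlow, but the underlying idea of synthesizing shape constraints from an abstract interpretation can be sketched in a few lines of Python (everything below is a hypothetical toy, not the tool's API): shapes are propagated symbolically, and each operation records the constraints it needs instead of executing.

```python
# Toy abstract interpreter over symbolic NDArray shapes (hypothetical names)
def matmul_shape(a, b, constraints):
    # a: (m, k1), b: (k2, n); matmul is only well-typed when k1 == k2
    (m, k1), (k2, n) = a, b
    constraints.append((k1, k2))  # record the constraint rather than check data
    return (m, n)

constraints = []
out = matmul_shape(("m", "k"), ("k", "n"), constraints)
print(out)          # ('m', 'n')
print(constraints)  # [('k', 'k')] -- trivially satisfiable
```

A solver over the collected constraints can then report shape mismatches statically, before the program runs.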
VC density of set systems definable in tree-like graphs
TLDR
This work focuses on the notion of Vapnik-Chervonenkis density: the smallest possible degree of a polynomial bounding the cardinalities of restrictions of set systems definable in graphs using variants of logic with different expressive power.