End-to-End Object Detection with Transformers
TLDR
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
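Treating detection as direct set prediction requires a one-to-one assignment between predicted and ground-truth objects before a loss can be computed. The sketch below assumes a precomputed cost matrix and finds the minimum-cost assignment by exhaustive search, which is only practical for tiny sets. The published method uses the Hungarian algorithm for this step, and the function name `optimal_matching` is hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def optimal_matching(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return, for each ground-truth object, the index of its matched prediction.

    cost[p][g] is the matching cost between prediction p and ground-truth g.
    Requires at least as many predictions as ground-truth objects.
    Exhaustive search: the number of candidates grows factorially, so this
    is an illustration of the objective, not an efficient solver.
    """
    num_pred, num_gt = len(cost), len(cost[0])
    if num_pred < num_gt:
        raise ValueError("need at least as many predictions as ground-truth objects")
    best: List[int] = []
    best_cost = float("inf")
    # Each candidate picks a distinct prediction for every ground-truth object.
    for chosen in permutations(range(num_pred), num_gt):
        total = sum(cost[p][g] for g, p in enumerate(chosen))
        if total < best_cost:
            best, best_cost = list(chosen), total
    return best, best_cost
```

For example, with `cost = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]` (three predictions, two objects), the search pairs object 0 with prediction 1 and object 1 with prediction 0. Prediction 2 is left unmatched and, in DETR's formulation, would be supervised as "no object".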
A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft
TLDR
An overview of existing work on AI for real-time strategy (RTS) games, focusing on the work around the game StarCraft, which has emerged in the past few years as the unified test bed for this research.
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
TLDR
A simple end-to-end model for speech recognition is presented. It combines a convolutional-network-based acoustic model with graph decoding and is trained to output letters directly, without the need for forced alignment of phonemes.
Going deeper with Image Transformers
TLDR
This work builds and optimizes deeper transformer networks for image classification and investigates the interplay of architecture and optimization in such dedicated transformers, making two architecture changes that significantly improve the accuracy of deep transformers.
Libri-Light: A Benchmark for ASR with Limited or No Supervision
TLDR
A new collection of spoken English audio, derived from open-source audiobooks from the LibriVox project and suitable for training speech recognition systems under limited or no supervision. To the authors' knowledge, it is the largest freely available corpus of speech.
ResMLP: Feedforward networks for image classification with data-efficient training
TLDR
ResMLP is a simple residual network that alternates between a linear layer in which image patches interact, independently and identically across channels, and a two-layer feed-forward network in which channels interact independently per patch.
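The block structure described above can be sketched in plain Python as two residual sublayers over an input of shape (patches, channels). This is an illustrative sketch, not the authors' code: it omits the normalization and other details of the published model, uses GELU as the hidden activation by assumption, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def gelu(v: float) -> float:
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def cross_patch_linear(x: Matrix, a: Matrix, b: List[float]) -> Matrix:
    """Mix information across patches.

    x has shape (num_patches, channels); a has shape (num_patches, num_patches).
    The same weights `a` are applied to every channel column, so patches
    interact independently and identically across channels.
    """
    n, c = len(x), len(x[0])
    return [
        [sum(a[i][j] * x[j][k] for j in range(n)) + b[i] for k in range(c)]
        for i in range(n)
    ]


def cross_channel_mlp(x: Matrix, w1: Matrix, b1: List[float],
                      w2: Matrix, b2: List[float]) -> Matrix:
    """Two-layer feed-forward network applied to each patch separately.

    w1 has shape (hidden, channels); w2 has shape (channels, hidden).
    """
    out = []
    for row in x:  # each patch is processed independently
        h = [gelu(sum(w * v for w, v in zip(w_row, row)) + bias)
             for w_row, bias in zip(w1, b1)]
        out.append([sum(w * v for w, v in zip(w_row, h)) + bias
                    for w_row, bias in zip(w2, b2)])
    return out


def add(x: Matrix, y: Matrix) -> Matrix:
    # Elementwise sum used for the residual connections.
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def resmlp_block(x: Matrix, a: Matrix, b: List[float], w1: Matrix,
                 b1: List[float], w2: Matrix, b2: List[float]) -> Matrix:
    # Residual sublayer 1: patches interact.
    x = add(x, cross_patch_linear(x, a, b))
    # Residual sublayer 2: channels interact within each patch.
    return add(x, cross_channel_mlp(x, w1, b1, w2, b2))
```

With all weights and biases set to zero, both sublayers output zeros and the block reduces to the identity, which makes the residual structure easy to check.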
MLS: A Large-Scale Multilingual Dataset for Speech Research
TLDR
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The authors believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech research.
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks
TLDR
A heuristic reinforcement learning algorithm that combines direct exploration in the policy space with backpropagation, allowing traces for learning to be collected using deterministic policies; this appears much more efficient than, for example, ε-greedy exploration.
Learning Filterbanks from Raw Speech for Phone Recognition
TLDR
This work learns a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition, and shows that, for several architectures, models trained on these time-domain filterbanks (TD-filterbanks) consistently outperform their counterparts trained on comparable mel-filterbanks.
TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games
TLDR
This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft, a library that enables deep learning research on Real-Time Strategy games such as StarCraft: Brood War.