End-to-End Object Detection with Transformers
- Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
- Computer Science
- European Conference on Computer Vision
- 26 May 2020
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset.
Going deeper with Image Transformers
- Hugo Touvron, M. Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
- Computer Science
- IEEE International Conference on Computer Vision
- 31 March 2021
This work builds and optimizes deeper transformer networks for image classification and investigates the interplay of architecture and optimization in such dedicated transformers, making two architecture changes that significantly improve the accuracy of deep transformers.
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
- Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
- Computer Science
- IEEE International Conference on Computer Vision
- 26 April 2021
This paper proposes MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query, like a caption or a question, and shows that the pre-training approach provides a way to handle the long tail of object categories which have very few labelled instances.
ResMLP: Feedforward networks for image classification with data-efficient training
- Hugo Touvron, Piotr Bojanowski, Hervé Jégou
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine…
- 7 May 2021
ResMLP is a simple residual network that alternates a linear layer in which image patches interact, independently and identically across channels, with a two-layer feed-forward network in which channels interact independently per patch; it attains surprisingly good accuracy/complexity trade-offs on ImageNet.
A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft
- Santiago Ontañón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, M. Preuss
- Computer Science
- IEEE Transactions on Computational Intelligence…
- 18 October 2013
This overview of existing work on AI for real-time strategy (RTS) games focuses on research around the game StarCraft, which has emerged in the past few years as the unified test bed for this research.
Libri-Light: A Benchmark for ASR with Limited or No Supervision
- Jacob Kahn, M. Rivière, Emmanuel Dupoux
- Computer Science
- IEEE International Conference on Acoustics…
- 17 December 2019
A new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision, derived from open-source audio books from the LibriVox project, which is, to the authors' knowledge, the largest freely-available corpus of speech.
MLS: A Large-Scale Multilingual Dataset for Speech Research
- Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
- Computer Science, Linguistics
- Interspeech
- 25 October 2020
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research; the authors believe such a large transcribed dataset will open new avenues in ASR and text-to-speech research.
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
- Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
- Physics, Computer Science
- ArXiv
- 11 September 2016
This work presents a simple end-to-end model for speech recognition that combines a convolutional-network-based acoustic model with graph decoding and is trained to output letters, without the need for forced alignment of phonemes.
Real Time Speech Enhancement in the Waveform Domain
- Alexandre Défossez, Gabriel Synnaeve, Yossi Adi
- Computer Science
- Interspeech
- 23 June 2020
Empirical evidence shows that the proposed causal speech enhancement model, based on an encoder-decoder architecture with skip-connections, is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb.
XCiT: Cross-Covariance Image Transformers
- Alaaeldin El-Nouby, Hugo Touvron, H. Jégou
- Computer Science
- Neural Information Processing Systems
- 17 June 2021
This work proposes a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries; it has linear complexity in the number of tokens and allows efficient processing of high-resolution images.