ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
- Computer Science
- International Conference on Learning…
- 26 September 2019
This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
- Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
- Computer Science, Environmental Science
- Annual Meeting of the Association for…
- 1 July 2018
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety…
Findings of the 2014 Workshop on Statistical Machine Translation
This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation…
Sentence Level Discourse Parsing using Syntactic and Lexical Information
- Radu Soricut, D. Marcu
- Sociology, Computer Science
- North American Chapter of the Association for…
- 27 May 2003
Two probabilistic models are introduced that identify elementary discourse units and build sentence-level discourse parse trees; they are shown to be sophisticated enough to yield discourse trees at near-human levels of accuracy.
Findings of the 2012 Workshop on Statistical Machine Translation
- Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, Lucia Specia
- Computer Science, Psychology
- WMT@NAACL-HLT
- 7 June 2012
A large-scale manual evaluation of 103 machine translation systems submitted by 34 teams was conducted, and the resulting system rankings were used to measure how strongly 12 automatic evaluation metrics correlate with human judgments of translation quality.
Findings of the 2013 Workshop on Statistical Machine Translation
We present the results of the WMT13 shared tasks, which included a translation task, a task for run-time estimation of machine translation quality, and an unofficial metrics task. This year, 143…
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
- Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
- Computer Science
- Computer Vision and Pattern Recognition
- 17 February 2021
The results clearly illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as indicated by the new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks.
Connecting Vision and Language with Localized Narratives
- J. Pont-Tuset, J. Uijlings, Soravit Changpinyo, Radu Soricut, V. Ferrari
- Computer Science
- European Conference on Computer Vision
- 6 December 2019
An extensive analysis of Localized Narratives shows that they are diverse, accurate, and efficient to produce, and their utility is demonstrated on the application of controlled image captioning.
Automatic Question Answering: Beyond the Factoid
This paper describes and evaluates a Question Answering system that goes beyond answering factoid questions. The system is built around a noisy-channel architecture that exploits both a language model for answers and a transformation model for answer/question terms.
The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task
These MT quality-prediction systems use machine learning techniques (M5P regression-tree and SVM-regression models) and a feature-selection algorithm designed to directly optimize towards the official metrics used in this shared task.