• Publications
  • Influence
Learning Morphology with Morfette
Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora with high accuracy with no language-specific feature engineering or additional resources. Expand
Representations of language in a model of visually grounded speech signal
An in-depth analysis of the representations used by different components of the trained model shows that encoding of semantic aspects tends to become richer as the authors go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease. Expand
Representation of Linguistic Form and Function in Recurrent Neural Networks
A method for estimating the amount of contribution of individual tokens in the input to the final prediction of the networks is proposed and shows that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. Expand
Efficient induction of probabilistic word classes with LDA
It is shown that using LDA for word class induction scales better with the number of classes than the Brown algorithm and the resulting classes outperform Brown on the three tasks. Expand
Encoding of phonology in a recurrent neural model of grounded speech
It is found that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retain at the top recurrent layer. Expand
Hierarchical Recognition of Propositional Arguments with Perceptrons
A two-layer learning architecture to recognize arguments in a sentence and predict the role they play in the propositions and the learning algorithm follows the global strategy introduced in (Collins, 2002) and adapted in (Carreras and Màrquez, 2004b) for partial parsing tasks. Expand
Text segmentation with character-level text embeddings
This work proposes to learn text representations directly from raw character sequences by training a Simple Recurrent Network to predict the next character in text and uses the learned text embeddings as features in a supervised character level text segmentation and labeling task. Expand
Multimodal Semantic Learning from Child-Directed Input
This work presents a distributed word learning model that operates on child-directed speech paired with realistic visual scenes that integrates linguistic and extra-linguistic information, handles referential uncertainty, and correctly learns to associate words with objects, even in cases of limited linguistic exposure. Expand
Acquiring Verb Subcategorization from Spanish Corpora
A Subcategorization classes in the SENSEM database 75 B Canonical templates for SFs 77 C Metarules 80 2
Question Quality in Community Question Answering Forums: a survey
This survey reviews existing research on question quality in CQA websites and discusses the possible measures of question quality and the question features that have been shown to influence question quality. Expand