Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory
In this paper we take a state-of-the-art model for distributed word representation that explicitly factorizes the positive pointwise mutual information (PPMI) matrix using window sampling and
The brWaC Corpus: A New Open Resource for Brazilian Portuguese
In this work, we present the construction process of a large Web corpus for Brazilian Portuguese, aiming to achieve a size comparable to the state of the art in other languages. We also discuss our
Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time
A large-scale multilingual evaluation of DSMs for predicting the degree of semantic compositionality of nominal compounds on 4 datasets for English and French shows a high correlation with human judgments, being comparable to or outperforming the state of the art for some datasets.
mwetoolkit: a Framework for Multiword Expression Identification
The preliminary results show that the toolkit performs better than other approaches, especially concerning recall, and can be extended in several ways in order to improve the quality of the results.
Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering
The overall conclusion is that at least two measures seem to differentiate MWEs from non-MWEs, and it is argued that such a process improves qualitatively if a more compositional approach to automated grammar/lexicon extension is adopted.
The acquisition of a unification-based generalised categorial grammar
The purpose of this work is to investigate the process of grammatical acquisition from data. In order to do that, a computational learning system is used, composed of a Universal Grammar with
Extracting the Unextractable: A Case Study on Verb-particles
This paper combines three basic methods for extracting English verb-particle constructions from raw text corpora into a single classifier, and adds in a number of extra lexical and frequentistic features.
Multiword Expressions in the wild? The mwetoolkit comes in handy
The use of the mwetoolkit in a standard configuration, for extracting MWEs from a corpus of general-purpose English, is presented, comparing it with related work on MWE extraction.
Alignment-based extraction of multiword expressions
This paper proposes an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, that not only deals with the identification of possible MWE candidates, but also associates some multiword expressions with semantics.
Introduction to the special issue on multiword expressions: Having a crack at a hard nut
This special issue includes ten papers which propose a variety of approaches for finding and handling multiword expressions, both for building general purpose lexical resources and in the context of specific applications.