This short paper presents a method for automatically extracting and evaluating multiword expressions (MWEs) in the Europarl corpus. For this purpose we use mwetoolkit and draw on its output to find rules for the automatic evaluation of MWEs. We then developed an XML parser to evaluate MWE candidates against those rules and also against online dictionaries. A sample…
In this paper we present a tool for the automatic extraction of subcategorization frames from Portuguese corpora. Subcategorization frames are important to many Natural Language Processing (NLP) tasks, such as the improvement of parsing results. The tool presented here, which is based on a system developed for French, comes to fill a gap in Portuguese,…
We introduce a new multilingual resource containing judgments about nominal compound compositionality in English, French and Portuguese. It covers 3 × 180 noun-noun and adjective-noun compounds for which we provide numerical compositionality scores for the head word, for the modifier and for the compound as a whole, along with possible paraphrases. This…
Semantic role labeling offers vital information for both Linguistics and Natural Language Processing tasks. In this article, we present a lexical resource for Portuguese annotated with semantic roles: VerbLexPor. The resource is a database with verbs and sentences extracted from both a domain-specific corpus and a non-specialized generic one. Annotation was…
This paper presents a methodology for semi-automatic validation of a wide-coverage ontology based on an existing electronic resource, PAPEL. From the existing relations, we choose those of synonymy and hypernymy to generate the ontology. The resulting output was converted to OWL format and manually validated by a lexicographer. As a result, we have…
Automatic lexical alignment is a vital step for empirical machine translation, and although good results can be obtained with existing models (e.g. Giza++), more precise alignment is still needed for successfully handling complex constructions such as multiword expressions. In this paper we propose an approach for lexical alignment combining statistical and…
This paper presents a lexical resource developed for Portuguese. The resource contains sentences annotated with semantic roles. The sentences were extracted from two domains: cardiology research papers and newspaper articles. Both corpora were analyzed with the PALAVRAS parser and subsequently processed with a subcategorization frame extractor, so that…