Aitziber Atutxa

Learn More
This paper presents three successful techniques to translate prepositions heading verbal complements by means of rich linguistic information, in the context of a rule-based Machine Translation system for an agglutinative language with scarce resources. This information comes in the form of lexicalized syntactic dependency triples, verb subcategorization and(More)
This paper presents the design and implementation of a finite-state syntactic grammar of Basque that has been used with the objective of extracting information about verb subcategorization instances from newspaper texts. After a partial parser has built basic syntactic units such as noun phrases, prepositional phrases, and sentential complements, a(More)
This paper explores a crosslingual approach to the PP attachment problem. We built a large dependency database for English based on an automatic parse of the BNC, and Reuters (sports and finances sections). The Basque attachment decisions are taken based on the occurrence frequency of the translations of the Basque (verb-noun) pairs in the English syntactic(More)
In this study, we explore the impact of complex lexical information to solve syntactic ambiguity, including verbal subcategorization in the form of verbal transitivity and verb-noun-case or verb-noun-case-auxiliary relations. The information was obtained from different sources, including a subcategorization dictionary extracted from a Basque corpus, the web(More)
In this chapter we describe a computational grammar for Basque, and the first results obtained using it in the process of automatically acquiring subcategorization information about verbs and their associated sentence elements (arguments and adjuncts). The first part of this chapter (section 1) will be devoted to the description of Basque syntax, and to(More)
The aim of this work is the construction of a syntactically annotated treebank for Basque. In this paper we present first, the basis of the annotation. After examining several options we chose the scheme presented in (Carrol et al., 1998). It follows the EAGLES standards and it is based on the idea of adding to each sentence in the corpus a series of(More)
This paper presents experiments performed on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two different statistical filters to acquire the argumental information: Mutual Information, and Fisher’s Exact test.(More)
This paper presents a semantic-driven methodology for the automatic acquisition of verbal models. Our approach relies strongly on the semantic generalizations allowed by already existing resources (e.g. Domain labels, Named Entity categories, concepts in the SUMO ontology, etc). Several experiments have been carried out using comparable corpora in four(More)
This work focuses on data mining applied to the clinical documentation domain. Diagnostic Terms (DTs) are used as keywords to retrieve valuable information from Electronic Health Records (EHRs). Indeed, they are encoded manually by experts following the International Classification of Diseases (ICD). The goal of this work is to explore the aid of text(More)