This paper introduces a new approach to morpho-syntactic analysis through Humor 99 (High-speed Unification Morphology), a reversible and unification-based morphological analyzer which has already been integrated with a variety of industrial applications. Humor 99 successfully copes with problems of agglutinative (e.g. Hungarian, Turkish, Estonian) and …
This paper introduces a new approach to translation memories. The proposed translation technology uses linguistic analysis (morphology and parsing) to determine similarity between two source-language segments, and attempts to assemble a sensible translation using translations of source-language chunks if the entire source segment was not found. This is …
We apply statistical methods to perform automatic extraction of Hungarian collocations from corpora. Due to the complexity of Hungarian morphology, a complex resource preparation tool chain has been developed. This tool chain implements a reusable and, in principle, language-independent framework. In the first part, the paper describes the tool chain …
Texts acquired from recognition sources (continuous speech/handwriting recognition and OCR) generally have three types of errors, regardless of the characteristics of the particular source. The output of the recognition process may be (1) poorly segmented or not segmented at all; (2) containing underspecified symbols (where the recognition process can only …
This paper introduces a context-sensitive electronic dictionary that provides translations for any piece of text displayed on a computer screen, without requiring user interaction. This is achieved through a process of three phases: text acquisition from the screen, morpho-syntactic analysis of the context of the selected word, and the dictionary lookup. As …
This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus, using statistical methods. Corpus preparation (the addition of POS tags and stems) was done automatically. From the corpus, 〈verb+noun+casemark〉 patterns were extracted as collocation candidates. Evaluation shows that the statistical methods used by Villada Moirón …