Unsupervised Learning of Morphology: Survey, Model, Algorithm and Experiments

This thesis addresses a specific problem in the field of Language Technology: can a computer extract a description of word conjugation in a natural language using only written text in that language, and how can such a description be used in further morphological analysis?

Translation of "It" in a Deep Syntax Framework

A discriminative translation model of the English personal pronoun "it" is designed and integrated into the TectoMT deep syntax MT framework, where it outperforms the original solution in 8.5% of sentences containing "it".

Lexical Knowledge Acquisition: Towards a Continuous and Flexible Representation of the Lexicon

This paper provides a complete approach to lexical knowledge acquisition of verbal constructions from an untagged news corpus and claims that a description can be more or less fine-grained while keeping the same accuracy and validity.

Models and empirical data for the production of referring expressions

This paper introduces a special issue of Language, Cognition and Neuroscience dedicated to Production of Referring Expressions: Models and Empirical Data, focusing on models of reference production.

Language ID in the Context of Harvesting Language Data off the Web

It is argued that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web.

Learning Phrasal Categories

This work learns clusters of contextual annotations for non-terminals in the Penn Treebank to try to automate the process -- to learn the "right" combination automatically.

Machine Translation

This chapter has two main aims: (i) to present the state-of-the-art in Machine Translation (MT), namely Phrase-Based Statistical MT, together with the major competing paradigms used in MT research and

The Lab offers a number of practical projects in Natural Language Processing (NLP), focusing on the processing of Hebrew, with the end result being a relatively large-scale, well-documented and efficient software package.

Aggressive Morphology for Robust Lexical Coverage

A system of approximately 1200 morphological rules is used to extend a core lexicon to provide lexical coverage that exceeds that of a lexicon of 80,000 words or 150,000 word forms.

Left-To-Right Parsing and Bilexical Context-Free Grammars

Evidence is provided that left-to-right parsing cannot be realised within acceptable time bounds if the so-called correct-prefix property is to be ensured.

Large-scale Controlled Vocabulary Indexing for Named Entities

A large-scale controlled vocabulary indexing system that covers almost 70,000 named entity topics, and applies to documents from thousands of news publications, is described.

    A Divide-and-Conquer Strategy for Shallow Parsing of German Free Texts. Günter Neumann, Christian Braun, Jakub Piskorski

    A Hybrid Approach for Named Entity and Sub-Type Tagging. Rohini Srihari, Wei Li

    Language Independent Morphological Analysis. Tatsuo Yamashita, Yuji Matsumoto

