In this article, we present a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. Instead of relying on orthographic clues, the proposed system is able to detect …
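The core idea of the abstract above, that most period ambiguities disappear once abbreviations are known, can be illustrated with a minimal Python sketch. It assumes an abbreviation list is already available: the hypothetical `ABBREVIATIONS` set stands in for what an unsupervised system would learn from the corpus. Every other period, question mark, or exclamation mark is treated as a sentence boundary. This is not the authors' system, and the function name `split_sentences` is illustrative only.

```python
import re

# Hypothetical abbreviation list (stored without the final period).
# An unsupervised system would induce this from the corpus instead.
ABBREVIATIONS = {"dr", "e.g", "etc", "approx"}

# Whitespace-delimited tokens; punctuation stays attached to its token.
TOKEN_RE = re.compile(r"\S+")


def split_sentences(text: str, abbreviations: set) -> list:
    """Split text into sentences, treating a final period as a boundary
    unless the token it is attached to is a known abbreviation."""
    sentences = []
    current = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        current.append(token)
        if token.endswith(("!", "?")):
            boundary = True
        elif token.endswith("."):
            # Drop the final period and compare case-insensitively.
            boundary = token[:-1].lower() not in abbreviations
        else:
            boundary = False
        if boundary:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

For example, `split_sentences("Dr. Smith arrived. He saw e.g. birds etc. and left!", ABBREVIATIONS)` keeps "Dr.", "e.g." and "etc." inside their sentences and returns two sentences. What the sketch cannot handle, such as an abbreviation that genuinely ends a sentence, is exactly the residual ambiguity a full system has to resolve with further evidence.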
We describe a language-independent, flexible, and accurate method for the detection of abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation, and can be identified by using methods for collocation detection such as the log-likelihood ratio. Although the log-likelihood ratio is known to show good recall …
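The collocation view in the abstract above treats a word and a following period as a pair whose association can be measured. The Python sketch below computes the standard Dunning log-likelihood ratio (G²) over a 2×2 table: the token is word w or not, and the token carries a final period or not. It assumes periods stay attached to tokens (e.g. `"Dr."`), and the helper names `llr_2x2` and `abbreviation_scores` are hypothetical. The abstract's truncated "Although…" sentence suggests the paper goes beyond the plain ratio, so this shows only the baseline statistic, not the authors' refined method.

```python
import math
from collections import Counter


def llr_2x2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the term 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def abbreviation_scores(tokens: list) -> dict:
    """Score each word type by how strongly it attracts a final period.

    Assumption: tokens keep their final period attached ("Dr.", "today.").
    """
    with_period = Counter()
    without_period = Counter()
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1:
            with_period[tok[:-1].lower()] += 1
        else:
            without_period[tok.lower()] += 1

    total_with = sum(with_period.values())
    total_without = sum(without_period.values())
    corpus_rate = total_with / (total_with + total_without)

    scores = {}
    for word, o11 in with_period.items():
        o12 = without_period[word]  # Counter returns 0 for unseen words
        # G^2 is high for repulsion too, so keep only positive association.
        if o11 / (o11 + o12) <= corpus_rate:
            scores[word] = 0.0
            continue
        o21 = total_with - o11     # other tokens with a final period
        o22 = total_without - o12  # other tokens without one
        scores[word] = llr_2x2(o11, o12, o21, o22)
    return scores
```

Ranking the returned scores gives abbreviation candidates; where to cut the ranking is left as a tuning choice. Because the plain ratio rewards any word that happens to end a sentence, as "today." does in a small corpus, it tends toward high recall at the cost of precision, which is consistent with the abstract's hint that further refinement is needed.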
  • Paul Kiparsky, Ash Asudeh, Andrew Carstairs, Andrew Garrett, Pekka Sammallahti, Jan Strunk +1 other
  • 2005
1.1 Meillet's formal concept of grammaticalization. According to the neogrammarians and de Saussure, all linguistic change is either sound change, analogy, or borrowing. Meillet (1912) identified a class of changes that don't fit into any of these three categories. Like analogical changes, they are endogenous innovations directly affecting morphology and …
  • Ivan A Sag, Emily Bender, Grev Corbett, Bill Croft, Bruno Estigarribia, Charles Fillmore +10 others
  • 2007
1 Introduction. This paper deals with a number of issues having to do with locality in natural language. Locality of selection is the problem of delimiting what syntactic and semantic information lexical items select. Related issues include the proper analysis of idiomatic expressions, control of overt pronominals, and cross-linguistic variation in lexical …
Prepositions are highly polysemous. Yet little effort has been spent on developing language-specific annotation schemata for preposition senses that would allow the polysemy of prepositions to be represented and analyzed systematically in large corpora. In this paper, we present an annotation schema for preposition senses in German. The annotation schema includes a hierarchical …
The realization of singular count nouns without an accompanying determiner inside a PP (determinerless PP, bare PP, Preposition-Noun Combination) has recently attracted some interest in computational linguistics. Yet the relevant factors for determiner omission remain unclear, and conditions for determiner omission vary from language to language. We …
In this paper, we describe a new unsupervised sentence boundary detection system and present a comparative study evaluating its performance on English and Portuguese documents against other systems for automatic text segmentation into sentences described in the literature. The results achieved by this new approach …
09:10-09:45 Jussi Karlgren: Constructions, patterns, and finding features more sophisticated than term occurrence in text (Keynote). Preface: A construction is a recurring or otherwise noteworthy congregation of linguistic entities. Examples include collocations ("hermetically sealed"), (idiomatic) expressions with fixed constituents ("kick the bucket …
In this paper, we show that recently developed algorithms for unsupervised word segmentation can be a valuable tool for the documentation of endangered languages. We applied an unsupervised word segmentation algorithm based on a nested Pitman-Yor language model to two Austronesian languages, Wooi and Waima'a. The algorithm was then modified and …