Hannah Kermes

This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the …
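The snippet does not say which association measure the paper uses; below is a minimal sketch of scoring chunker-derived adjective–verb pairs, assuming Dunning's log-likelihood ratio as the significance test (all function names and the sample pairs are hypothetical):

```python
import math
from collections import Counter

def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table,
    computed as 2 * sum(observed * ln(observed / expected))."""
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    score = 0.0
    for obs, exp in ((k11, row1 * col1 / n), (k12, row1 * col2 / n),
                     (k21, row2 * col1 / n), (k22, row2 * col2 / n)):
        if obs > 0:  # empty cells contribute nothing
            score += obs * math.log(obs / exp)
    return 2 * score

def score_pairs(pairs):
    """Rank (adjective, verb) pairs by cooccurrence significance."""
    pair_freq = Counter(pairs)
    adj_freq = Counter(a for a, _ in pairs)
    verb_freq = Counter(v for _, v in pairs)
    n = len(pairs)
    scored = {}
    for (a, v), k11 in pair_freq.items():
        k12 = adj_freq[a] - k11    # adjective with other verbs
        k21 = verb_freq[v] - k11   # verb with other adjectives
        k22 = n - k11 - k12 - k21  # all remaining pairs
        scored[(a, v)] = g2(k11, k12, k21, k22)
    return sorted(scored.items(), key=lambda kv: -kv[1])

# Hypothetical chunker output: one (adjective, verb) tuple per clause
pairs = [("schwer", "verletzen"), ("schwer", "verletzen"),
         ("schwer", "tragen"), ("leicht", "verletzen")]
for pair, ll in score_pairs(pairs):
    print(pair, round(ll, 2))
```

Pairs whose score exceeds a critical chi-squared value (e.g. 3.84 at p < 0.05, 1 df) would then count as significant cooccurrences.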
In recent years, there has been growing interest in using evidence derived from automatic syntactic analysis in large-scale corpus studies. Ideally, of course, corpus linguists would prefer to have access to the wealth of structural and featural information provided by a full parser based on a complex grammar formalism. However, to date such parsers achieve …
We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665–1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic …
Our approach follows the work of Eckle-Kohler (1999), who used a regular grammar to extract lexicographic information from text corpora. We employ a system that allows us to improve her query-based grammar, especially with respect to recall and speed, without reducing accuracy. In contrast to Eckle-Kohler (1999), we do not attempt to parse a whole sentence or …
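Eckle-Kohler's actual grammar is not reproduced in the snippet; the following is a minimal sketch of the query-based idea, matching a German finite verb followed by a prepositional NP in a POS-tagged stream as evidence for a prepositional-object subcategorization frame (the pattern and the example sentence are illustrative; tags loosely follow the STTS tagset):

```python
import re

# One token per "word/TAG" pair, tags loosely following STTS.
tagged = "er/PPER wartet/VVFIN auf/APPR den/ART Bus/NN ./$."

# Illustrative query: finite full verb + preposition + NP chunk,
# taken as evidence for a prepositional-object frame of the verb.
QUERY = re.compile(
    r"(?P<verb>\S+)/VVFIN\s+"   # finite full verb
    r"(?P<prep>\S+)/APPR\s+"    # preposition
    r"(?:\S+/ART\s+)?"          # optional article
    r"(?:\S+/ADJA\s+)*"         # optional attributive adjectives
    r"(?P<noun>\S+)/NN"         # head noun
)

for m in QUERY.finditer(tagged):
    print(m.group("verb"), "+", m.group("prep"), m.group("noun"))
# -> wartet + auf Bus
```

Because each query targets one local pattern rather than a full sentence parse, it stays fast and can be tuned for recall independently of the other queries.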
This article deals with techniques for lexical acquisition which allow lexicographers to extract evidence for fine-grained syntactic descriptions of words from corpora. The extraction tools are applied to partially parsed text corpora and aim to provide the lexicographer with easy-to-use, syntactically pre-classified evidence. As an example we extracted …