Joshua Crowgey

We propose to bring together two kinds of linguistic resources, interlinear glossed text (IGT) and a language-independent precision grammar resource, to automatically create precision grammars in the context of language documentation. This paper takes the first steps in that direction by extracting major-constituent word order and case system properties from …
We present a case study of the methodology of using information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments using the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Our …
This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of Xigt, and …
This paper hypothesizes that transfer-based machine translation systems can be improved by encoding information structure in both the source and target grammars, and preserving information structure in the transfer stage. We explore how information structure can be represented within the HPSG/MRS formalism (Pollard and Sag, 1994; Copestake et al., 2005) and …
In this paper, we describe the expansion of the ODIN resource, a database containing many thousands of instances of Interlinear Glossed Text (IGT) for over a thousand languages. A database containing a large number of instances of IGT, which are effectively richly annotated and heuristically aligned bitexts, provides a unique resource for bootstrapping NLP …
We explore the interaction of sentential negation and word order in Basque using a small experimental implemented grammar based on the Grammar Matrix (Bender et al., 2002, 2010) to test the analyses. We find that the analysis of free word order (Fokkens, 2010) provided by the Grammar Matrix customization system can be adapted to handle the Basque facts, and …
The majority of the world's languages have little to no NLP resources or tools. This is due to a lack of training data ("resources") over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world's languages. In many cases this involves …
While there have been significant improvements in speech and language processing, it remains difficult to bring these new tools to bear on challenges in endangered language documentation. We describe an effort to bridge this gap through Shared Task Evaluation Campaigns (STECs) by designing tasks that are compelling to speech and natural language processing …
In this paper I explore the logical range of sentential negation types predicted by the theory of HPSG. I find that typological surveys confirm that attested simple negation strategies neatly line up with the types of lexical material given by assuming Lexical Integrity and standard Phrase Structure Grammar dependencies. I then extend the methodology to …