Grzegorz Chrupala

Learn More
Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora. The system is composed of two learning modules which are trained to predict morphological tags and lemmas using the Maximum Entropy classifier. The third module dynamically combines the(More)
We describe a system for the CoNLL-2004 Shared Task on Semantic Role Labeling (Carreras and Màrquez, 2004a). The system implements a two-layer learning architecture to recognize arguments in a sentence and predict the role they play in the propositions. The exploration strategy visits possible arguments bottom-up, navigating through the clause hierarchy.(More)
Tokenization is widely regarded as a solved problem due to the high accuracy that rulebased tokenizers achieve. But rule-based tokenizers are hard to maintain and their rules language specific. We show that highaccuracy word and sentence segmentation can be achieved by using supervised sequence labeling on the character level combined with unsupervised(More)
We present RelationFactory, a highly effective open source relation extraction system based on shallow modeling techniques. RelationFactory emphasizes modularity, is easily configurable and uses a transparent pipelined approach. The interactive demo allows the user to pose queries for which RelationFactory retrieves and analyses contexts that contain(More)
We present novel methods for analysing the activation patterns of RNNs and identifying the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network model consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input(More)
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizating word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class(More)
Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy and f-scores we also present results of a task-based evaluation. We use three(More)