Thomas Müller

This paper describes the IMS-SZEGED-CIS contribution to the SPMRL 2013 Shared Task. We participate in both the constituency and dependency tracks, and achieve state-of-the-art results for all languages. For both tracks we make significant improvements through high-quality preprocessing and (re)ranking on top of strong baselines. Our system came out first for both …
We present labeled morphological segmentation—an alternative view of morphological processing that unifies several tasks. We introduce a new hierarchy of morphotactic tagsets and CHIPMUNK, a discriminative morphological segmentation system that, contrary to previous work, explicitly models morphotactics. We show improved performance on three tasks for all …
We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold-standard tags and lemmata and does not rely on morphological dictionaries or analyzers. LEMMING sets the new state of the art in token-based statistical …
We investigate a language model that combines morphological and shape features with a Kneser-Ney model and test it in a large cross-lingual study of European languages. Even though the model is generic and we use the same architecture and features for all languages, the model achieves reductions in perplexity for all 21 languages represented in the Europarl …
Branching theories are popular frameworks for modeling objective indeterminism in the form of a future of open possibilities. In such theories, the notion of a history plays a crucial role: it is both a basic ingredient in the axiomatic definition of the framework, and it is used as a parameter of truth in semantics for languages with a future tense.
In this paper we propose a method to increase dependency parser performance without using additional labeled or unlabeled data by refining the layer of predicted part-of-speech (POS) tags. We perform experiments on English and German and show significant improvements for both languages. The refinement is based on generative split-merge training for Hidden …
When assayed for their capacity to inhibit azo-initiated peroxidation of linoleic acid in a water/chlorobenzene two-phase system, tellurium-containing 3-pyridinols were readily regenerable by N-acetylcysteine contained in the aqueous phase. The best inhibitors quenched peroxyl radicals more efficiently than alpha-tocopherol, and the duration of inhibition …
We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features used are obtained without the use of labeled data. The perplexity improvement compared to a state-of-the-art Kneser-Ney model is 4% overall …