We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and ...
We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ...
We describe a system for semantic role assignment built as part of the Senseval III task, based on an off-the-shelf parser and Maxent and Memory-Based learners. We focus on generalisation using several similarity measures to increase the amount of training data available and on the use of EM-based clustering to improve role assignment. Our final score is ...
A statistical estimator attempts to guess an unknown probability distribution by analyzing a sample from this distribution. One desirable property of an estimator is that its guess is increasingly likely to get arbitrarily close to the actual distribution as the sample size increases. This property is called consistency. Data Oriented Parsing (DOP) employs ...
1 Introduction. The paper gives a brief review of the expectation-maximization algorithm (Dempster, Laird, and Rubin 1977) in the comprehensible framework of discrete mathematics. In Section 2, two prominent estimation methods, relative-frequency estimation and maximum-likelihood estimation, are presented. Section 3 is dedicated to the ...
This paper presents a framework for developing and training statistical grammar models for the acquisition of lexicon information. Utilising a robust parsing environment and mathematically well-defined unsupervised training methods, the framework enables us to induce lexicon information from text corpora. Particular strengths of the approach concern (i) the ...
We describe a statistical approach to semantic role labelling that employs only shallow information. We use a Maximum Entropy learner, augmented by EM-based clustering to model the fit between a verb and its argument candidate. The instances to be classified are sequences of chunks that occur frequently as arguments in the training corpus. Our best model ...
This paper presents the use of probabilistic class-based lexica for disambiguation in target-word selection. Our method employs minimal but precise contextual information for disambiguation. That is, only information provided by the target verb, enriched by the condensed information of a probabilistic class-based lexicon, is used. Induction of classes and ...
An approach to automatic detection of syllable structure is presented. We demonstrate a novel application of EM-based clustering to multivariate data, exemplified by the induction of 3- and 5-dimensional probabilistic syllable classes. The qualitative evaluation shows that the method yields phonologically meaningful syllable classes. We then propose a novel ...