We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that …
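For concreteness, one natural instantiation of such an objective (a hedged sketch, not necessarily the paper's exact form) combines the conditional log-likelihood on the labeled pairs $(x_i, y_i)$ with an entropy regularizer on the unlabeled inputs $x_j$, weighted by an assumed trade-off parameter $\gamma \ge 0$:

```latex
\max_{\theta}\;\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)
  \;-\; \gamma \sum_{j=1}^{M} H\bigl(p_\theta(\cdot \mid x_j)\bigr),
\quad\text{where}\quad
H\bigl(p_\theta(\cdot \mid x)\bigr)
  = -\sum_{\mathbf{y}} p_\theta(\mathbf{y} \mid x)\,\log p_\theta(\mathbf{y} \mid x).
```

In the structured case the inner sum ranges over all label sequences $\mathbf{y}$, so evaluating the entropy term requires dynamic programming rather than direct enumeration.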
We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations; a model we refer to as the Chain Augmented Naive Bayes (CAN) classifier. CAN models have two …
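The decision rule this describes can be sketched as follows (a hedged reconstruction from the abstract: a class-conditional $n$-gram language model replaces the conditional independence assumption of naive Bayes):

```latex
\hat{c} \;=\; \arg\max_{c}\;\Bigl[\log P(c)
  \;+\; \sum_{i=1}^{T} \log P\bigl(w_i \mid w_{i-n+1},\dots,w_{i-1},\, c\bigr)\Bigr]
```

Setting $n = 1$ recovers the standard naive Bayes classifier, since each observation then depends only on the class.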
We present two new algorithms for online learning in reproducing kernel Hilbert spaces. Our first algorithm, ILK (implicit online learning with kernels), employs a new, implicit update technique that can be applied to a wide variety of convex loss functions. We then introduce a bounded memory version, SILK (sparse ILK), that maintains a compact …
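As a rough illustration of the family of methods involved, here is a minimal budgeted online kernel learner with an explicit gradient update on squared loss. It is a generic sketch, not the implicit ILK update or the sparse SILK scheme from the paper, and all names and hyperparameters here are assumptions:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Gaussian RBF kernel; the bandwidth gamma is an assumed hyperparameter."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

class OnlineKernelLearner:
    """Generic budgeted online kernel learner (explicit functional-gradient
    step on squared loss); a stand-in sketch, not ILK/SILK themselves."""

    def __init__(self, eta=0.1, budget=100):
        self.eta = eta          # learning rate
        self.budget = budget    # max support-set size (bounded memory)
        self.support = []       # stored examples x_i
        self.alpha = []         # kernel-expansion coefficients

    def predict(self, x):
        # f(x) = sum_i alpha_i * k(x_i, x)
        return sum(a * rbf(xi, x) for a, xi in zip(self.alpha, self.support))

    def update(self, x, y):
        # Explicit gradient step on the loss (f(x) - y)^2 / 2: the incoming
        # point enters the expansion with coefficient -eta * (f(x) - y).
        err = self.predict(x) - y
        self.support.append(x)
        self.alpha.append(-self.eta * err)
        # Bounded memory: drop the oldest support vector when over budget
        # (SILK instead keeps a sparse expansion; this is a crude stand-in).
        if len(self.support) > self.budget:
            self.support.pop(0)
            self.alpha.pop(0)
```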
Titterington proposed a recursive parameter estimation algorithm for finite mixture models. However, due to the well-known problem of singularities and of the multiple maxima, minima, and saddle points that can occur on mixture likelihood surfaces, a convergence analysis has seldom been carried out. In this paper, under mild conditions, we show the global …
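The recursion in question, as Titterington's algorithm is usually stated (a sketch; the paper's exact setting may differ), updates the estimate after each new observation $x_{k+1}$ using the complete-data Fisher information $I_c$:

```latex
\theta^{(k+1)} \;=\; \theta^{(k)}
  \;+\; \frac{1}{k+1}\; I_c\bigl(\theta^{(k)}\bigr)^{-1}\,
  \nabla_\theta \log f\bigl(x_{k+1};\, \theta^{(k)}\bigr)
```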
An EM-type recursive estimation algorithm is formulated in the DFT domain for jointly estimating the time-varying parameters of the distortion channel and the additive noise from online degraded speech. Speech features are estimated on the fly from posterior estimates of the short-time speech power spectra. Experiments were performed on …
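The degradation model typically assumed in this setting (a hedged sketch; the paper's exact formulation may differ) relates the observed DFT coefficients to the clean speech through a linear channel plus additive noise, which, if speech and noise are uncorrelated, carries over to the power spectra:

```latex
Y_t(k) \;=\; H_t(k)\, X_t(k) + N_t(k)
\quad\Longrightarrow\quad
\mathbb{E}\,|Y_t(k)|^2 \;\approx\; |H_t(k)|^2\, |X_t(k)|^2 \;+\; \mathbb{E}\,|N_t(k)|^2,
```

where $X_t(k)$, $H_t(k)$, and $N_t(k)$ denote the short-time DFT coefficients of the clean speech, the channel, and the noise at frame $t$ and frequency bin $k$.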
This paper presents an attempt at building a large-scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence …
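One simple way to picture the composition (a loose sketch only; the directed Markov random field formulation in the paper is more subtle than a plain product of experts) is as a normalized product of the three component predictions, where $w_{k-n+1}^{k-1}$ is the local n-gram context, $T_{<k}$ the partial parse used by the structured language model, and $g$ the latent semantic (topic) context; all three symbols are notational assumptions here:

```latex
p\bigl(w_k \mid \text{history}\bigr) \;\propto\;
  p_{\mathrm{ngram}}\bigl(w_k \mid w_{k-n+1}^{k-1}\bigr)\;
  p_{\mathrm{slm}}\bigl(w_k \mid T_{<k}\bigr)\;
  p_{\mathrm{plsa}}\bigl(w_k \mid g\bigr)
```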
We propose a novel information-theoretic approach for semi-supervised learning of conditional random fields that defines a training objective combining the conditional likelihood on labeled data with the mutual information on unlabeled data. In contrast to previous minimum-conditional-entropy semi-supervised discriminative learning methods, our approach is …
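A plausible shape for such an objective (a hedged sketch; the weight $\lambda$ and the empirical averaging over the $M$ unlabeled inputs are assumptions) is:

```latex
\max_\theta \sum_{i \in \mathcal{L}} \log p_\theta(y_i \mid x_i)
  \;+\; \lambda \Bigl[\, H\bigl(\bar{p}_\theta\bigr)
  \;-\; \frac{1}{M} \sum_{j \in \mathcal{U}} H\bigl(p_\theta(\cdot \mid x_j)\bigr) \Bigr],
\qquad
\bar{p}_\theta(y) \;=\; \frac{1}{M} \sum_{j \in \mathcal{U}} p_\theta(y \mid x_j).
```

Because $I(Y;X) = H(Y) - H(Y \mid X)$, maximizing mutual information not only drives the conditional entropy down, as minimum-entropy methods do, but also keeps the marginal label distribution from collapsing onto a single class.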
We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language …
We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information-theoretic principles and achieves effective performance across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To …
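Both of the two preceding abstracts rank candidate classes (authors or categories) by how well a class-conditional character-level n-gram model predicts the document. Below is a minimal sketch of that procedure, assuming add-one smoothing and a uniform class prior; the papers use more refined smoothing methods:

```python
from collections import defaultdict
import math

class CharNgramClassifier:
    """Character-level n-gram language-model classification: train one smoothed
    n-gram model per class, then label a document with the class whose model
    assigns it the highest log-probability (lowest cross-entropy). Add-one
    smoothing and a uniform class prior are assumed simplifications."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}          # class -> context -> char -> count
        self.context_totals = {}  # class -> context -> total count
        self.vocab = set()

    def train(self, cls, text):
        ngrams = self.counts.setdefault(cls, defaultdict(lambda: defaultdict(int)))
        totals = self.context_totals.setdefault(cls, defaultdict(int))
        padded = " " * (self.n - 1) + text
        for i in range(self.n - 1, len(padded)):
            ctx, ch = padded[i - self.n + 1:i], padded[i]
            ngrams[ctx][ch] += 1
            totals[ctx] += 1
            self.vocab.add(ch)

    def log_prob(self, cls, text):
        ngrams, totals = self.counts[cls], self.context_totals[cls]
        V = len(self.vocab)
        padded = " " * (self.n - 1) + text
        lp = 0.0
        for i in range(self.n - 1, len(padded)):
            ctx, ch = padded[i - self.n + 1:i], padded[i]
            # Add-one (Laplace) smoothing over the observed character vocabulary
            lp += math.log((ngrams[ctx][ch] + 1) / (totals[ctx] + V))
        return lp

    def classify(self, text):
        return max(self.counts, key=lambda c: self.log_prob(c, text))
```

Usage would look like: `clf = CharNgramClassifier(n=3)`, then `clf.train("sports", sports_corpus)` and `clf.train("politics", politics_corpus)`, and finally `clf.classify(new_document)`; the corpus and label names are illustrative only.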