Inducing Features of Random Fields

Stephen Della Pietra, Vincent J. Della Pietra, John D. Lafferty
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative… 
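The induction loop described in the abstract can be sketched on a toy field over a few binary sites. This is a minimal sketch under stated assumptions, not the paper's algorithm. The distribution is normalized by brute-force enumeration, so only tiny fields are feasible. Features are conjunctions of "site is 1" tests, and candidates are formed by conjoining an active feature (or the constant feature) with one more site. Each candidate's gain is estimated by fitting only its own weight. All weights are then refit by plain gradient descent on the KL divergence, a stand-in for the paper's iterative weight estimation. All function names (`induce`, `gain`, `refit` and the rest) are hypothetical.

```python
import itertools
import math

def configs(n):
    # all 2**n configurations of n binary sites (brute force: tiny fields only)
    return list(itertools.product((0, 1), repeat=n))

def fires(feature, x):
    # a feature is a frozenset of site indices; it equals 1 when every one of those sites is 1
    return 1.0 if all(x[i] == 1 for i in feature) else 0.0

def gibbs(features, weights, xs):
    # q(x) = exp(sum_j w_j * f_j(x)) / Z, normalized by enumeration
    s = [math.exp(sum(w * fires(f, x) for f, w in zip(features, weights))) for x in xs]
    z = sum(s)
    return [v / z for v in s]

def mean(dist, f, xs):
    return sum(d * fires(f, x) for d, x in zip(dist, xs))

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def gain(p, q, g, xs, steps=200):
    # fit only the candidate's weight alpha with the current field q held fixed;
    # the resulting drop in KL(p || q) is alpha * E_p[g] - log E_q[exp(alpha * g)]
    target, alpha = mean(p, g, xs), 0.0
    for _ in range(steps):
        tilted = [b * math.exp(alpha * fires(g, x)) for b, x in zip(q, xs)]
        alpha += target - mean(tilted, g, xs) / sum(tilted)
    z = sum(b * math.exp(alpha * fires(g, x)) for b, x in zip(q, xs))
    return alpha * target - math.log(z), alpha

def refit(p, features, weights, xs, lr=0.5, steps=500):
    # gradient descent on KL(p || q), whose gradient is E_q[f_j] - E_p[f_j]
    w = list(weights)
    for _ in range(steps):
        q = gibbs(features, w, xs)
        w = [wj + lr * (mean(p, f, xs) - mean(q, f, xs)) for f, wj in zip(features, w)]
    return w

def induce(samples, n, rounds):
    xs = configs(n)
    p = [sum(tuple(s) == x for s in samples) / len(samples) for x in xs]  # empirical distribution
    features, weights = [], []
    for _ in range(rounds):
        q = gibbs(features, weights, xs)
        # candidates: the constant feature or an active feature, conjoined with one more site
        bases = [frozenset()] + features
        cands = {b | {i} for b in bases for i in range(n) if i not in b} - set(features)
        scored = {g: gain(p, q, g, xs) for g in cands}
        best = max(scored, key=lambda g: scored[g][0])  # greedy: largest estimated gain
        features.append(best)
        weights = refit(p, features, weights + [scored[best][1]], xs)
    return features, weights, kl(p, gibbs(features, weights, xs))

# toy data: sites 0 and 1 are always equal, site 2 is independent of them
data = [(1, 1, 0), (1, 1, 1)] * 3 + [(0, 0, 0), (0, 0, 1)]
features, weights, final_kl = induce(data, n=3, rounds=2)
print(features, final_kl)
```

On this toy data the first round selects a single-site feature (site 0 or site 1, which tie), and the second round selects the conjunction of sites 0 and 1, whose estimated gain exceeds that of any remaining single-site feature.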


Contrastive Feature Induction for Efficient Structure Learning of Conditional Random Fields

This study proposes a fast feature evaluation algorithm called Contrastive Feature Induction (CFI), which only evaluates a subset of features that involve both variables with high signals and errors, and is an efficient approximation of gradient-based evaluation methods.

Efficiently Inducing Features of Conditional Random Fields

This paper presents an efficient feature induction method for CRFs founded on the principle of iteratively constructing feature conjunctions that would significantly increase conditional log-likelihood if added to the model.

Markov Network Structure Learning: A Randomized Feature Generation Approach

This paper combines a data-driven, specific-to-general search strategy with randomization to quickly generate a large set of candidate features that all have support in the data and uses weight learning, with L1 regularization, to select a subset of generated features to include in the model.

Learning Flexible Features for Conditional Random Fields

This paper presents a model capable of learning higher-order structures using a random field of parameterized features, which can be functions of arbitrary combinations of observations, labels and auxiliary hidden variables, along with a simple induction scheme for learning these features.

Learning Multivariate Distributions by Competitive Assembly of Marginals

This paper presents a new framework for learning high-dimensional multivariate probability distributions from estimated marginals; it is motivated by compositional models and Bayesian networks and is designed to adapt to small sample sizes.

Learning Trans-Dimensional Random Fields with Applications to Language Modeling

  • Bin Wang, Zhijian Ou, Z. Tan
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2018
This work proposes a probabilistic model, called the trans-dimensional random field (TRF), and develops an effective training algorithm, called augmented SA, which jointly estimates the model parameters and normalizing constants while using trans-dimensional mixture sampling to generate observations of different dimensions.

Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories

A probabilistic grammar-Markov model (PGMM), which couples probabilistic context-free grammars and Markov random fields, is introduced; its performance is generally comparable with the current state of the art, and inference is performed in less than five seconds.

A Comparison of Algorithms for Maximum Entropy Parameter Estimation

A number of algorithms for estimating the parameters of maximum entropy (ME) models are considered, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods.
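Of the algorithms listed, generalized iterative scaling (GIS) is the easiest to show in a few lines. The sketch below is an illustrative assumption, not code from the comparison: a conditional ME model in which p(y | x) is proportional to exp of the sum of lam[(a, y)] over the attributes a of x, with one binary feature per (attribute, label) pair. GIS requires every (x, y) to activate the same total feature count C, which holds here because each example carries exactly C attributes. It also assumes every (attribute, label) pair occurs in the training data, since a zero empirical count would make the logarithm in the update lam += (1/C) log(E~[f] / E_p[f]) undefined. The names `gis` and `probs` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def probs(lam, attrs, labels):
    # conditional ME model: p(y | x) is proportional to exp(sum of lam[(a, y)] over attributes a of x)
    scores = {y: math.exp(sum(lam[(a, y)] for a in attrs)) for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def gis(data, labels, iters=100):
    # every example carries exactly C attributes, so each (x, y) activates exactly C features
    C = len(data[0][0])
    assert all(len(x) == C for x, _ in data), "GIS needs a constant feature count"
    n = len(data)
    emp = defaultdict(float)  # empirical expectations E~[f]
    for x, y in data:
        for a in x:
            emp[(a, y)] += 1.0 / n
    lam = defaultdict(float)  # all weights start at zero
    for _ in range(iters):
        model = defaultdict(float)  # model expectations E_p[f] under the current weights
        for x, _ in data:
            p = probs(lam, x, labels)
            for a in x:
                for y in labels:
                    model[(a, y)] += p[y] / n
        # simultaneous update of every feature seen in training
        for f in emp:
            lam[f] += math.log(emp[f] / model[f]) / C
    return lam

labels = ["out", "in"]
data = ([(("sunny", "warm"), "out")] * 3 + [(("sunny", "warm"), "in")]
        + [(("rainy", "warm"), "out")] + [(("rainy", "warm"), "in")] * 3)
lam = gis(data, labels)
print(probs(lam, ("sunny", "warm"), labels)["out"])
```

On this toy data the fitted model reproduces the empirical conditional frequencies: p(out | sunny, warm) converges to 3/4 and p(out | rainy, warm) to 1/4.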

Efficient Training of Conditional Random Fields

This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model for labelling and segmenting sequential data, and hypothesises that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs.

Learning Symmetric Relational Markov Random Fields

This work shows that for a particular class of rMRFs with inherent symmetry, this computational procedure is equivalent to synchronous loopy belief propagation and yields a dramatic speedup in inference time; the speedup is used to learn such symmetric rMRFs from evidence efficiently.



Best-first Model Merging for Hidden Markov Model Induction

A new technique is presented for inducing the structure of Hidden Markov Models from data, based on the general 'model merging' strategy. The work also describes how the algorithm was incorporated into an operational speech understanding system, where it was combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.

A Learning Algorithm for Boltzmann Machines

I-Divergence Geometry of Probability Distributions and Minimization Problems


A Note on Approximations to Discrete Probability Distributions

The Power of Amnesia

The algorithm minimizes the statistical prediction error by adaptively extending the memory, or state length, until the total prediction error is sufficiently small. Using fewer than 3000 states, the model's performance is far superior to that of fixed-memory models with a similar number of states.
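The idea of extending memory only where it helps can be illustrated with a simplified suffix-context predictor. This sketch is an assumption in the spirit of the summary, not the paper's actual criterion. A context is retained when it occurs at least `min_count` times and its next-symbol distribution differs from that of its one-symbol-shorter suffix by more than `eps` in KL divergence. Prediction then backs off to the longest retained suffix of the history. The function names and thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_counts(seq, max_len):
    # counts[ctx][s]: how often symbol s follows context ctx (ctx has 0..max_len symbols)
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for k in range(min(max_len, i) + 1):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def dist(counter):
    total = sum(counter.values())
    return {s: c / total for s, c in counter.items()}

def kl(p, q):
    # q covers p's support: every occurrence of a context is also an occurrence of its suffix
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items())

def learn_contexts(seq, max_len=3, min_count=2, eps=0.05):
    counts = context_counts(seq, max_len)
    kept = {""}
    for ctx, c in counts.items():
        if not ctx or sum(c.values()) < min_count:
            continue
        # extend memory only where the longer context changes the next-symbol prediction
        if kl(dist(c), dist(counts[ctx[1:]])) > eps:
            kept.update(ctx[j:] for j in range(len(ctx)))  # keep its suffixes for back-off
    return kept, counts

def predict(history, kept, counts, max_len=3):
    # the state is the longest retained suffix of the history
    for k in range(min(max_len, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in kept:
            return ctx, dist(counts[ctx])

seq = "aab" * 20
kept, counts = learn_contexts(seq)
print(sorted(kept))
print(predict("aabaa", kept, counts))
```

On the periodic sequence `aab` repeated, memory is extended to two symbols only after `a`, where a single symbol of history is ambiguous; after `b` one symbol already determines the next, so the context `ab` is not retained.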

Higher-Order Boltzmann Machines

This work presents the Boltzmann machine, a nonlinear network of stochastic binary processing units, which overcame the limitations of previous network models by introducing hidden units and shows how they incorporate internal representations.

A Variational Method for Estimating the Parameters of MRF from Complete or Incomplete Data

We introduce a new method (to be referred to as the variational method, VM) for estimating the parameters of Gibbs distributions with random variables ("spins") taking values in a Euclidean space R^n…

A Maximum Entropy Approach to Natural Language Processing

A maximum-likelihood approach for automatically constructing maximum entropy models is presented, together with a description of how to implement it efficiently, using several problems in natural language processing as examples.

Class-Based n-gram Models of Natural Language

This work addresses the problem of predicting a word from the previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models can extract classes with the flavor of either syntactically based or semantically based groupings, depending on the nature of the underlying statistics.
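One common form of a class-based bigram factors the word transition as P(w_i | w_{i-1}) = P(w_i | c_i) P(c_i | c_{i-1}), where c_i is the class of w_i. The sketch below estimates both factors from raw counts. It assumes the word-to-class map is already given (the clustering step that produces the classes is not shown), uses unsmoothed maximum-likelihood estimates, and requires the previous word's class to have at least one successor in the training text. The function name and toy classes are hypothetical.

```python
from collections import Counter

def train_class_bigram(tokens, word_class):
    classes = [word_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_pairs = Counter(zip(classes, classes[1:]))
    # occurrences of each class that are followed by another token
    class_histories = Counter(classes[:-1])

    def prob(prev, word):
        # P(w | prev) = P(w | class(w)) * P(class(w) | class(prev)), both from counts
        c_prev, c = word_class[prev], word_class[word]
        p_word_given_class = word_counts[word] / class_counts[c]
        p_class_given_prev = class_pairs[(c_prev, c)] / class_histories[c_prev]
        return p_word_given_class * p_class_given_prev

    return prob

tokens = "the cat runs the dog runs the cat sleeps".split()
word_class = {"the": "DET", "cat": "N", "dog": "N", "runs": "V", "sleeps": "V"}
prob = train_class_bigram(tokens, word_class)
print(prob("dog", "sleeps"))  # 1/3, although "dog sleeps" never occurs in the tokens
```

The pair "dog sleeps" never occurs in the training text, yet it receives probability 1/3 because "dog" shares the class N with "cat", which is the kind of generalization class-based models are meant to provide.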

Noncausal Gauss Markov random fields: Parameter structure and estimation

The parameter structure of noncausal homogeneous Gauss Markov random fields (GMRF) defined on finite lattices is studied and an efficient procedure for ML estimation is described.