We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees and Boltzmann machines, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.
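The weight-training step described above can be sketched on a toy model. The sketch below is illustrative only: the sample space, binary features, and training data are assumptions, not the paper's setup, and it uses generalized iterative scaling (a standard variant of iterative scaling) to drive the model's feature expectations toward the empirical ones, which is equivalent to minimizing the Kullback-Leibler divergence within the exponential family.

```python
import math

# Toy sketch of iterative scaling for a Gibbs (maximum-entropy) model.
# Sample space, features, and data are hypothetical, not from the paper.

# Sample space: all binary configurations of 3 sites.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Binary features supported on small subgraphs (illustrative choices).
base_features = [
    lambda x: float(x[0] == 1),                # site 0 on
    lambda x: float(x[0] == 1 and x[1] == 1),  # sites 0 and 1 both on
    lambda x: float(x[2] == 1),                # site 2 on
]
C = len(base_features)
# GIS slack feature so the features always sum to the constant C.
features = base_features + [lambda x: C - sum(f(x) for f in base_features)]

# Empirical distribution of a toy training sample.
data = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 1)]
emp = {x: 0.0 for x in X}
for x in data:
    emp[x] += 1.0 / len(data)

def model(lam):
    """Gibbs distribution p(x) proportional to exp(sum_i lam_i * f_i(x))."""
    w = {x: math.exp(sum(l * f(x) for l, f in zip(lam, features))) for x in X}
    Z = sum(w.values())
    return {x: w[x] / Z for x in X}

def expect(p, f):
    """Expectation of feature f under distribution p."""
    return sum(p[x] * f(x) for x in X)

lam = [0.0] * len(features)
for _ in range(2000):
    p = model(lam)
    # GIS update: shift each weight by (1/C) * log(empirical / model expectation).
    lam = [l + math.log(expect(emp, f) / expect(p, f)) / C
           for l, f in zip(lam, features)]

p = model(lam)
```

After convergence, the model's expectation of each feature matches its empirical expectation; the greedy induction step of the paper would then rank candidate new features by how much adding them reduces the divergence and repeat this fit.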