Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model …
In sequence modeling, we often wish to represent complex interactions between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice …
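As background for the linear-chain models that DCRFs generalize, here is a minimal sketch of Viterbi decoding for a linear-chain model; the label set, scores, and function names are invented for illustration and are not from the paper:

```python
def viterbi(emissions, transitions, n_labels):
    """Find the highest-scoring label sequence for a linear chain.

    emissions[t][y]   : score of label y at position t
    transitions[y][y2]: score of moving from label y to label y2
    """
    T = len(emissions)
    # best[t][y] = score of the best path ending in label y at time t
    best = [emissions[0][:]]
    back = []
    for t in range(1, T):
        row, ptr = [], []
        for y in range(n_labels):
            cands = [best[t - 1][y0] + transitions[y0][y] for y0 in range(n_labels)]
            y0 = max(range(n_labels), key=lambda i: cands[i])
            row.append(cands[y0] + emissions[t][y])
            ptr.append(y0)
        best.append(row)
        back.append(ptr)
    # Trace the best final label back through the stored pointers
    y = max(range(n_labels), key=lambda i: best[-1][i])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```

With two labels and toy scores, `viterbi([[1.0, 0.0], [0.0, 1.0], [0.0, 0.9]], [[0.5, 0.0], [0.0, 0.5]], 2)` returns the path `[0, 1, 1]`, where the transition bonus keeps the chain in label 1 at the last step.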
1.1 Introduction
Relational data has two characteristics: first, statistical dependencies exist between the entities we wish to model, and second, each entity often has a rich set of features that can aid classification. For example, when classifying Web documents, the page's text provides much information about the class label, but hyperlinks define a …
In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each …
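The intuition that repeated words tend to share a label can be illustrated with a simple post-hoc heuristic; this is only a stand-in for the document-level dependencies the abstract describes (the skip-chain model itself does joint inference, not majority voting), and all names here are invented:

```python
from collections import Counter, defaultdict

def enforce_consistency(tokens, local_labels):
    """Give every repeated capitalized word the majority label
    among its mentions in the document (toy heuristic)."""
    votes = defaultdict(Counter)
    for tok, lab in zip(tokens, local_labels):
        if tok[:1].isupper():
            votes[tok][lab] += 1
    return [votes[tok].most_common(1)[0][0] if tok[:1].isupper() else lab
            for tok, lab in zip(tokens, local_labels)]
```

For example, if "John" is labeled PER at its first mention but O at a later one, the later mention is flipped to PER by the vote.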
The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and navigating large projects. In this paper, we build the first giga-token …
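At its simplest, a token-level language model of source code is a count-based n-gram model over token streams. The sketch below shows the idea on a toy scale (function names and the unsmoothed estimator are illustrative assumptions, not the paper's actual model, which operates at giga-token scale with smoothing):

```python
from collections import Counter, defaultdict

def train_ngram(token_streams, n=3):
    """Collect n-gram counts over tokenized source files (toy sketch)."""
    counts = defaultdict(Counter)
    for toks in token_streams:
        padded = ["<s>"] * (n - 1) + toks + ["</s>"]
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            counts[ctx][padded[i]] += 1
    return counts

def prob(counts, ctx, tok):
    """Maximum-likelihood estimate p(tok | ctx); no smoothing."""
    c = counts[tuple(ctx)]
    total = sum(c.values())
    return c[tok] / total if total else 0.0
```

Trained on the single token stream `["for", "i", "in", "range"]` with bigrams, the model assigns `prob(counts, ["for"], "i") == 1.0`, since "i" always follows "for" in the data.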
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, …
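To see why pseudolikelihood is linear in the label cardinality, note that each term normalizes over a single variable given its observed neighbors, rather than over all joint label assignments. A generic sketch for a chain-structured model follows (this illustrates standard pseudolikelihood, not the piecewise estimator the abstract goes on to discuss; the scoring scheme is assumed for illustration):

```python
import math

def pseudo_log_likelihood(y, emissions, transitions):
    """Sum over positions t of log p(y_t | observed neighbors).

    Each term normalizes over the values of the single variable y_t,
    so the cost per position is linear in the number of labels.
    """
    n_labels = len(emissions[0])
    total = 0.0
    for t in range(len(y)):
        def local(lab):
            s = emissions[t][lab]
            if t > 0:
                s += transitions[y[t - 1]][lab]
            if t + 1 < len(y):
                s += transitions[lab][y[t + 1]]
            return s
        log_z = math.log(sum(math.exp(local(l)) for l in range(n_labels)))
        total += local(y[t]) - log_z
    return total
```

As a sanity check, with all scores zero and two labels, every position contributes -log 2, so a length-3 sequence scores -3 log 2.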
Hidden Markov models and linear-chain conditional random fields (CRFs) are applicable to many tasks in spoken language processing. In large state spaces, however, training can be expensive, because it often requires many iterations of forward-backward. Beam search is a standard heuristic for controlling complexity during Viterbi decoding, but during …
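To make the beam idea concrete, here is a sketch of a forward pass that keeps only the `beam` highest-scoring states at each step; this is a generic illustration, not the paper's specific algorithm, and the scores below are invented:

```python
import math

def beam_forward(emissions, transitions, beam):
    """Forward pass in log space, pruned to `beam` states per step.

    Returns a dict mapping the surviving labels at the final position
    to their (approximate) log forward scores.
    """
    n_labels = len(emissions[0])
    alpha = {y: emissions[0][y] for y in range(n_labels)}
    alpha = dict(sorted(alpha.items(), key=lambda kv: -kv[1])[:beam])
    for t in range(1, len(emissions)):
        nxt = {}
        for y in range(n_labels):
            # Sum only over states that survived the previous pruning
            terms = [a + transitions[y0][y] for y0, a in alpha.items()]
            m = max(terms)
            nxt[y] = emissions[t][y] + m + math.log(sum(math.exp(s - m) for s in terms))
        alpha = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:beam])
    return alpha
```

With `beam` equal to the number of labels, this reduces to the exact forward pass; smaller beams trade accuracy for speed, which is the trade-off the abstract is concerned with during training.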
Discriminative probabilistic models are very popular in NLP because of the latitude they afford in designing features. But training involves complex trade-offs among weights, which can be dangerous: a few highly-indicative features can swamp the contribution of many individually weaker features, causing their weights to be undertrained. Such a model is less …