Charles A. Sutton

Learn More
1.1 Introduction Relational data has two characteristics: first, statistical dependencies exist between the entities we wish to model, and second, each entity often has a rich set of features that can aid classification. For example, when classifying Web documents, the page's text provides much information about the class label, but hyperlinks define a(More)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present <i>dynamic conditional random fields (DCRFs)</i>, a generalization of linear-chain conditional random fields (CRFs) in which each time slice(More)
Many tasks involve predicting a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling. They combine the ability of graphical models to compactly model multivariate data with the ability of classification methods to(More)
In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each(More)
For many large undirected models that arise in real-world applications, exact maximumlikelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated(More)
The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and navigating large projects. In this paper, we build the first giga-token(More)
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton &amp; McCallum,(More)
Hidden Markov models and linear-chain conditional random fields (CRFs) are applicable to many tasks in spoken language processing. In large state spaces, however, training can be expensive, because it often requires many iterations of forward-backward. Beam search is a standard heuristic for controlling complexity during Viterbi decoding, but during(More)
Recent innovations in RFID technology are enabling large-scale cost-effective deployments in retail, healthcare, pharmaceuticals and supply chain management. The advent of mobile or handheld readers adds significant new challenges to RFID stream processing due to the inherent reader mobility, increased noise, and incomplete data. In this paper, we address(More)
Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project&#8217;s coding conventions. However, one(More)