
- John D. Lafferty, Andrew McCallum, Fernando Pereira
- ICML
- 2001

We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid… (More)

Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others… (More)

- Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom M. Mitchell
- Machine Learning
- 2000

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We… (More)

- Andrew McCallum, Dayne Freitag, Fernando Pereira
- ICML
- 2000

Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM… (More)

- Xuerui Wang, Andrew McCallum
- KDD
- 2006

This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the… (More)

- Andrew McCallum, Kamal Nigam, Lyle H. Ungar
- KDD
- 2000

Many important problems involve clustering large datasets. Although naive implementations of clustering are computationally expensive, there are established efficient techniques for clustering when the dataset has either (1) a limited number of clusters, (2) a low feature dimensionality, or (3) a small number of data points. However, there has been much… (More)

- Andrew McCallum, Kamal Nigam
- ICML
- 1998

This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling. Then active learning is combined… (More)

This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge,… (More)

- Nicholas Roy, Andrew McCallum
- ICML
- 2001

This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expected future error is intractable. Our approach is made feasible… (More)

- Charles A. Sutton, Andrew McCallum
- Foundations and Trends in Machine Learning
- 2012

Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model… (More)