Publications
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TLDR
This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
A comparison of event models for naive Bayes text classification
TLDR
It is found that the multi-variate Bernoulli model performs well at small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
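
As an illustrative sketch only (not the paper's implementation), the two event models can be compared with scikit-learn, where BernoulliNB over binary word-presence features plays the role of the multi-variate Bernoulli model and MultinomialNB over word counts plays the role of the multinomial model; the corpus and labels below are hypothetical toy data.

    # Hedged sketch: compare the two naive Bayes event models on a toy corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import BernoulliNB, MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    docs = ["the match was a great win", "the team lost the game",
            "votes were counted after the election", "the party won the vote",
            "a thrilling game and a late win", "the election results were close"]
    labels = [0, 0, 1, 1, 0, 1]  # 0 = sports, 1 = politics (toy labels)

    # Multi-variate Bernoulli event model: binary word presence/absence.
    bernoulli = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
    # Multinomial event model: per-document word counts.
    multinomial = make_pipeline(CountVectorizer(), MultinomialNB())

    for name, model in [("bernoulli", bernoulli), ("multinomial", multinomial)]:
        print(name, cross_val_score(model, docs, labels, cv=3).mean())
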
Text Classification from Labeled and Unlabeled Documents using EM
TLDR
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
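
A minimal sketch of the idea, assuming scikit-learn and a toy corpus; it uses a hard-EM (self-training) simplification rather than the paper's soft EM, assigning hard pseudo-labels to the unlabeled documents at each iteration.

    # Hedged sketch: hard-EM / self-training with multinomial naive Bayes on toy data.
    import numpy as np
    from scipy.sparse import vstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    labeled_docs = ["a great match and a big win", "votes counted after the election"]
    labels = np.array([0, 1])  # 0 = sports, 1 = politics (toy labels)
    unlabeled_docs = ["the team enjoyed the win", "the party won the vote",
                      "a very close election", "a late goal sealed the match"]

    vec = CountVectorizer()
    vec.fit(labeled_docs + unlabeled_docs)
    X_lab, X_unl = vec.transform(labeled_docs), vec.transform(unlabeled_docs)

    clf = MultinomialNB()
    clf.fit(X_lab, labels)                     # initialize from labeled documents alone
    for _ in range(5):
        pseudo = clf.predict(X_unl)            # E-step: hard label assignments
        clf.fit(vstack([X_lab, X_unl]),        # M-step: refit on both pools
                np.concatenate([labels, pseudo]))
    print(clf.predict(vec.transform(["who won the election"])))
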
Modeling Relations and Their Mentions without Labeled Text
TLDR
A novel approach to distant supervision is presented that alleviates the problem of noisy patterns hurting precision, using a factor graph and constraint-driven semi-supervision to train the model without any knowledge of which sentences express the relations in the training KB.
Maximum Entropy Markov Models for Information Extraction and Segmentation
TLDR
A new Markovian sequence model is presented that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences.
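
In common notation (a sketch of the model family, not text from the paper), an MEMM factors the conditional probability of a state sequence given the observation sequence into per-step maximum-entropy (logistic) distributions:

    P(s_1, \dots, s_T \mid o_1, \dots, o_T) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}, o_t),
    \qquad
    P(s_t \mid s_{t-1}, o_t) = \frac{1}{Z(o_t, s_{t-1})} \exp\Big( \sum_k \lambda_k f_k(o_t, s_t) \Big),

where the f_k are arbitrary, possibly overlapping features of the current observation and state, and Z normalizes over the next state.
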
Automating the Construction of Internet Portals with Machine Learning
TLDR
New research in reinforcement learning, information extraction, and text classification is described that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies.
Topics over time: a non-Markov continuous-time model of topical trends
TLDR
An LDA-style topic model is presented that captures not only the low-dimensional structure of data, but also how the structure changes over time, showing improved topics, better timestamp prediction, and interpretable trends.
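
As a sketch in standard notation (assumed here, not quoted from the paper), the model differs from LDA by giving each topic its own Beta distribution over normalized timestamps, so each token's topic assignment also generates a timestamp:

    z_{d,i} \sim \mathrm{Multinomial}(\theta_d), \qquad
    w_{d,i} \sim \mathrm{Multinomial}(\phi_{z_{d,i}}), \qquad
    t_{d,i} \sim \mathrm{Beta}(\psi_{z_{d,i}}),

so a topic that rises and falls concentrates its Beta mass on the corresponding stretch of the timeline.
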
An Introduction to Conditional Random Fields for Relational Learning
TLDR
Rather than modeling the joint distribution p(y, x), a solution is to directly model the conditional distribution p(y|x), which is sufficient for classification; this is the approach taken by conditional random fields.
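
For a linear-chain CRF this conditional distribution takes the standard form (a sketch in common notation, not text from the tutorial):

    p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_k \lambda_k f_k(y_t, y_{t-1}, x, t) \Big),
    \qquad
    Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_k \lambda_k f_k(y'_t, y'_{t-1}, x, t) \Big),

where the feature functions f_k may depend on the entire observation sequence x, and no model of p(x) is required.
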
Optimizing Semantic Coherence in Topic Models
TLDR
An automated evaluation metric for topic coherence is introduced, together with a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
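
The coherence score for a topic's top words can be sketched as follows (notation assumed; D(v) is the document frequency of word v and D(v, v') the co-document frequency):

    C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(t)}, v_l^{(t)}) + 1}{D(v_l^{(t)})},

where V^{(t)} = (v_1^{(t)}, \dots, v_M^{(t)}) are the M most probable words in topic t.
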
An Introduction to Conditional Random Fields
TLDR
This survey describes conditional random fields, a popular probabilistic method for structured prediction, covering methods for inference and parameter estimation for CRFs as well as practical issues in implementing large-scale CRFs.