Bayesian Modeling of Lexical Resources for Low-Resource Settings

Nicholas Andrews, Mark Dredze, Benjamin Van Durme, Jason Eisner
Lexical resources such as dictionaries and gazetteers are often used as auxiliary data for tasks such as part-of-speech induction and named-entity recognition. However, discriminative training with lexical features requires annotated data to reliably estimate the lexical feature weights and may result in overfitting the lexical features at the expense of features which generalize better. In this paper, we investigate a more robust approach: we stipulate that the lexicon is the result of an… 

Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
This work shows how the spellings of known words can help handle unknown words in open-vocabulary NLP tasks; the approach beats previous work and establishes state-of-the-art results on multiple datasets.
Towards Language Service Creation and Customization for Low-Resource Languages
The Language Grid, a service-oriented language infrastructure, is introduced; it enables the automatic creation and customization of new resources from existing ones and realizes new language services by supporting the sharing and combining of language resources.
Morphological Disambiguation of South Sámi with FSTs and Neural Networks
The method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence and requires only minimal resources for South Sámi, which makes it usable and applicable in the contexts of any other endangered language as well.
Neural Particle Smoothing for Sampling from Conditional Sequence Models
This work introduces neural particle smoothing, a sequential Monte Carlo method for sampling annotations of an input string from a given probability model that looks ahead to the end of the input string by means of a right-to-left LSTM.


Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
A novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language, using graph-based label propagation for cross-lingual knowledge transfer.
Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Taking the min-greedy algorithm as a starting point, this work improves it with several intuitive heuristics and defines a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data.
Wiki-ly Supervised Part-of-Speech Tagging
This paper shows that it is possible to build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity
A fully Bayesian approach to unsupervised part-of-speech tagging
This model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE.
Reducing Weight Undertraining in Structured Discriminative Learning
This work introduces several new feature bagging methods, in which separate models are trained on subsets of the original features, and combined using a mixture model or a product of experts, which performs better than simply training a single CRF on all the features.
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling
By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
A novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor process prior is developed, providing an elegant and principled means of incorporating lexical characteristics in part-of-speech induction.
Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing
Using Gazetteers in Discriminative Information Extraction
It is shown that by quarantining gazetteer features and training them in a separate model, then decoding using a logarithmic opinion pool, the authors may achieve much higher accuracy.