Contextual Dependencies in Unsupervised Word Segmentation

@inproceedings{Goldwater2006ContextualDI,
  title={Contextual Dependencies in Unsupervised Word Segmentation},
  author={Sharon Goldwater and Thomas L. Griffiths and Mark Johnson},
  booktitle={ACL},
  year={2006}
}
  • Sharon Goldwater, Thomas L. Griffiths, Mark Johnson
  • Published in ACL 2006
  • Computer Science
  • Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that…

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 160 CITATIONS

    Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Joint Learning of Chinese Words, Terms and Keywords

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Web scale NLP: a case study on url word breaking

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Modeling human performance in statistical word segmentation

    CITES RESULTS & METHODS

    Nonparametric Word Segmentation for Machine Translation

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Type-Based MCMC

    HIGHLY INFLUENCED

    Bayesian Unsupervised Word Segmentation with Hierarchical Language Modeling

    • Daichi Mochihashi, Takeshi Yamada, Naonori
    • 2009
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 24 Highly Influenced Citations

    • Averaged 8 Citations per year from 2017 through 2019