Unsupervised Word Segmentation in Context

@inproceedings{Synnaeve2014UnsupervisedWS,
  title={Unsupervised Word Segmentation in Context},
  author={Gabriel Synnaeve and Isabelle Dautriche and Benjamin B{\"o}rschinger and Mark Johnson and Emmanuel Dupoux},
  booktitle={COLING},
  year={2014}
}
This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of a top-performing segmentation model by 2.5% on a 27k-utterance dataset. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from a Latent Dirichlet Allocation model as a proxy for “activity” contexts to label the Providence corpus. We present Adaptor Grammar models that use…