Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

@article{Shareghi2016FastSA,
  title={Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees},
  author={Ehsan Shareghi and Matthias Petri and Gholamreza Haffari and Trevor Cohn},
  journal={Transactions of the Association for Computational Linguistics},
  year={2016},
  volume={4},
  pages={477--490}
}
Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage…
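
To illustrate the kind of query the abstract refers to, the following is a minimal sketch of counting m-gram occurrences on the fly from a suffix index and turning two counts into an unsmoothed conditional probability. It is not the paper's method: it assumes a plain, uncompressed suffix array built naively, and it omits smoothing entirely. The function names (`build_suffix_array`, `count_occurrences`, `mle_prob`) and the toy corpus are hypothetical, chosen only for this example.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(tokens):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Assumption: fine for a toy corpus, far too slow and large for real data.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_occurrences(tokens, suffix_array, pattern):
    # All suffixes starting with `pattern` are contiguous in sorted order,
    # so the count is the width of that range, found by binary search.
    m = len(pattern)
    # Truncating sorted suffixes to m tokens keeps them in sorted order.
    # Materialising this list per query is for clarity only.
    prefixes = [tokens[i:i + m] for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


def mle_prob(tokens, suffix_array, context, word):
    # Unsmoothed estimate: count(context + word) / count(context).
    denom = count_occurrences(tokens, suffix_array, context)
    if denom == 0:
        return 0.0
    return count_occurrences(tokens, suffix_array, context + [word]) / denom


tokens = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(tokens)
print(count_occurrences(tokens, sa, ["the", "cat"]))
print(mle_prob(tokens, sa, ["the"], "cat"))
```

Because the index covers every suffix of the corpus, the same lookup works for a pattern of any length, which is what makes unbounded-order queries possible without precomputing a table of counts per order.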