Closing the Gap: Sequence Mining at Scale

@article{Beedkar2015ClosingTG,
  title={Closing the Gap: Sequence Mining at Scale},
  author={Kaustubh Beedkar and Klaus Berberich and Rainer Gemulla and Iris Miliaraki},
  journal={ACM Trans. Database Syst.},
  year={2015},
  volume={40},
  pages={8:1--8:44}
}
Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this article, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. MG-FSM can handle so-called “gap constraints”, which can be used to limit the…
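
To make the notion of a gap constraint concrete, the following is a minimal single-machine sketch in Python. It is not MG-FSM and does not use MapReduce; it is a naive enumeration. It assumes the common reading of a gap constraint, namely a cap on how many items may be skipped between consecutive items of a pattern. The function names and the parameters `min_support`, `max_gap`, and `max_len` are illustrative assumptions, not identifiers from the article.

```python
from collections import Counter


def gap_subsequences(seq, max_gap, max_len):
    """Return the set of subsequences of `seq` with 2..max_len items in which
    at most `max_gap` items are skipped between consecutive picked items.
    (Assumed definition of a gap constraint; hypothetical helper.)"""
    found = set()

    def extend(prefix, last_pos):
        if len(prefix) >= 2:
            found.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may sit at most max_gap + 1 positions after the last one.
        stop = min(len(seq), last_pos + max_gap + 2)
        for nxt in range(last_pos + 1, stop):
            extend(prefix + (seq[nxt],), nxt)

    for start in range(len(seq)):
        extend((seq[start],), start)
    return found


def frequent_sequences(database, min_support, max_gap, max_len):
    """Count each pattern once per input sequence and keep the frequent ones."""
    counts = Counter()
    for seq in database:
        counts.update(gap_subsequences(seq, max_gap, max_len))
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy input: three short sequences of items.
database = [("a", "b", "c"), ("a", "c"), ("a", "x", "y", "c")]
result = frequent_sequences(database, min_support=2, max_gap=1, max_len=2)
```

In this toy input, the pattern `("a", "c")` is counted in the first two sequences but not in the third, where two items separate `a` and `c`. Enumerating every candidate per sequence grows quickly with sequence length, which is the kind of cost a scalable algorithm has to avoid.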

