FreeSpan: frequent pattern-projected sequential pattern mining
Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the £ ¤ § ¦ ¨ ¦ ¨-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.