PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth


Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of Apriori, which may substantially reduce the number of combinations to be examined. However, Apriori still encounters problems when a sequence database is large and/or when the sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the effort of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.
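
The prefix-projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes simplifying assumptions: each sequence is a flat list of single items (the general problem allows itemsets as elements), and a projected database is kept as (sequence index, offset) pairs instead of copied suffixes. The names `prefixspan`, `mine`, and `min_support` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent sequential pattern.

    Simplified sketch: each sequence is a list of single items, and
    min_support is an absolute count of supporting sequences.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` is the projected database for `prefix`: pairs of
        # (sequence index, offset), where the offset marks the start of the
        # suffix that follows the earliest match of `prefix`.
        support = defaultdict(int)
        for seq_id, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_id][start:]):
                support[item] += 1

        for item, count in sorted(support.items()):
            if count < min_support:
                continue  # infrequent here, so no longer pattern can use it
            pattern = prefix + (item,)
            results[pattern] = count

            # Project again: keep only the part after the first occurrence
            # of `item` in each remaining suffix.
            next_projected = []
            for seq_id, start in projected:
                try:
                    pos = sequences[seq_id].index(item, start)
                except ValueError:
                    continue  # this suffix does not contain the item
                next_projected.append((seq_id, pos + 1))
            mine(pattern, next_projected)

    # The empty prefix projects onto every full sequence.
    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

Each recursive call grows the prefix by one frequent item and only scans the suffixes that can still extend it, so no candidate subsequences are generated and tested against the whole database. For example, calling `prefixspan([list("abcb"), list("acb"), list("ab")], 2)` reports the pattern `("a", "c", "b")` with support 2, while `("b", "c")` is absent because only one sequence contains it.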

Semantic Scholar estimates that this publication has 1,424 citations based on the available data.

Cite this paper

@inproceedings{Pei2001PrefixSpanMS,
  title={PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth},
  author={Jian Pei and Jiawei Han and Behzad Mortazavi-Asl and Helen Pinto and Qiming Chen and Umeshwar Dayal and Mei-Chun Hsu},
  year={2001}
}