Extracting tree fragments in linear average time


This report details the implementation of a fragment extraction algorithm based on an average-case linear-time tree kernel. Given a treebank, the algorithm extracts all fragments that occur at least twice, along with their frequencies. Evaluation shows a -fold speedup over a quadratic fragment extraction implementation. Additionally, we add support for trees with discontinuous constituents.
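
To make the task concrete, the following is a minimal sketch of a naive, quadratic approach to extracting recurring fragments. It is not the implementation described in this report. It assumes trees are nested tuples of the form (label, child, ...), with words as plain strings, and all function names are hypothetical. It compares every pair of nodes across every pair of trees, so it corresponds to the kind of quadratic baseline that a linear average time tree kernel is meant to improve on. It does not filter out fragments contained in larger ones, and it does not handle discontinuous constituents.

```python
from itertools import combinations


def nodes(tree):
    """Yield every nonterminal node of a tree (words are plain strings)."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)


def production(node):
    """Return the grammar production at a node: its label plus child labels.

    Words stay strings and nonterminal children become 1-tuples, so a word
    can never be confused with a nonterminal of the same name.
    """
    return (node[0], tuple(c if isinstance(c, str) else (c[0],)
                           for c in node[1:]))


def common_fragment(a, b):
    """Largest fragment rooted at both a and b (productions must be equal)."""
    children = []
    for ca, cb in zip(a[1:], b[1:]):
        if isinstance(ca, str):
            children.append(ca)  # shared word
        elif production(ca) == production(cb):
            children.append(common_fragment(ca, cb))  # keep expanding
        else:
            children.append((ca[0],))  # frontier nonterminal, left unexpanded
    return (a[0],) + tuple(children)


def matches(fragment, node):
    """True if the fragment occurs rooted at this node."""
    if fragment[0] != node[0]:
        return False
    if len(fragment) == 1:  # frontier nonterminal matches any expansion
        return True
    if len(fragment) != len(node):
        return False
    for f, c in zip(fragment[1:], node[1:]):
        if isinstance(f, str) or isinstance(c, str):
            if f != c:
                return False
        elif not matches(f, c):
            return False
    return True


def extract_fragments(treebank):
    """Map each fragment shared by two different trees to its frequency."""
    fragments = set()
    for t1, t2 in combinations(treebank, 2):
        for a in nodes(t1):
            for b in nodes(t2):
                if production(a) == production(b):
                    fragments.add(common_fragment(a, b))
    # Separate counting pass over all nodes of the treebank.
    return {f: sum(matches(f, n) for t in treebank for n in nodes(t))
            for f in fragments}


treebank = [
    ("S", ("NP", "Mary"), ("VP", ("V", "sees"), ("NP", "John"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "Sue"))),
]
counts = extract_fragments(treebank)
for fragment, frequency in counts.items():
    print(frequency, fragment)
```

On this toy treebank the sketch finds three fragments, each occurring twice: the sentence frame with "Mary" as subject and an unexpanded verb and object, the noun phrase "Mary", and the bare verb phrase rule. The cost of the nested loops over node pairs is what makes this approach quadratic.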

