Varro: An Algorithm and Toolkit for Regular Structure Discovery in Treebanks

Abstract

The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and other natural language data annotated as tree structures: frequently recurring unordered subtrees. It is designed for linguistic research and aims to be maximally applicable to existing treebanks and other stores of natural language data that can be represented as trees. It minimizes memory use so that moderately large treebanks remain tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.
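The core difficulty with unordered subtrees is that the same structure can appear with its children in different orders, so occurrences must be mapped to one canonical form before they can be counted. The sketch below illustrates only that general idea, using a standard technique (recursively sorting the canonical keys of each node's children) rather than Varro's condensed canonically ordered trees, which the abstract does not describe in detail. It is a minimal sketch under stated assumptions: `Node`, `canonical` and `count_complete_subtrees` are hypothetical names, labels are assumed not to contain parentheses, and only complete subtrees (a node plus all of its descendants) are counted, whereas Varro targets a broader class of subtrees.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: a label plus an (unordered) list of children.
    label: str
    children: list[Node] = field(default_factory=list)


def canonical(node: Node, counts: Counter | None = None) -> str:
    """Return a string key shared by all trees that differ only in sibling order.

    Assumes labels contain no '(' or ')', so the parenthesized encoding is
    unambiguous. If `counts` is given, every complete subtree visited is tallied.
    """
    # Canonicalize children first, then sort their keys so sibling order is irrelevant.
    child_keys = sorted(canonical(child, counts) for child in node.children)
    key = "(" + node.label + "".join(child_keys) + ")"
    if counts is not None:
        counts[key] += 1
    return key


def count_complete_subtrees(trees: list[Node]) -> Counter:
    """Count complete subtrees across a collection of trees, up to child reordering."""
    counts: Counter = Counter()
    for tree in trees:
        canonical(tree, counts)  # one post-order pass per tree
    return counts
```

For example, `Node("NP", [Node("DT"), Node("NN")])` and `Node("NP", [Node("NN"), Node("DT")])` both receive the key `"(NP(DT)(NN))"`, so counting over the two trees tallies that subtree twice. Computing child keys once per pass keeps the work close to linear in the number of nodes, apart from the sorting and string concatenation at each node.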

6 Figures and Tables

Cite this paper

@inproceedings{Martens2010VarroAA,
  title={Varro: An Algorithm and Toolkit for Regular Structure Discovery in Treebanks},
  author={Scott Martens},
  booktitle={COLING},
  year={2010}
}