Building Large Corpora from the Web Using a New Efficient Tool Chain

@inproceedings{Schfer2012BuildingLC,
  title={Building Large Corpora from the Web Using a New Efficient Tool Chain},
  author={Roland Sch{\"a}fer and Felix Bildhauer},
  booktitle={LREC},
  year={2012}
}
Over the last decade, methods of web corpus construction and the evaluation of web corpora have been actively researched. Prominently, the WaCky initiative has provided both theoretical results and a set of web corpora for selected European languages. We present a software toolkit for web corpus construction and a set of significantly larger corpora (up to…
