Building a Korean Web Corpus for Analyzing Learner Language; Sketching Techniques for Large Scale NLP; Workshop Program; noWaC: a Large Web-based Corpus for Norwegian

Abstract

In this paper we introduce the first version of noWaC, a large web-based corpus of Bokmål Norwegian currently containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain. The procedure used to collect the noWaC corpus is largely based on the techniques described by…
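The abstract describes the collection step only at a high level: crawl, download and process documents under the .no top-level domain, then report corpus size in tokens. The sketch below illustrates just two of those ideas, restricting URLs to the .no domain and counting tokens. It is not the noWaC pipeline: the function names are hypothetical, and the naive whitespace tokenizer is a simplifying assumption (a real corpus would use a proper tokenizer and further cleaning).

```python
from urllib.parse import urlparse


def in_no_domain(url: str) -> bool:
    """Return True if the URL's host lies in the .no top-level domain (hypothetical helper)."""
    # urlparse lowercases the hostname; it is None for strings without a network location.
    host = urlparse(url).hostname or ""
    # Drop a trailing dot (fully qualified form, e.g. "example.no.") before checking the TLD.
    return host.rstrip(".").endswith(".no")


def count_tokens(text: str) -> int:
    # Assumption: naive whitespace tokenization, used here only for illustration.
    return len(text.split())


def filter_and_count(pages: dict[str, str]) -> tuple[list[str], int]:
    """Keep pages whose URL is under .no and return them with their total token count."""
    kept = [url for url in pages if in_no_domain(url)]
    total = sum(count_tokens(pages[url]) for url in kept)
    return kept, total


if __name__ == "__main__":
    # Hypothetical example pages; only the two .no URLs are kept.
    pages = {
        "https://www.example.no/a": "Dette er en test",
        "https://example.com/b": "not counted here",
        "https://sub.example.no/c": "Hei verden",
    }
    print(filter_and_count(pages))
```

Checking the parsed hostname rather than searching the raw URL string avoids false matches such as `https://no.example.com/`, whose host is not in the .no domain.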

9 Figures and Tables

Cite this paper

@inproceedings{Bernardini2010BuildingAK,
  title={Building a Korean Web Corpus for Analyzing Learner Language Sketching Techniques for Large Scale Nlp Workshop Program Building a Korean Web Corpus for Analyzing Learner Language Nowac: a Large Web-based Corpus for Norwegian},
  author={Silvia Bernardini and Emiliano Ra{\'u}l Guevara and Sun-Hee Lee and Amit Goyal and Jagadeesh Jagaralamudi},
  year={2010}
}