Introducing and evaluating ukWaC, a very large web-derived corpus of English

@inproceedings{Ferraresi2008IntroducingAE,
  title={Introducing and evaluating {ukWaC}, a very large web-derived corpus of {English}},
  author={Adriano Ferraresi and Eros Zanchetta and Marco Baroni and Silvia Bernardini},
  year={2008}
}
In this paper we introduce ukWaC, a large corpus of English constructed by crawling the .uk Internet domain. The corpus contains more than 2 billion tokens and is one of the largest freely available linguistic resources for English. The paper describes the tools and methodology used in the construction of the corpus and provides a qualitative evaluation of its contents, carried out through a vocabulary-based comparison with the BNC. We conclude by giving practical information about availability…
This paper has highly influenced 34 other papers.
This paper has 243 citations.

Citations

[Figure: Citations per Year]
Semantic Scholar estimates that this publication has 243 citations based on the available data.

References

Introduction to the Special Issue on the Web as Corpus. Computational Linguistics, 2003.

Characterizing Genres of Web Pages: Genre Hybridism and Individualization. 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07), 2007.

Longman Grammar of Spoken and Written English. Computational Linguistics, 2001.

Building a very large corpus of English obtained by web crawling: ukWaC. A. Ferraresi. Master’s thesis, University of Bologna, 2007. Retrieved January 28, 2008 from http://wacky.sslmit.unibo.it.

Creating and using web corpora. M. Thelwall. International Journal of Corpus Linguistics, 10(4):517–541, 2005.
