The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch

@inproceedings{Oostdijk2013TheCO,
  title={The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch},
  author={Nelleke Oostdijk and Martin Reynaert and V{\'e}ronique Hoste and Ineke Schuurman},
  booktitle={Essential Speech and Language Technology for Dutch},
  year={2013}
}
The construction of a large and richly annotated corpus of written Dutch was identified as one of the priorities of the STEVIN programme. Such a corpus, sampling texts from conventional and new media, is invaluable for scientific research and application development. The present chapter describes how in two consecutive STEVIN-funded projects, viz. D-Coi and SoNaR, the Dutch reference corpus was developed. The construction of the corpus has been guided by (inter)national standards and best…

