BACO - A large database of text and co-occurrences

@inproceedings{Sarmento2006BACOA,
  title={BACO - A large database of text and co-occurrences},
  author={Lu{\'i}s Sarmento},
  booktitle={LREC},
  year={2006}
}
In this paper we introduce a public resource named BACO (Base de Co-Ocorrências), a very large textual database built from the WPT03 collection, a publicly available crawl of the whole Portuguese web in 2003. BACO uses a generic relational database engine to store 1.5 million web documents in raw text (more than 6GB of plain text), corresponding to 35 million sentences, consisting of more than 1000 million words. BACO comprises four lexicon tables, including a standard single token lexicon, and…
This paper has 20 citations.

References

A expansão de conjuntos de co-hipónimos a partir de colecções de grandes dimensões de texto em Português

Luís Sarmento.
Actas de 1a Conferência em Metodologias de Investigação Científica, Porto, Portugal, Janeiro 2006.
