PoS-tagging the Web in Portuguese. National varieties, text typologies and spelling systems

Authors: Marcos García, Pablo Gamallo, Iria Gayo, Miguel A. Pousada Cruz
Journal: Procesamiento del Lenguaje Natural
The great amount of text produced every day on the Web has turned it into one of the main sources for obtaining linguistic corpora, which are further analyzed with Natural Language Processing techniques. On a global scale, languages such as Portuguese (official in 9 countries) appear on the Web in several varieties, with lexical, morphological and syntactic differences, among others. Besides, a unified spelling system for Portuguese has recently been approved, and its implementation process has…


