José Ramom Pichel Campos

This article describes two systems that participated in the TweetLID-2014 competition, which focused on language detection in tweets. The systems are based on two different strategies: ranked dictionaries and Naive Bayes classifiers. The results show that ranked dictionaries perform better with small training corpora whose language distribution is similar to that of…
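As an illustration of the ranked-dictionary idea named in this abstract, here is a minimal sketch. It is not the authors' system: the function names, the dictionary size, and the rank-based weighting are all assumptions made for the example, since the abstract does not give those details.

```python
from collections import Counter


def build_ranked_dictionary(corpus_tokens, size=1000):
    """Map each of the `size` most frequent tokens to its rank (0 = most frequent).

    Hypothetical helper: the abstract does not say how the dictionaries are built.
    """
    counts = Counter(corpus_tokens)
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(size))}


def score(tokens, ranked, size=1000):
    """Sum rank-based weights; words missing from the dictionary add nothing.

    The weight `size - rank` is an assumed scheme, chosen only so that
    more frequent words count more.
    """
    return sum(size - ranked[t] for t in tokens if t in ranked)


def identify(text, dictionaries, size=1000):
    """Return the language whose ranked dictionary gives the highest score."""
    tokens = text.lower().split()
    return max(dictionaries, key=lambda lang: score(tokens, dictionaries[lang], size))


# Toy training data, invented for this example.
dictionaries = {
    "en": build_ranked_dictionary("the cat and the dog and the bird".split()),
    "es": build_ranked_dictionary("el gato y el perro y el pájaro".split()),
}
print(identify("the dog", dictionaries))
print(identify("el perro", dictionaries))
```

With a real corpus, each dictionary would be built from per-language training text; the toy sentences above only show the mechanics.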
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (i) distinguishing similar languages, (ii) detecting multilingualism in a single document, and (iii) identifying the language of short texts. In this paper, we…
So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar…
When developing many statistical Natural Language Processing tools, the use of large amounts of data becomes essential. To overcome the scarcity of computational resources for minoritized languages such as Galician, new strategies must be designed. In the case of Galician, prominent Romance scholars have theorized that…
So far, research on extraction of word translations from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed contexts generated from external bilingual dictionaries, allows us to achieve results…
The “Medieval Galician Computational Treasure” is a research project developed at the ILG (Institute of Galician Language), coordinated by Xavier Varela in agreement with the DXPL > SXPL (Linguistic Policy General Secretariat) of the Galician Government, and is accessible through the TMILG corpus (http://ilg.usc.es/tmilg). In total there are more than…
This paper outlines a strategy to build new bilingual dictionaries from existing resources. The method is based on two main tasks: first, a new set of bilingual correspondences is generated from two available bilingual dictionaries. Second, the generated correspondences are validated by making use of a bilingual lexicon automatically extracted from…
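The two tasks in this abstract can be sketched roughly as follows. This is a hedged illustration, not the paper's method: it assumes the two source dictionaries share a pivot language, and the names `compose`, `validate`, and the toy entries are hypothetical.

```python
def compose(dict_ab, dict_bc):
    """Generate candidate (a, c) pairs by chaining A->B and B->C entries.

    Assumption: the two dictionaries share language B as a pivot.
    The abstract does not state how the correspondences are generated.
    """
    candidates = set()
    for a, b_words in dict_ab.items():
        for b in b_words:
            for c in dict_bc.get(b, ()):
                candidates.add((a, c))
    return candidates


def validate(candidates, extracted_lexicon):
    """Keep only candidates that also appear in an automatically extracted lexicon.

    `extracted_lexicon` is a set of (a, c) pairs; how it is extracted is
    outside this sketch.
    """
    return {pair for pair in candidates if pair in extracted_lexicon}


# Toy data, invented for this example.
en_es = {"bank": ["banco"]}
es_gl = {"banco": ["banco", "banqueta"]}   # ambiguous pivot entry
extracted = {("bank", "banco")}

candidates = compose(en_es, es_gl)
print(sorted(candidates))
print(sorted(validate(candidates, extracted)))
```

Chaining through an ambiguous pivot word produces spurious pairs, which is why a separate validation step is useful in a setup like this.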