José Ramom Pichel Campos

This article describes two systems that participated in the TweetLID-2014 competition, which focused on language detection in tweets. The systems are based on two different strategies: ranked dictionaries and Naive Bayes classifiers. The results show that ranked dictionaries perform better with small training corpora whose language distribution is similar to that of…
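To make the ranked-dictionary idea concrete, here is a minimal sketch. It is not the authors' system: the scoring rule (a weight that decreases with a word's frequency rank), the function names, and the toy training sentences are all assumptions chosen for illustration.

```python
from collections import Counter

def build_ranked_dictionary(texts, size=1000):
    """Map each of the `size` most frequent words to its rank (0 = most frequent)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(size))}

def score(text, ranked, size=1000):
    """Sum a weight per token: larger for top-ranked words, zero for unknown words.
    This weighting is an illustrative assumption, not a published formula."""
    return sum(size - ranked[w] for w in text.lower().split() if w in ranked)

def identify(text, dictionaries, size=1000):
    """Return the language whose ranked dictionary scores the text highest."""
    return max(dictionaries, key=lambda lang: score(text, dictionaries[lang], size))

# Hypothetical toy training data, one small corpus per language.
training = {
    "es": ["el gato come en la casa", "la casa es grande y el gato es negro"],
    "en": ["the cat eats in the house", "the house is big and the cat is black"],
}
dictionaries = {lang: build_ranked_dictionary(texts) for lang, texts in training.items()}

print(identify("el gato es grande", dictionaries))
print(identify("the cat is big", dictionaries))
```

With realistic corpora, the dictionary size and the treatment of words shared across languages would need tuning.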
So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar…
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (i) distinguishing similar languages, (ii) detecting multilingualism in a single document, and (iii) identifying the language of short texts. In this paper, we…
At imaxin|software we carried out a project, funded by the Dirección Xeral de I+D+i of the Xunta de Galicia, called "RecursOpentrad: Recursos lingüístico-…" This work is licensed under a Creative Commons Attribution 3.0 License.
Abstract: Studies on extracting translation equivalents from comparable, non-parallel corpora have not been very numerous so far. The main reason lies in the poor results obtained when compared with approaches that use aligned parallel corpora. The method proposed in this article, based on the use of contexts…
Abstract: This article describes a strategy for the lexical normalization of out-of-vocabulary (OOV) words in tweets written in Spanish. To correct erroneous OOV words, the normalization system generates in-vocabulary (IV) candidates that appear in different lexical resources and selects the most suitable one. Our…
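The generate-then-select step described above can be sketched as follows. This is only an illustration under stated assumptions: the tiny `LEXICON`, the `normalize` function, and the use of string similarity from Python's standard `difflib` module are stand-ins, not the lexical resources or the selection method used in the paper.

```python
import difflib

# Hypothetical in-vocabulary (IV) lexicon; a real system would load large lexical resources.
LEXICON = ["que", "también", "porque", "mañana", "gracias"]

def normalize(token, lexicon=LEXICON, cutoff=0.6):
    """Return the token unchanged if it is IV; otherwise return the closest IV candidate.

    Candidates are generated and ordered by difflib's similarity ratio,
    which is an assumed stand-in for the paper's candidate selection.
    """
    if token in lexicon:
        return token
    candidates = difflib.get_close_matches(token, lexicon, n=5, cutoff=cutoff)
    # Keep the original token when no IV candidate is similar enough.
    return candidates[0] if candidates else token

for word in ["graciass", "tambien", "que", "xyz"]:
    print(word, "->", normalize(word))
```

Leaving a token untouched when no candidate passes the cutoff avoids "correcting" names and other legitimate OOV words.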
Few approaches to extracting word translations from non-parallel texts have been proposed so far. Researchers have not been encouraged to work on this topic because extracting information from non-parallel corpora is a difficult task that produces poor results. Whereas for parallel texts, word translation extraction can reach about 99%, the accuracy for…