José João Almeida

Languages are born, evolve and, eventually, die. During this evolution their spelling rules (and sometimes the syntactic and semantic ones) change, putting old documents out of use. In Portugal, a pair of political agreements with Brazil forced relevant changes to the way the Portuguese language is written. In this article we will detail these two...
This document presents the TerminUM project and the work done in its statistical word aligner workbench (NATools). It shows a variety of alignment methods for parallel corpora and discusses the resulting terminological dictionaries and their use: evaluation of sentence translations; construction of a multi-level navigation system for linguistic studies or...
According to recent research, nearly 95 percent of corporate information is stored in documents. Further studies indicate that companies spend between 6 and 10 percent of their gross revenues printing and distributing documents in several ways: web and CD-ROM publishing, database storage and retrieval, and printing. In this context documents exist in some...
Abstract: In this work we present the Procura-PALavras (P-PAL) project, whose main goal is to develop an electronic tool that provides information on objective and subjective psycholinguistic indices of European Portuguese (EP) words. P-PAL will be made freely available to the scientific community in a user-friendly format...
Abstract: Parallel corpora are rich sources of translation resources. This document presents a methodology for extracting bilingual noun phrases (terminology candidates) from parallel corpora using translation rules. The models proposed in this work specify the changes in word order...
This paper describes NATools, a toolkit to process, analyze and extract translation resources from parallel corpora. It includes a sentence aligner, a probabilistic translation dictionary extractor, a word aligner, a corpus server, a set of tools to query corpora and dictionaries, and a set of tools to extract bilingual resources.
In this paper we describe how Dicionário-Aberto, an online dictionary for the Portuguese language, is being used as the base to construct diverse resources that are relevant in the processing of the Portuguese language. We will briefly present its history, explaining how we got here. Then, we will describe the resources already available to download and...
In this document we present an open source Portuguese text-to-speech system. Our first goal is to provide a flexible way to extend it, using a generic method to convert Portuguese words into SAMPA phonemes and consulting dictionaries only for exceptions. The text-to-speech system is composed of five layers, each one based on simple rules so that it can be easily tuned. In...