José João Almeida

Languages are born, evolve and, eventually, die. During this evolution their spelling rules (and sometimes their syntactic and semantic rules) change, making old documents hard to use. In Portugal, a pair of political agreements with Brazil forced significant changes in the way the Portuguese language is written. In this article we will detail these two…
This document presents the TerminUM project and the work done in its statistical word aligner workbench (NATools). It shows a variety of alignment methods for parallel corpora and discusses the resulting terminological dictionaries and their use: evaluation of sentence translations; construction of a multi-level navigation system for linguistic studies or…
Besides source code, the fundamental source of information about Open Source Software lies in documentation and other non-source-code files, such as README, INSTALL, or HowTo files, commonly available in the software ecosystem. These documents, written in natural language, provide valuable information during the software development stage, but also in future…
One of the first tasks when building a Natural Language application is detecting the language used, so that the system can be adapted to it. This task has been addressed several times. Nevertheless, most of these attempts were made a long time ago, when the amount of computer data and the available computational power were limited. In this article we…
This paper accompanies the demonstration of Camila, an experimental platform for formal software development, rooted in the tradition of constructive specification methods. The Camila approach is an attempt to make available at software development level the basic problem-solving strategy one got used to from school physics: create, experiment and reason on…
Nowadays, the importance and significance of parallel corpora are so great that they need no special introduction. Unfortunately, publicly available parallel corpora are somewhat limited in range. There are big corpora about politics or legislation, about medicine and other specific areas, but corpora for other areas are missing. Currently…
According to recent research, nearly 95 percent of corporate information is stored in documents. Further studies indicate that companies spend between 6 and 10 percent of their gross revenues printing and distributing documents in several ways: web and CD-ROM publishing, database storage and retrieval, and printing. In this context documents exist in some…
In this work we present the Procura-PALavras (P-PAL) project, whose main goal is to develop an electronic tool that provides information on objective and subjective psycholinguistic indices of European Portuguese (EP) words. P-PAL will be made freely available to the scientific community in a user-friendly format from a…
The analysis of business and financial news has become a popular area of research because of the possibility of inferring the future prospects of companies, economies and economic actors in general from information contained in the media. The classical approaches rely upon a "coarse" polarity classification of a news story; however, this may not be an optimal solution…