STRING is a hybrid statistical and rule-based natural language processing chain for Portuguese. STRING has a modular structure and performs all basic text processing tasks, namely tokenization and text segmentation, part-of-speech tagging, morphosyntactic disambiguation, shallow parsing (chunking) and deep parsing (dependency extraction). STRING performs…
Acknowledgements: I would like to thank my supervisor, Professor Nuno João Neves Mamede, and co-advisor, Professor Jorge Manuel Evangelista Baptista, for their friendship, guidance and wisdom while always being critical of my work. This outcome would not have been possible without their invaluable trust and support. I would also like to thank Cláudio Diniz, Vera…
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as…
Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has already reported language-specific traits of these events, little has been said about their cross-language behavior or about building an inventory of multilingual lexica of discourse markers. This work describes new methods and approaches for the…