Luís Marujo

We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the…
Fast and effective automated indexing is critical for search and personalized services. Key phrases that consist of one or more words and represent the main concepts of the document are often used for the purpose of indexing. In this paper, we investigate the use of additional semantic features and pre-processing steps to improve automatic key phrase…
This paper describes an enhanced automatic keyphrase extraction method applied to Broadcast News. The keyphrase extraction process creates a concept level for each news story, on top of the words produced by a speech recognition system and news indexation, and it contributes to the generation of a tag/keyphrase cloud of the top news included in a…
In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models do not take into account that information sources often contain lateral issues, which are hardly as important…
High-quality parallel data is crucial for a range of multilingual applications, from tuning and evaluating machine translation systems to cross-lingual annotation projection. Unfortunately, automatically obtained parallel data (which is available in relative abundance) tends to be quite noisy. To obtain high-quality parallel data, we introduce a…
In this paper, we present a method to improve the automatic detection of events in short sentences in the presence of a large number of event classes. Contrary to standard classification techniques such as Support Vector Machines or Random Forest, the proposed Fuzzy Fingerprints method is able to detect all the event classes present in the ACE 2005…
We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as a reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries…
This paper describes the early stages of porting REAP, a tutoring system for vocabulary learning, to European Portuguese. Students learn from authentic materials, on topics of their preference. A large number of linguistic resources and filtering tools have already been integrated into the ported version. We modified the current system to also target oral…
Predicting the popularity of news stories from several news sources has become a challenge of great importance for both news producers and readers. In this paper, we investigate methods for automatically predicting the number of clicks on a news story during one hour. Our approach is a combination of additive regression and bagging applied over a…
This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. Key phrases are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly…