• Publications
Jointly Extracting and Compressing Documents with Summary State Representations
TLDR
Presents a new neural model for text summarization that first extracts sentences from a document and then compresses them, improving over current extractive and abstractive methods.
Multilingual Clustering of Streaming News
TLDR
This work describes a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual clusters, and produces state-of-the-art results on datasets in German, English and Spanish.
Automated Fact Checking in the News Room
TLDR
An automated fact-checking platform that, given a claim, retrieves relevant textual evidence from a document collection, predicts whether each piece of evidence supports or refutes the claim, and returns a final verdict.
SUMMA at TAC Knowledge Base Population Task 2017
TLDR
The SUMMA system used entity recognition based on an LSTM+CRF neural network and two different approaches to entity linking disambiguation: a nearest-neighbors search engine and a distributed representation based on previous work by Yamada et al. (2017).
The SUMMA Platform: A Scalable Infrastructure for Multi-lingual Multi-media Monitoring
The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few…
The SUMMA Platform Prototype
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language…
Hierarchical Nested Named Entity Recognition
TLDR
A transition-based parser is built that explicitly models an arbitrary number of hierarchical and nested mentions, and a set of modifier classes is proposed that introduces certain concepts which change the meaning of an entity.
Priberam at MESINESP Multi-label Classification of Medical Texts Task
TLDR
This work addresses the large multi-label classification problem using four models: a Support Vector Machine (SVM), a customised search engine, a BERT-based classifier, and an SVM-rank ensemble of the previous three. All three individual models perform well, and the ensemble performs best.
Tailoring Media Monitoring with User Feedback
TLDR
This talk discusses how to give users relevant, personalized content in a media monitoring setting and introduces Priberam's approach to the problem: training text retrieval models on the fly from user feedback and integrating them into a media monitoring workflow.