Federico Nanni

Tracking global events through time would ease many diachronic analyses that are currently carried out manually by social scientists and humanities scholars. While entity linking algorithms can be adapted to identify mentions of an event that goes by a common name, such a name is often not yet established in the early stages leading up to the event. This study…
Retrieving paragraphs to populate a Wikipedia article is a challenging task. The new TREC Complex Answer Retrieval (TREC CAR) track introduces a comprehensive dataset that targets this retrieval scenario. We present early results from a variety of approaches, from standard information retrieval methods (e.g., tf-idf) to complex systems that use query…
Segmenting text into semantically coherent fragments improves readability of text and facilitates tasks like text summarization and passage retrieval. In this paper, we present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic…
Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence…
Introduction. Humanities scholars have experimented with the potential of different text mining techniques for exploring large corpora, from co-occurrence-based methods to sequence-labeling algorithms (e.g., named entity recognition). LDA topic modeling (Blei et al., 2003) has become one of the most employed approaches (Meeks and Weingart, 2012). Scholars…