Federico Nanni

From 1981 to 1991, 37,666 human, animal, food and environmental samples were cultured for Yersinia pseudotuberculosis using direct plating methods and/or cold enhancement techniques. Despite intensive surveillance and adequate culture methods, Y. pseudotuberculosis was isolated from stools of 0.05% (5/9,720) of patients with acute enteritis, and …
Segmenting text into semantically coherent fragments improves readability of text and facilitates tasks like text summarization and passage retrieval. In this paper, we present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic …
General political topics, like social security and foreign affairs, recur in electoral manifestos across countries. The Comparative Manifesto Project collects and manually codes manifestos of political parties from all around the world, detecting political topics at sentence level. Since manual coding is time-consuming and allows for annotation …
Tracking global events through time would ease many diachronic analyses which are currently carried out manually by social scientists and humanities scholars. While entity linking algorithms can be adapted to identify mentions of an event that goes by a common name, such a name is often not established in the early stages leading up to the event. This study …
Humanities scholars have experimented with the potential of different text mining techniques for exploring large corpora, from co-occurrence-based methods to sequence-labeling algorithms (e.g., named entity recognition). LDA topic modeling (Blei et al., 2003) has become one of the most employed approaches (Meeks and Weingart, 2012). Scholars …
Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence …
This article focuses on the complexity of finding and analyzing the totality of educational information shared by the University of Bologna on its website during the last twenty years. It specifically emphasizes some issues related to the use of the Wayback Machine, the most important international web archive, and the need for a different research tool …
Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related …