SciPy 1.0: fundamental algorithms for scientific computing in Python
TLDR
An overview of the capabilities and development practices of SciPy 1.0 is provided and some recent technical developments are highlighted.
The webgraph framework I: compression techniques
TLDR
This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks
TLDR
Experiments performed show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks.
UbiCrawler: a scalable fully distributed Web crawler
TLDR
The main features of UbiCrawler are platform independence, linear scalability, graceful degradation in the presence of faults, a very effective assignment function for partitioning the domain to crawl, and, more generally, the complete decentralization of every task.
Four degrees of separation
TLDR
The first world-scale social-network graph-distance computations are performed using the entire Facebook network of active users; the average distance is 4.74, corresponding to 3.74 intermediaries or "degrees of separation", prompting the title of this paper.
The query-flow graph: model and applications
TLDR
This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.
A reference collection for web spam
TLDR
This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Axioms for Centrality
TLDR
This study tries to provide a mathematically sound survey of the most important classic centrality measures known from the literature, proposes an axiomatic approach to establish whether they actually do what they were designed to do, and suggests that centrality measures based on distances, which in recent years have been neglected in information retrieval, do provide high-quality signals.
Effective and Efficient Entity Search in RDF Data
TLDR
An adaptation of the BM25F ranking function for RDF data is described and shown to outperform other state-of-the-art methods in ranking RDF resources, and a set of new index structures for efficient retrieval and ranking of results is proposed.
Query suggestions using query-flow graphs
TLDR
The proposed methods can match, and often improve on, the precision of recommendations based on query-click graphs without using users' clicks, and the experiments show that it is important to consider transition-type labels on edges to obtain good-quality recommendations.
...