Finding high-quality content in social media
TLDR
This paper introduces a general classification framework for combining evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition, and shows that the resulting system separates high-quality items from the rest with accuracy close to that of humans.
The query-flow graph: model and applications
TLDR
This paper introduces the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior, and proposes a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users.
Know your neighbors: web spam detection using the web topology
TLDR
This paper presents a spam detection system that combines link-based and content-based features and uses the topology of the Web graph by exploiting the link dependencies among Web pages; it finds that linked hosts tend to belong to the same class.
Fast shortest path distance estimation in large networks
TLDR
This paper proves that selecting the optimal set of landmarks is NP-hard, so heuristic solutions are needed, and draws on theoretical insights to devise a variety of simple methods that scale well to very large networks.
A reference collection for web spam
TLDR
This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Online team formation in social networks
TLDR
This paper proposes efficient algorithms that address all requirements of online team formation: the algorithms form teams that always satisfy the required skills, provide approximation guarantees with respect to team communication overhead, and are online-competitive with respect to load balancing.
Efficient semi-streaming algorithms for local triangle counting in massive graphs
TLDR
This is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs and proposes two approximation algorithms, which are based on the idea of min-wise independent permutations.
Effective web crawling
TLDR
The World Wide Web is a context in which traditional Information Retrieval methods are challenged, and given the volume of the Web and its speed of change, the coverage of modern search engines is relatively small.
Query suggestions using query-flow graphs
TLDR
The proposed methods can match, and often improve on, the precision of recommendations based on query-click graphs, without using users' clicks; the experiments show that considering transition-type labels on edges is important for good-quality recommendations.
From "Dango" to "Japanese Cakes": Query Reformulation Models and Patterns
TLDR
This paper builds a model that classifies user query reformulations into broad classes (generalization, specialization, error correction, or parallel move) with 92% accuracy, and demonstrates that the reformulation classifier leads to improved recommendations in a query recommendation system.