Publications
The power of both choices: Practical load balancing for distributed stream processing engines
TLDR
We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical “power of two choices” to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation.
Citations: 116 (highly influential: 17)
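As a rough illustration of the PKG idea summarized above, the following Python sketch routes messages at a single source. It assumes two seeded SHA-256 hashes as the two choices (the paper's actual hash functions are not given here) and uses a per-source count of messages sent to each worker as the local load estimate. The class and method names are hypothetical.

```python
import hashlib


class PartialKeyGrouping:
    """Sketch of PKG at one source: each key has two candidate workers,
    and load is estimated locally from this source's own send counts."""

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        # Local load estimation: messages this source has sent to each worker.
        self.local_load = [0] * num_workers

    def _hash(self, key: str, seed: int) -> int:
        # Hypothetical hash choice: seeded SHA-256 mapped to a worker id.
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_workers

    def route(self, key: str) -> int:
        # The "two choices" for this key (they may coincide by chance).
        a, b = self._hash(key, 1), self._hash(key, 2)
        # Key splitting: the same key can go to either candidate, whichever
        # currently has the lower local load estimate.
        choice = a if self.local_load[a] <= self.local_load[b] else b
        self.local_load[choice] += 1
        return choice
```

Because a key can land on either of its two workers, a downstream operator holding per-key state must merge the two partial results; that aggregation is the price paid for key splitting.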
Document Similarity Self-Join with MapReduce
TLDR
We present SSJ-2R, a MapReduce-based algorithm for the document similarity self-join (Sim-SJ) problem that is 4.5x faster than the state of the art.
Citations: 75 (highly influential: 12)
Quantifying Controversy on Social Media
TLDR
We perform a systematic methodological study of controversy detection by using the content and the network structure of social media.
Citations: 148 (highly influential: 11)
SAMOA: scalable advanced massive online analysis
TLDR
SAMOA (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams.
Citations: 147 (highly influential: 9)
Quantifying Controversy in Social Media
TLDR
We perform a systematic methodological study of controversy detection using social media network structure and content.
Citations: 54 (highly influential: 8)
Efficient Online Evaluation of Big Data Stream Classifiers
TLDR
The evaluation of classifiers in data streams is fundamental so that poorly performing models can be identified and either improved or replaced by better-performing models.
Citations: 92 (highly influential: 7)
SAMOA: a platform for mining big data streams
Author: G. Morales. Venue: WWW '13 Companion. Date: 13 May 2013. Field: Computer Science.
TLDR
Social media and user-generated content are causing an ever-growing data deluge.
Citations: 85 (highly influential: 7)
Political Discourse on Social Media: Echo Chambers, Gatekeepers, and the Price of Bipartisanship
TLDR
We define a production and consumption measure for social-media users, which captures the political leaning of the content shared and received by them.
Citations: 85 (highly influential: 6)
Reducing Controversy by Connecting Opposing Views
TLDR
We propose a simple model based on a recently developed user-level controversy score that is competitive with state-of-the-art link-prediction algorithms.
Citations: 83 (highly influential: 6)
When two choices are not enough: Balancing at scale in Distributed Stream Processing
TLDR
We propose a novel load balancing technique that uses a heavy hitter algorithm to efficiently identify the hottest keys in the stream.
Citations: 64 (highly influential: 6)
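The TLDR above does not say which heavy hitter algorithm is used. The Python sketch below assumes SpaceSaving, a standard counter-based heavy hitter summary, combined with an illustrative routing rule: a key whose estimated share of the stream reaches a hypothetical `threshold` may go to the least-loaded of all workers, while every other key keeps PKG-style two hashed choices. The class names and the `capacity` and `threshold` values are hypothetical, and the paper's actual policy may differ.

```python
import hashlib


class SpaceSaving:
    """SpaceSaving summary with at most `capacity` counters. Every update adds
    exactly 1 to the counter total, and a tracked key's count never
    underestimates its true frequency. Eviction here scans all counters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}  # key -> estimated frequency

    def update(self, key: str) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Replace the smallest counter; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[key] = self.counts.pop(victim) + 1

    def estimate(self, key: str) -> int:
        return self.counts.get(key, 0)


class HeavyHitterAwareRouter:
    """Hypothetical router: detected heavy hitters may go to any worker,
    all other keys get two hashed choices."""

    def __init__(self, num_workers: int, capacity: int = 100, threshold: float = 0.1):
        self.num_workers = num_workers
        self.threshold = threshold  # heavy if estimated share >= threshold
        self.sketch = SpaceSaving(capacity)
        self.local_load = [0] * num_workers  # messages sent by this source
        self.seen = 0

    def _hash(self, key: str, seed: int) -> int:
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_workers

    def is_heavy(self, key: str) -> bool:
        return self.sketch.estimate(key) >= self.threshold * self.seen

    def route(self, key: str) -> int:
        self.sketch.update(key)
        self.seen += 1
        if self.is_heavy(key):
            candidates = list(range(self.num_workers))
        else:
            candidates = [self._hash(key, 1), self._hash(key, 2)]
        # Pick the candidate with the lowest local load estimate.
        choice = min(candidates, key=self.local_load.__getitem__)
        self.local_load[choice] += 1
        return choice
```

The design choice in this sketch is to grant extra choices only to the few detected hot keys, so most keys still keep their state on at most two workers.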