Kondenzer: Exploration and visualization of archived social media

Omar Alonso and Kartikay Khandelwal
2014 IEEE 30th International Conference on Data Engineering

Modern social networks such as Twitter provide a platform for people to express their opinions on a variety of topics ranging from personal to global. While the factual part of this information and the opinions of various experts are archived by sources such as Wikipedia and reputable news articles, the opinion of the general public is drowned out in a sea of noise and “un-interesting” information. In this demo we present Kondenzer - an offline system for condensing, archiving and visualizing… 


The World Conversation: Web Page Metadata Generation From Social Sources

This paper presents a technique called social signatures that, given a link to a web page, pulls the most important keywords from the social chatter around it, yielding a high-level representation of the web page from a social media perspective.

Cashtag Piggybacking

A malicious practice—referred to as cashtag piggybacking—perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones is uncovered.

Populating knowledge bases with temporal information

Experimental evaluations demonstrate that the methods yield high quality output compared to state-of-the-art approaches, and can indeed populate knowledge bases with temporal knowledge.

Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twi

A malicious practice perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones is uncovered, calling for the adoption of spam and bot detection techniques in studies and applications that exploit user-generated content for predicting the stock market.

As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes

This paper develops a family of Integer Linear Programs for jointly inferring temponym mappings to the timeline and knowledge base, along with methods for detecting such temponyms, inferring their temporal scopes, and mapping them to events in a knowledge base if present there.



Interactive visualization of emerging topics in multiple social media streams

An interactive news flow visualization that reveals emerging topics in dynamic digital content archives with a particular emphasis on visual metaphors to highlight hidden relations in digital content is introduced.

Are Some Tweets More Interesting Than Others? #HardQuestion

Crowdsourcing was used to assemble a set of tweets rated as interesting or not; these tweets were scored using textual and contextual features, and the scores were used as inputs to a binary classifier; the best classifier achieved moderate agreement with the human assessments.

Faceted Search

This lecture explores the history, theory, and practice of faceted search, and offers a self-contained treatment of the topic, with an extensive bibliography for those who would like to pursue particular aspects in more depth.

Web document clustering: a feasibility demonstration

To satisfy the stringent requirements of the Web domain, an incremental, linear time algorithm called Suffix Tree Clustering (STC) is introduced which creates clusters based on phrases shared between documents, showing that STC is faster than standard clustering methods in this domain.

Synthesis Lectures on Information Concepts, Retrieval, and Services

This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models to create a consolidated and balanced view on the main models.

Analysis of lexical signatures for improving information persistence on the World Wide Web

A dynamic LS generator called Test & Select (TS) is proposed to mitigate LS conflict, which outperforms all eight static methods in terms of both extracting the desired document and finding relevant information, over three different search engines.