Predicting the volume of comments on online news stories

Authors: Manos Tsagkias, Wouter Weerkamp, M. de Rijke
Published in: Proceedings of the 18th ACM conference on Information and knowledge management
On-line news agents provide commenting facilities for readers to express their views with regard to news stories. [...] We address the prediction task as a two-stage classification task: a first binary classification identifies articles with the potential to receive comments, and a second binary classification takes the output of the first step and labels articles as having "low" or "high" comment volume. The results show solid performance for the former task, while performance degrades for the latter.
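The two-stage setup described above can be sketched as a small classifier cascade. This is a minimal illustration only, not the authors' implementation: the nearest-centroid classifier, the two toy features, the comment counts, and the `high_threshold` value are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean


class NearestCentroid:
    """Tiny classifier (an assumption, not the paper's model):
    assign the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of all rows carrying this label.
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def fit_two_stage(X, comment_counts, high_threshold):
    # Stage 1: does the article receive any comments at all?
    y1 = ["comments" if n > 0 else "no_comments" for n in comment_counts]
    stage1 = NearestCentroid().fit(X, y1)

    # Stage 2: trained only on commented articles, "low" vs. "high" volume.
    commented = [(x, n) for x, n in zip(X, comment_counts) if n > 0]
    X2 = [x for x, _ in commented]
    y2 = ["high" if n >= high_threshold else "low" for _, n in commented]
    stage2 = NearestCentroid().fit(X2, y2)
    return stage1, stage2


def predict_two_stage(stage1, stage2, x):
    # Stage 2 only sees articles that stage 1 expects to be commented on.
    if stage1.predict_one(x) == "no_comments":
        return "no_comments"
    return stage2.predict_one(x)


# Hypothetical toy data: two made-up numeric features per article.
X = [[1.0, 0.0], [1.2, 0.1], [5.0, 1.0], [5.5, 1.2], [9.0, 3.0], [9.5, 3.2]]
comment_counts = [0, 0, 3, 5, 40, 60]

stage1, stage2 = fit_two_stage(X, comment_counts, high_threshold=20)
for article in ([1.1, 0.0], [5.2, 1.0], [9.2, 3.0]):
    print(article, predict_two_stage(stage1, stage2, article))
```

Because the second stage depends on the first stage's decision, errors made in stage 1 propagate into stage 2, which is one plausible reason for a cascade like this to perform worse on the low/high task than on the comments/no-comments task.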


News Comments: Exploring, Modeling, and Online Prediction
The log-normal and the negative binomial distributions for modeling comments from various news agents are compared, and the feasibility of online prediction of the number of comments, based on the volume observed shortly after publication, is examined.
A Supervised Method to Predict the Popularity of News Articles
This study proposes a machine learning approach to predict the volume of comments using the information that is extracted about the users’ activities on the web pages of news agencies and reveals salient improvement in comparison with the baseline methods.
Cannot Predict Comment Volume of a News Article before (a few) Users Read It
It is shown that the early arrival rate of comments is the best indicator of the eventual number of comments, and the relationship between the early rate and the final number of comments as well as the prediction accuracy vary considerably across news outlets and news article categories.
Predicting News Popularity by Mining Online Discussions
It is shown that the proposed graph-based features capture the complexities of both these social interaction graphs and lead to improvements on the prediction of all popularity indicators in three online news post datasets and to significant improvement on the task of identifying controversial stories.
Emotional Influence Prediction of News Posts
The results show that terms are the most important feature and that features extracted from the content of news posts make it possible to effectively predict the amount of emotional reactions triggered by news articles.
Prediction for the Newsroom: Which Articles Will Get the Most Comments?
This work proposes to support manual moderation by proactively drawing the attention of moderators to article discussions that most likely need their intervention. To this end, it enriches the article with metadata, extracts semantic and linguistic features, and exploits annotated data from a foreign language corpus.
Classification of German Newspaper Comments
This work proposes the following classification task: given a news comment thread of a particular article, identify the newspaper it comes from. It achieves precision of up to 90% for individual newspapers.
Predicting the Future Impact of News Events
A flexible framework is defined that, given some definition of impact, can predict its future development at the beginning of the event; the best features for each of them are identified experimentally.
Emotional Reactions Prediction of News Posts
The results show that users’ early activity features are very important and that combining those features with terms can effectively predict the amount of emotional reactions triggered in users by a news post.
Towards better news article recommendation
A novel news recommendation system is proposed that enriches the description of news articles with latent aspects extracted from user comments, deals with noisy comments through a model for ranking user comments, and uses a diversification model to remove redundancies and provide a wide coverage of aspects.


Extracting the discussion structure in comments on news-articles
It is shown how techniques from information retrieval, natural language processing and machine learning can be used to extract the 'reacts-on' relation between comments with high precision and recall.
The predictive power of online chatter
First, carefully hand-crafted queries produce matching postings whose volume predicts sales ranks, and even though sales rank motion might be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.
Leave a Reply: An Analysis of Weblog Comments
A large-scale study of weblog comments and their relation to the posts is presented, using a sizable corpus of comments to estimate the overall volume of comments in the blogosphere; analyze the relation between the weblog popularity and commenting patterns in it; and measure the contribution of comment content to various aspects of weblogs access.
A Study of Blog Search
An analysis of a large blog search engine query log shows that blog searches have different intents than general web searches, suggesting that the primary targets of blog searchers are tracking references to named entities, and locating blogs by theme.
Capturing Global Mood Levels using Blog Posts
  • G. Mishne, M. de Rijke
  • Computer Science
    AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs
  • 2006
This paper builds models that predict the levels of various moods according to the language used by bloggers at a given time; these models show high correlation with the moods actually measured, and substantially outperform a baseline.
Can blog communication dynamics be correlated with stock market activity?
A simple model is developed to study and analyze communication dynamics in the blogosphere and to use these dynamics to determine interesting correlations with stock market movement; results are promising, yielding about 78% accuracy in predicting the magnitude of movement and 87% for the direction of movement.
Interactive Features of Online Newspapers: Identifying Patterns and Predicting Use of Engaged Readers
  • D. Chung
  • Business
    J. Comput. Mediat. Commun.
  • 2008
This study illustrates that news organizations need not worry about applying all types of interactive features to engage their readers, as the features serve distinct functions; they may instead focus on building credibility, seek to identify their online news audiences, and then provide interactive features accordingly.
Description and Prediction of Slashdot Activity
A statistical analysis of users' reaction time to a new discussion thread in online debates on the popular news site Slashdot shows that a mixture of two log-normal distributions, combined with the circadian rhythm of the community, is able to explain with surprising accuracy the reaction time of comments within a discussion thread.
Exploiting Surface Features for the Prediction of Podcast Preference
This paper reports on work that uses easily extractable surface features of podcasts to achieve solid performance on two podcast preference prediction tasks: classification of preferred vs. non-preferred podcasts and ranking podcasts by level of preference.
Predicting the popularity of online content
Early patterns of Digg diggs and YouTube views reflect long-term user interest.