Ilias N. Flaounas, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis and Nello Cristianini. Digital Journalism, pp. 102–116.

News content analysis is usually preceded by a labour-intensive coding phase, where experts extract key information from news items. The cost of this phase imposes limitations on the sample sizes that can be processed, and therefore on the kind of questions that can be addressed. In this paper we describe an approach that incorporates text-analysis technologies for the automation of some of these tasks, enabling us to analyse data sets that are many orders of magnitude larger than those…

Big Data Analysis of News and Social Media Content

How the analysis of Twitter content can reveal mood changes in entire populations, how the political relations among US leaders can be extracted from large corpora, how to determine what news people really want to read, and how gender bias and writing style in articles vary among different outlets are demonstrated, allowing researchers to access patterns that would otherwise be out of reach.

Taking Stock of the Toolkit

A systematic inventory of recent applications of computational methods in journalism studies is presented, distinguishing between dictionary-based approaches, supervised machine learning, and unsupervised machine learning.

An Analysis of Subjectivity in Brazilian News

This paper proposes to use subjectivity lexicons to characterize subjectivity in five news portals that are popular in Brazil and performs a correlation analysis between the levels of subjectivity found and readability and news popularity metrics.

KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus

A corpus of Kazakhstani media is presented, containing over 4 million publications from 36 primary sources (each with at least 500 publications). It will be of interest to researchers working on both individual publications and mass media analysis, including comparative analysis and the identification of common patterns inherent in the media of different countries.

Word Counts and Topic Models

It is shown that automated methods have different strengths that provide different opportunities, enriching—but not replacing—the range of manual content analysis methods.

Exploring Machine Learning to Study the Long-Term Transformation of News

It is shown how making classification processes transparent enables journalism scholars to employ these computational methods in a reliable and valid way, and could foster a revision of journalism history, particularly the often hypothesized but understudied shift from opinion-based to fact-centred reporting.

Content Analysis and Online News

It is argued that established content analysis is insufficient for digital media but that common standards, protocols and procedures are yet to be developed for these new approaches to digital journalism research.

Content analysis of 150 years of British periodicals

A vast corpus of regional newspapers from the United Kingdom is assembled, incorporating very fine-grained geographical and temporal information that is not available for books, and it is believed that these data-driven approaches can complement the traditional method of close reading in detecting trends of continuity and change in historical corpora.

Says who?: automatic text-based content analysis of television news

An automatic analysis of television news programs, based on the closed captions that accompany them, yields a series of key insights about news providers, people in the news, and the biases that can be uncovered by automatic means.



Automating Quantitative Narrative Analysis of News Data

We present a working system for large-scale quantitative narrative analysis (QNA) of news corpora, which incorporates various recent ideas from text mining and pattern analysis in order to solve a…

The Structure of the EU Mediasphere

This study reports what is believed to be the first large-scale content analysis of cross-linguistic text in the social sciences, carried out using various artificial intelligence techniques, and demonstrates the power of the available methods for significant automation of media content analysis.

Lydia: A System for Large-Scale News Analysis

The Lydia project seeks to build a relational model of people, places, and things through natural language processing of news sources and the statistical analysis of entity frequencies and co-locations.
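The entity-frequency and co-location counting that a Lydia-style pipeline performs can be illustrated with a small sketch. The per-sentence entity lists below stand in for the output of a named-entity recognizer (which the real system obtains via natural language processing); the names are placeholders, not data from the project.

```python
# Toy sketch of entity frequency and co-location counting, Lydia-style.
# Entities per sentence are given directly here in place of real NER output.
from collections import Counter
from itertools import combinations

# Hypothetical entities detected in each news sentence.
sentences = [
    ["Obama", "Washington"],
    ["Obama", "Congress"],
    ["Congress", "Washington"],
    ["Obama", "Washington"],
]

freq = Counter()   # how often each entity appears
coloc = Counter()  # how often a pair of entities shares a sentence

for ents in sentences:
    unique = sorted(set(ents))
    freq.update(unique)
    for a, b in combinations(unique, 2):
        coloc[(a, b)] += 1

print(freq.most_common(1))   # most frequent entity
print(coloc.most_common(1))  # most frequent co-location pair
```

From such counts, a relational model can be built by treating entities as nodes and co-location frequencies as edge weights, which is the statistical layer the abstract refers to.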

NOAM: news outlets analysis and monitoring system

NOAM is the data management system behind various applications and scientific studies aimed at modelling the mediasphere; it combines a relational database with state-of-the-art AI technologies, including data mining, machine learning and natural language processing.


This article explores the growth and character of breaking news on two 24-hour news channels in the United Kingdom, Sky News and BBC News 24. Our purpose is to examine, in detail, the nature and role…

Immediacy, Convenience or Engagement? An analysis of 24-hour news channels in the UK

Abstract The article is based on the first systematic analysis of the output of 24-hour news channels in the UK. From a viewer's point of view, we argue, a 24-hour news channel can fulfil three main…

RCV1: A New Benchmark Collection for Text Categorization Research

This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove erroneous data.

What Makes Us Click? - Modelling and Predicting the Appeal of News Articles

It is discovered that UK tabloids and the website of the “People” magazine contain more appealing content for all audiences than broadsheet newspapers, news aggregators and newswires, and that this measure of readers’ preferences correlates with a measure of linguistic subjectivity at the level of outlets.

Entropy of dialogues creates coherent structures in e-mail traffic.

The dynamic network of e-mail traffic is studied and it is found that it develops self-organized coherent structures similar to those appearing in many nonlinear dynamic systems.

Representation of Women in News and Photos: Comparing Content to Perceptions

This study uses a feminist framework of masculine cultural hegemony to examine the representation of women in two newspapers: a medium-sized newspaper (Study 1) and a larger newspaper (Study 2).