The Parable of Google Flu: Traps in Big Data Analysis

David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. Pages 1203-1205.
Large errors in flu prediction were largely avoidable, which offers lessons for the use of big data. In February 2013, Google Flu Trends (GFT) made headlines, but not for a reason that Google executives or the creators of the flu tracking system would have hoped for. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) reported by the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from… 

The life and death of Google Flu Trends

The story of GFT offers insight into contemporary tensions between the indomitable intensity of collective life and stubborn attempts at its algorithmic formalization. It also illustrates some of the conceptual and practical challenges raised by the online algorithmic tracking of disease.

Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions

An empirical network is constructed from CDC and GFT data, and an improved model built on it predicts infections one week into the future as well as GFT predicts the present. The model does particularly well during epidemics and in the regions most likely to facilitate influenza spread.
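A minimal sketch of this kind of model, assuming a pooled linear regression in which next week's CDC ILI rate in each region depends on that region's current rate, a neighbor-weighted average of the other regions' rates, and the region's GFT estimate. The weight matrix `W`, the feature set, and the function names are hypothetical illustrations, not the paper's specification.

```python
import numpy as np


def network_features(cdc, gft, W):
    """Build one row per (week, region) to predict next week's ILI rate.

    cdc: (T, R) CDC ILI rates; gft: (T, R) GFT estimates;
    W: (R, R) row-normalized region-to-region weights (hypothetical input).
    Row for week t, region i: [1, cdc[t, i], (W @ cdc[t])[i], gft[t, i]];
    target: cdc[t + 1, i].
    """
    neighbor = cdc @ W.T  # row t equals W @ cdc[t]
    n_rows = (cdc.shape[0] - 1) * cdc.shape[1]
    X = np.column_stack([
        np.ones(n_rows),
        cdc[:-1].ravel(),
        neighbor[:-1].ravel(),
        gft[:-1].ravel(),
    ])
    y = cdc[1:].ravel()
    return X, y


def fit_network_model(cdc, gft, W):
    """Ordinary least-squares coefficients for the pooled one-week-ahead model."""
    X, y = network_features(cdc, gft, W)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic check: simulate noise-free data from known coefficients, then recover them.
rng = np.random.default_rng(0)
R, T = 5, 80
W = rng.random((R, R))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
gft = rng.random((T, R))
cdc = np.zeros((T, R))
cdc[0] = rng.random(R)
for t in range(T - 1):
    cdc[t + 1] = 0.1 + 0.4 * cdc[t] + 0.3 * (W @ cdc[t]) + 0.2 * gft[t]

coef = fit_network_model(cdc, gft, W)
```

Pooling all regions into one regression keeps the parameter count small; a real system would more likely fit region-specific or regularized coefficients.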

Reappraising the utility of Google Flu Trends

An error profile of GFT in the US is provided, strong evidence is established for adopting search-trend-based 'nowcasts' in influenza forecasting systems, and a reevaluation of the utility of this data source in diverse domains is encouraged.
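An error profile compares a search-based series with a reference series week by week. The sketch below assumes aligned weekly arrays of GFT and CDC ILI values and computes a few common summaries. The choice of metrics and the function name are ours, not necessarily the paper's.

```python
import numpy as np


def error_profile(estimate, reference):
    """Summarize how a search-based series (e.g. GFT) deviates from a reference (e.g. CDC ILI)."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    err = est - ref
    return {
        "mean_error": err.mean(),                # > 0 means systematic overestimation
        "mae": np.abs(err).mean(),               # mean absolute error
        "rmse": np.sqrt((err ** 2).mean()),      # root-mean-square error
        "mape": np.abs(err / ref).mean() * 100,  # percent error; assumes no zero reference values
    }


profile = error_profile([2.0, 3.0, 5.0], [1.0, 3.0, 4.0])
```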

Prediction of influenza outbreaks by integrating Wikipedia article access logs and Google flu trend data

  • Batuhan Bardak, Mehmet Tan
  • Computer Science, Medicine
  • 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), 2015
A technique that uses these two sources together to improve the prediction of influenza outbreaks is proposed, and linear regression models built on it achieve promising results for both nowcasting and forecasting.
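A minimal sketch of combining the two signals in one linear regression. It assumes weekly aligned arrays of Wikipedia article views, GFT values, and ILI rates, and the function names are hypothetical. Setting `horizon=0` fits a nowcast (same-week ILI), and `horizon=1` fits a one-week-ahead forecast.

```python
import numpy as np


def fit_combined(wiki, gft, ili, horizon=0):
    """OLS of ILI at week t + horizon on Wikipedia views and GFT at week t.

    horizon=0 gives a nowcast; horizon>=1 gives a forecast.
    Returns [intercept, wiki coefficient, gft coefficient].
    """
    wiki, gft, ili = (np.asarray(a, dtype=float) for a in (wiki, gft, ili))
    n = len(ili) - horizon
    X = np.column_stack([np.ones(n), wiki[:n], gft[:n]])
    y = ili[horizon:horizon + n]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_combined(coef, wiki_t, gft_t):
    """Apply fitted coefficients to one week's Wikipedia and GFT values."""
    return coef[0] + coef[1] * wiki_t + coef[2] * gft_t
```

In practice the two inputs are on very different scales, so normalizing them before fitting makes the coefficients easier to compare.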

What can digital disease detection learn from (an external revision to) Google Flu Trends?

Big data diagnostics.

Examining GFT, Lazer and colleagues write that since well before 2013 “GFT had been persistently overestimating flu prevalence,” and they published their analysis in the journal Science.
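Persistent overestimation can be checked directly from aligned weekly series. This sketch uses hypothetical inputs and counts the weeks in which GFT exceeds the CDC value, along with the longest consecutive run of such weeks.

```python
def overestimation_summary(gft, cdc):
    """Return (weeks overestimated, total weeks, longest consecutive overestimating run)."""
    over = [g > c for g, c in zip(gft, cdc)]
    longest = current = 0
    for flag in over:
        # extend the current run on an overestimated week, otherwise reset it
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return sum(over), len(over), longest


summary = overestimation_summary([3, 4, 2, 5, 6, 7], [2, 3, 3, 4, 5, 6])
```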

Flu Detector: Estimating influenza-like illness rates from online user-generated content

Notably, the models based on Google data achieve high accuracy over the four most recent flu seasons in England and are highlighted as having great potential to complement traditional domestic flu surveillance schemes.

Searching for the Peak: Google Trends and the COVID-19 Outbreak in Italy

One of the difficulties faced by policy makers during the COVID-19 outbreak in Italy was monitoring the spread of the virus. Due to changes in the criteria and insufficient resources to test all…

ARGO: a model for accurate estimation of influenza epidemics using Google search data

ARGO (AutoRegression with GOogle search data) is a new influenza tracking model that uses publicly available online search data. It not only incorporates the seasonality of influenza epidemics but also captures changes in people’s online search behavior over time.
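A simplified sketch of the ARGO idea: regress the current week's ILI on its own recent lags plus same-week search frequencies, and refit on a rolling window so that coefficients can follow changes in search behavior. Several assumptions apply. Closed-form ridge regression stands in for the published model's regularization. The lag count, the 104-week window, the penalty, and the function name are illustrative choices rather than ARGO's settings.

```python
import numpy as np


def argo_style_nowcast(ili, searches, n_lags=3, window=104, ridge=1e-3):
    """Estimate ILI for the final week from ILI lags and same-week search data.

    ili: (T,) weekly ILI series; the final entry is the week being nowcast
         and its value is never read.
    searches: (T, K) search-query frequencies, available through the final week.
    Only the most recent `window` fully observed weeks are used for fitting.
    """
    ili = np.asarray(ili, dtype=float)
    searches = np.asarray(searches, dtype=float)
    T = len(ili)

    def features(t):
        # intercept, ILI at weeks t-1 ... t-n_lags, search frequencies at week t
        return np.concatenate(([1.0], ili[t - n_lags:t][::-1], searches[t]))

    train_weeks = list(range(max(n_lags, T - 1 - window), T - 1))
    X = np.array([features(t) for t in train_weeks])
    y = ili[train_weeks]
    # Closed-form ridge fit; the intercept is also penalized here for brevity.
    A = X.T @ X + ridge * np.eye(X.shape[1])
    coef = np.linalg.solve(A, X.T @ y)
    return features(T - 1) @ coef
```

Refitting on a rolling window, rather than once on all history, is what lets the coefficients drift as the relationship between searches and illness changes.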

Use of daily Internet search query data improves real-time projections of influenza epidemics

This study combines a previously developed calibration and prediction framework with an established humidity-based transmission dynamic model to forecast influenza. It finds that both the earlier availability and the finer temporal resolution of daily search data are important for improving forecasting performance.
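Humidity-forced transmission models of this kind typically make the basic reproductive number decrease exponentially with specific humidity. The sketch below assumes that functional form, illustrative parameter values, and a forward-Euler SIRS update. It leaves out the calibration and data-assimilation framework entirely, and the function names are hypothetical.

```python
import math


def humidity_r0(q, r0_max=2.2, r0_min=1.2, a=-180.0):
    """Basic reproductive number as a function of specific humidity q (kg/kg).

    Assumed form: R0(q) = exp(a*q + ln(r0_max - r0_min)) + r0_min, so R0 equals
    r0_max in perfectly dry air and falls toward r0_min as humidity rises.
    Default parameter values are illustrative assumptions.
    """
    return math.exp(a * q + math.log(r0_max - r0_min)) + r0_min


def sirs_step(S, I, N, q, D=3.0, L=4 * 365.0, dt=1.0):
    """One forward-Euler step (dt in days) of a humidity-forced SIRS model.

    D: mean infectious period (days); L: mean duration of immunity (days).
    Recovered individuals are implicit: R = N - S - I.
    """
    beta = humidity_r0(q) / D
    new_infections = beta * S * I / N * dt
    recoveries = I / D * dt
    waned_immunity = (N - S - I) / L * dt
    S_next = S - new_infections + waned_immunity
    I_next = I + new_infections - recoveries
    return S_next, I_next
```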



Detecting influenza epidemics using search engine query data

A method is presented that analyzes large numbers of Google search queries to track influenza-like illness in a population, accurately estimating the current weekly level of influenza activity in each region of the United States with a reporting lag of about one day.
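The original GFT approach fits a linear model on the logit scale between Q, the fraction of search queries that are ILI-related, and P, the fraction of physician visits for ILI: logit(P) = β0 + β1·logit(Q). A minimal least-squares sketch of that fit follows. The function names are hypothetical, and query selection, which is the hard part of the method, is not shown.

```python
import numpy as np


def logit(p):
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))


def fit_logit_model(query_fraction, ili_fraction):
    """Fit logit(P) = b0 + b1 * logit(Q) by ordinary least squares; returns [b0, b1]."""
    X = np.column_stack([np.ones(len(query_fraction)), logit(query_fraction)])
    coef, *_ = np.linalg.lstsq(X, logit(ili_fraction), rcond=None)
    return coef


def estimate_ili(coef, query_fraction):
    """Map query fractions back to an estimated ILI visit fraction (inverse logit)."""
    z = coef[0] + coef[1] * logit(query_fraction)
    return 1 / (1 + np.exp(-z))
```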

Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic

Both GFT models performed well before and during pH1N1, although the updated model performed better during pH1N1, especially in the summer months. Changes in health-seeking behavior may have played a part.

Monitoring Influenza Activity in the United States: A Comparison of Traditional Surveillance Systems with Google Flu Trends

It is demonstrated that while Google Flu Trends is highly correlated with rates of ILI, it has a lower correlation with surveillance for laboratory-confirmed influenza.

Flu Near You: An Online Self-reported Influenza Surveillance System in the USA

The study found that users can be engaged in a symptom self-reporting system that augments national information about influenza, and that increased uptake would make the system more valuable to both the public and public health professionals.


The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the

Beating the news using social media: the case study of American Idol

Although American Idol voting is only a minimal, simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real-time gathering of quantitative indicators that may anticipate how opinion formation events will unfold.

Epidemiology of seasonal influenza: use of surveillance data and statistical models to estimate the burden of disease.

Improvements in national influenza surveillance systems will be needed to collect and analyze data in a timely manner during the next pandemic.

Forecasting seasonal outbreaks of influenza

The results indicate that skillful real-time predictions of peak timing can be made more than 7 weeks before the actual peak, and that confidence in those predictions can be inferred from the spread of the forecast ensemble.
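The idea of reading confidence off an ensemble can be shown with a toy example. This sketch makes several assumptions: a plain SIR model with daily forward-Euler steps (no humidity forcing or data assimilation), and a hypothetical set of ensemble members that differ only in R0. As a simple spread-based confidence proxy, it uses the share of members whose peak falls within one week of the modal peak week.

```python
import numpy as np


def sir_peak_week(r0, infectious_days=3.0, i0=1e-4, max_weeks=52):
    """Week (0-based) in which a daily forward-Euler SIR model reaches peak prevalence."""
    gamma = 1.0 / infectious_days
    beta = r0 * gamma
    s, i = 1.0 - i0, i0
    peak_i, peak_day = i, 0
    for day in range(1, 7 * max_weeks + 1):
        new_inf = beta * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
        if i > peak_i:
            peak_i, peak_day = i, day
    return peak_day // 7


def ensemble_peak_forecast(r0_members, **kwargs):
    """Modal peak week across members, and the share of members within one week of it."""
    peaks = np.array([sir_peak_week(r0, **kwargs) for r0 in r0_members])
    weeks, counts = np.unique(peaks, return_counts=True)
    mode = int(weeks[np.argmax(counts)])
    agreement = float(np.mean(np.abs(peaks - mode) <= 1))
    return mode, agreement, peaks
```

A tight ensemble yields an agreement near 1, and a widely spread ensemble yields a lower value, which mirrors the idea that forecast confidence can be read from ensemble spread.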

Real-time epidemic forecasting for pandemic influenza

Using epidemiological data collected during the early stages of an outbreak, the authors show how the timing of the maximum prevalence of the pandemic wave, along with its amplitude and duration, might be predicted by fitting a mass-action epidemic model to the surveillance data with standard regression analysis.
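A simplified sketch of projecting a wave from early data. It swaps in a log-linear regression of early incidence for the paper's fitting procedure. The sketch estimates the exponential growth rate r, converts it to R0 = 1 + r·D for an SIR model with mean infectious period D, and then computes the SIR peak prevalence 1 - (1 + ln R0)/R0. It predicts amplitude only, not timing, and it assumes a fully susceptible population. The function names are hypothetical.

```python
import numpy as np


def growth_rate(incidence, dt=1.0):
    """Exponential growth rate r from a least-squares line through log(incidence)."""
    y = np.log(np.asarray(incidence, dtype=float))
    t = np.arange(len(y)) * dt
    slope, _intercept = np.polyfit(t, y, 1)
    return slope


def sir_projection(r, infectious_period):
    """Approximate R0 and peak prevalence (fraction of population) for a simple SIR model.

    r and infectious_period must use the same time unit.
    """
    r0 = 1 + r * infectious_period  # SIR with exponentially distributed infectious period
    if r0 <= 1:
        raise ValueError("no epidemic peak: estimated R0 <= 1")
    peak_prevalence = 1 - (1 + np.log(r0)) / r0
    return r0, peak_prevalence
```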

Real-Time Epidemic Monitoring and Forecasting of H1N1-2009 Using Influenza-Like Illness from General Practice and Family Doctor Clinics in Singapore

The real-time surveillance system is able to show the progress of an epidemic and indicates when the peak is reached and can be used to form forecasts, including how soon the epidemic wave will end and when a second wave will appear if at all.