Experimenting with computational methods for large-scale studies of tracking technologies in web archives

  • Janne Nielsen
  • Published 2 October 2019
  • Internet Histories, pp. 293-315
  • Computer Science
Abstract: The use of tracking technologies to collect data about web users and their online behaviour has played an important role in the development of the web. Most studies of tracking examine the current extent of tracking on popular websites on the online web, while historical studies are rare. Large-scale historical studies of web tracking are important for a more comprehensive understanding of the development, spread and implications of tracking technologies across the web. Historical…
Using mixed methods to study the historical use of web beacons in web tracking
The findings show the ratio of Danish to international third-party domains involved in the tracking and the development, over time, of what types of beacon providers are dominant on the Danish web.
Quantitative Approaches to the Danish Web Archive
This chapter includes examples of large-scale studies of different aspects of the Danish Web as it has been archived in Netarkivet and describes several approaches to creating and analysing large corpora using different types of archived sources for different purposes, such as metadata from crawl.
Digital humanities and web archives: Possible new paths for combining datasets
It is shown that it is possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain, thus highlighting that websites of the same size tend to constitute isolated ‘link islands’.
Networks of power. Analysing the evolution of the Danish internet infrastructure
Abstract: This article studies the evolution of the internet infrastructure and assesses emerging digital power structures and regulatory dynamics. We revisit and develop Thomas P. Hughes’ momentum…


Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016
It is argued that an understanding of the ecosystem’s historical trends is important to any technical and policy discussions surrounding tracking and that third-party tracking on the web has increased in prevalence and complexity since the first third-party tracker was observed in 1996.
Tracking the Trackers: A Large-Scale Analysis of Embedded Web Trackers
It is found that trackers are widespread, and that very few trackers dominate the web (Google, Facebook and Twitter), except for a few countries such as China and Russia.
On the Ubiquity of Web Tracking: Insights from a Billion-Page Web Crawl
It is confirmed that trackers are widespread, and that a small number of trackers dominates the web (Google, Facebook and Twitter), and that Google still operates services on Chinese websites, despite its proclaimed retreat from the Chinese market.
The web is watching you: A comprehensive review of web-tracking techniques and countermeasures
Current web-tracking techniques, techniques for detecting and analyzing tracking, and countermeasures to prevent it are analyzed and discussed.
A Survey on Web Tracking: Mechanisms, Implications, and Defenses
This survey reviews the existing literature on the methods web services use to track users online, as well as their purposes, implications, and possible user defenses, and presents five main groups of methods used for user tracking.
The Web Never Forgets: Persistent Tracking Mechanisms in the Wild
The evaluation of the defensive techniques used by privacy-aware users finds that there exist subtle pitfalls, such as failing to clear state on multiple browsers at once, in which a single lapse in judgement can shatter privacy defenses.
Detecting and Defending Against Third-Party Tracking on the Web
This work develops a client-side method for detecting and classifying five kinds of third-party trackers based on how they manipulate browser state, and finds that no existing browser mechanisms prevent tracking by social media sites via widgets while still allowing those widgets to achieve their utility goals, which leads to a new defense.
Online Tracking: A 1-million-site Measurement and Analysis
The largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites, is presented, which demonstrates the OpenWPM platform's strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking behaviors.
Web Tracking - A Literature Review on the State of Research
This paper provides an overview of the current state of the art of web-tracking research, aiming to reveal the relevance and methodologies of this research area and to create a foundation for future work.
Historical Website Ecology: Analyzing Past States of the Web Using Archived Source Code
A contextual approach to historical website analysis is taken by viewing the website as an environment that is inhabited and shaped by third parties such as social media platforms, advertisers, analytics companies and content-delivery networks, embedding the website in various technological and commercial relations with these actors.