A theory of web traffic

@article{Simkin2008ATO,
  title={A theory of web traffic},
  author={Mikhail V. Simkin and Vwani P. Roychowdhury},
  journal={EPL},
  year={2008},
  volume={82},
  pages={28006}
}
We analyze access statistics of several popular webpages for a period of several years. The graphs of daily downloads are highly non-homogeneous with long periods of low activity interrupted by bursts of heavy traffic. These bursts are due to avalanches of blog entries referring to the page. We quantitatively explain this behavior using the theory of branching processes. We extrapolate these findings to construct a model of the entire web. According to the model, the competition between…
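The branching-process picture in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the Poisson offspring distribution, the `mean_offspring` parameter, the `max_size` cap, and the function names are all assumptions made for illustration. Each blog entry referring to the page is taken to trigger a random number of follow-up entries, and the avalanche size is the total number of entries before the process dies out.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def avalanche_size(rng, mean_offspring, max_size=100_000):
    """Total number of entries in one avalanche (Galton-Watson total progeny).

    Starts from a single entry; each entry independently triggers a
    Poisson(mean_offspring) number of follow-up entries. The Poisson choice
    and the max_size cap are assumptions of this sketch, not taken from the paper.
    """
    total = 1
    current_generation = 1
    while current_generation > 0 and total < max_size:
        next_generation = sum(
            poisson(rng, mean_offspring) for _ in range(current_generation)
        )
        total += next_generation
        current_generation = next_generation
    return min(total, max_size)


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [avalanche_size(rng, 0.9) for _ in range(10_000)]
    print("mean avalanche size:", sum(sizes) / len(sizes))
    print("largest avalanche:", max(sizes))
```

For a subcritical process (mean offspring m below 1) the expected avalanche size is 1/(1 - m), so most avalanches stay small while occasional ones grow large. This is qualitatively the pattern of long quiet periods interrupted by bursts described above.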

Human dynamics revealed through Web analytics
  • B. Gonçalves, J. Ramasco
  • Computer Science, Physics
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2008
TLDR
This work analyzes properly anonymized logs detailing the access history to Emory University's Web site and finds that linear preferential linking, priority-based queuing, and the decay of interest for the contents of the pages are the essential ingredients to understand the way users navigate the Web.
Why does attention to web articles fall with Time?
TLDR
It is argued that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page and moving further into the background.
Towards the Characterization of Individual Users through Web Analytics
TLDR
An analysis of the way individual users navigate in the Web indicates a rich variety of individual behaviors and seems to preclude the possibility of defining a characteristic frequency for each user in his/her visits to a single site.
Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model
TLDR
The experimental results with NASA web traffic data showed that the proposed advanced countermeasure against distributed web-crawlers was effective in identifying distributed crawlers with 0.0275% false positives, whereas a conventional frequency-based detection method shows 2.882% false positives with an equal access threshold.
The Pulse of News in Social Media: Forecasting Popularity
TLDR
This paper constructs a multi-dimensional feature space derived from properties of an article and evaluates the efficacy of these features to serve as predictors of online popularity and demonstrates that despite randomness in human behavior, it is possible to predict ranges of popularity on twitter with an overall 84% accuracy.
Topological Structure and Interest Spectrum of the Group Interest Network
  • N. Zhang
  • Mathematics, Physics
    Complex
  • 2009
TLDR
The results indicate that the incoming degree distribution of the group interest network follows a power law and that the group interest spectrum was basically steady.
Theory of citing
TLDR
A stochastic model of the citation process is developed and shows that about 70–90% of scientific citations are copied from the lists of references used in other papers, which can explain not only why some misprints become popular, but also why some papers become highly cited.
Measuring Inter-site Engagement in a Network of Sites
TLDR
This work proposes a methodology for studying inter-site engagement by modeling websites and user traffic between them as a network and reduces the complexity of the data, and hence metrics can be efficiently employed to study user engagement within such networks.
Measuring inter-site engagement
TLDR
This paper investigates intersite engagement, that is, site engagement within a network of sites, by defining a global measure of engagement that captures the effect sites have on the engagement on other sites.

References

How nature works: The science of self-organized criticality
  • D. Raup
  • Computer Science
    Complex.
  • 1997
TLDR
His ruthless simplifications of geology, evolution, and neurology pay off because his models describe behavior that is common across these domains, and this universality means that trampling across others turf is not only acceptable, but almost mandatory, if the underlying principles are to be exposed.
Where do visitors come from? Resonance
  • http://webkew.blogspot.com/2005/05/lesson-4-where-do-visitors-come-from.html
  • 2005
The Theory of Branching Processes (Springer, Berlin)
  • 1963
The State of the Live Web