• Corpus ID: 202666436

Measuring Wikipedia

Jakob Voß
Wikipedia, an international project that uses wiki software to collaboratively create an encyclopaedia, is becoming more and more popular. Everyone can directly edit articles, and every edit is recorded. The version history of all articles is freely available and allows a multitude of examinations. This paper gives an overview of Wikipedia research. Wikipedia's fundamental components, i.e. articles, authors, edits, and links, as well as content and quality, are analysed. Possibilities of research…


Information Uniqueness in Wikipedia Articles
A method is described that captures the information duplication across the article contents in an attempt to infer the amount of distinct information every article communicates, based on the intuition that an article offering unique information about its subject is of better quality compared to an article that discusses issues already addressed in several other articles.
The Top-Ten Wikipedias - A Quantitative Analysis Using Wikixray
Using a quantitative methodology based on the analysis of the public Wikipedia databases, the main characteristics of the 10 largest language editions, and the authors that work in them are described, and some relationships between contribution patterns and content are explained.
The Wikipedia Corpus
This paper examines Wikipedia from a research perspective, providing basic background knowledge and an understanding of its strengths and weaknesses, and solves a technical challenge posed by the enormity of text made available with a simple, easily-implemented dictionary compression algorithm.
Customer relationship management practices in the online community - Wikipedia
Wikipedia is multilingual and the largest open-content website, depending on its users to produce information of value. Wikis are often used to create and power collaborative and community…
Visitors and Contributors in Wikipedia
This study aims to find patterns in how users interact and behave when visiting Wikipedia's pages, based on a sample of the requests users submit to Wikipedia, received in the form of log lines.
Identifying Controversies in Wikipedia using Support Vector Machines
Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the…
Robust clustering of languages across Wikipedia growth
It is shown that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns; the make-up of these clusters is remarkably robust against the method used for their determination, as verified via four different clustering methods.
Measuring the wikisphere
This paper presents WikiCrawler, a tool that automatically downloads and analyzes wikis, and studies 151 popular wikis running MediaWiki. All of them displayed signs of collaborative authorship, validating them as objects of study and suggesting that most wikis accumulate edits through a similar underlying mechanism, which could motivate a model of user activity applicable to wikis in general.
A Multi-view Approach for the Quality Assessment of Wiki Articles
This work proposes grouping the indicators into semantically meaningful views of quality and investigates a new approach to combining these views based on a meta-learning method known as stacking, demonstrating that this approach can be used in collaborative encyclopedias such as Wikipedia and Wikia.
Patterns of creation and usage of Wikipedia content
Using Wikipedia categories as macro-agglomerates, this study reveals that categories face a decreasing growth trend over time, after an initial exponential phase of development, and demonstrates that the number of views to the pages within the categories follows a linear, unbounded growth.


Wikipedia as Participatory Journalism: Reliable Sources? Metrics for evaluating collaborative media as a news resource
  • A. Lih
  • Computer Science, Political Science
  • 2004
This study examines the growth of Wikipedia and analyzes the crucial technologies and community policies that have enabled the project to prosper, establishing a set of metrics based on established encyclopedia taxonomies and analyzing trends in Wikipedia being used as a source.
Bibliothek, Information und Dokumentation in der Wikipedia
Library, Information and Documentation in Wikipedia: Wikipedia is an international project to create a free encyclopaedia in multiple languages. Using a wiki, thousands of volunteers are…
Folksonomies - Cooperative Classification and Communication Through Shared Metadata
This paper examines user-generated metadata as implemented and applied in two web services designed to share and organize digital media, in order to better understand grassroots classification…
Wikipedia and the Disappearing "Author"
What does it mean to author a piece of writing? For many generations, humans inscribed clay tablets and recorded information on papyrus, but only rarely included their own names in the documents they…
Studying cooperation and conflict between authors with history flow visualizations
This paper investigates the dynamics of Wikipedia, a prominent, thriving wiki, and focuses on the relevance of authorship, the value of community surveillance in ameliorating antisocial behavior, and how authors with competing perspectives negotiate their differences.
Phantom authority, self-selective recruitment and retention of members in virtual communities: The case of Wikipedia
An interpretative framework explains the outstanding success of Wikipedia by a novel solution to the problem of graffiti attacks (the submission of undesirable pieces of information): it reduces the transaction cost of erasing graffiti and therefore prevents attackers from posting unwanted contributions.
Open Source Software Development and Lotka's Law: Bibliometric Patterns in Programming
This research applies Lotka's Law to metadata on open source software development. Lotka's Law predicts the proportion of authors at different levels of productivity. Open source software development…
A general theory of bibliometric and other cumulative advantage processes
  • D. Price
  • Mathematics, Computer Science
    J. Am. Soc. Inf. Sci.
  • 1976
It is shown that such a stochastic law is governed by the Beta Function, containing only one free parameter, and this is approximated by a skew or hyperbolic distribution of the type that is widespread in bibliometrics and diverse social science phenomena.
The similarity metric
A new "normalized information distance" is proposed, based on the noncomputable notion of Kolmogorov complexity; it is demonstrated to be a metric and is called the similarity metric.
Emergence of scaling in random networks
A model based on two ingredients, growth and preferential attachment, reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.