The Workshops of the Tenth International AAAI Conference on Web and Social Media

Abstract

In this communication we take advantage of the global coverage of the Wikipedia dataset to analyze how the coefficients commonly used to measure burstiness depend on language. Analyzing separately the editing patterns of single editors over several pages, we show several characteristics of the super-editors in the Wikipedias (WPs) written in English, Spanish, French and Portuguese. We report for the first time the burstiness and memory coefficients, separately for the four WPs, showing similarities and differences between all users and the super-editors, the exponent of their averaged inter-event activity and, finally, some statistical traces of their averaged monthly activity.
Digital media are an important component of our lives. Nowadays, digital records of human activity of different sorts are systematically stored and made accessible for academic research. Hence a huge amount of data has become available over the past couple of decades, allowing for a quantitative study of human behaviour and progressively opening the possibility of uncovering social patterns not detected so far (Barrat, Barthélemy, and Vespignani 2008; Newman, Barabási, and Watts 2006). The success of research on digital social patterns hinges on access to high-quality data. Even though the availability and accessibility of recorded data are rapidly increasing, many data sets are not freely available for research. Wikipedia (WP) is an important exception: not only is it considered a robust and trustworthy source of information (Giles 2005), but it is also easily accessible, via the API (https://www.mediawiki.org/wiki/api:main page) or the different available dumps (http://wwm.phy.bme.hu/), to anyone with an internet connection.
Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In this communication we take advantage of the global coverage of the Wikipedia (WP) dataset to answer the question: which of the commonly used measures of burstiness are global, and which depend on local conditions, in this case the language? Human bursty behaviour is activity characterized by intervals of rapidly occurring events separated by long periods of inactivity (Barabási 2005). This phenomenon has been found to modulate several kinds of human activity, such as sending letters, writing email messages, sending mobile SMS, making phone calls and web browsing, among others (Vázquez et al. 2006; Goh and Barabási 2008; Wu et al. 2010; Malmgren et al. 2008; 2009; Ratkiewicz et al. 2010; Jo et al. 2012). The main characteristic of bursty behaviour is a power-law distribution of the inter-event times, i.e., the intervals of time between consecutive actions or events. The exponents of these power-law distributions have been reported to cluster around universal values: 1 for Web browsing, email, and library datasets, and 3/2 for mail correspondence patterns (Vázquez et al. 2006). Under the premise that a queuing process (in which individuals execute tasks based on some perceived priority) is the origin of human burstiness, the value of the exponent was suggested to depend on whether or not there are limitations on the number of tasks an individual can handle in a finite time (Vázquez et al. 2006). For the case of Wikipedia, the exponent of the inter-event distribution averaged over a sample of the 100 most active editors has been reported as 1.44 (Yasseri and Kertész 2013). Another parameter based on the broad distribution of the inter-event times, which compares their standard deviation with their mean, was defined as the burstiness parameter B by Goh and Barabási (2008).
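As a minimal sketch of the two quantities just described, the snippet below computes the burstiness parameter in the form given by Goh and Barabási (2008), B = (σ − μ)/(σ + μ), where μ and σ are the mean and standard deviation of the inter-event times, and estimates a power-law exponent with the standard continuous maximum-likelihood estimator. The function names, the use of the population standard deviation and the cutoff `tau_min` are assumptions made for illustration; the estimator is a swapped-in technique and not necessarily the fitting procedure behind the exponents quoted above.

```python
import math
from statistics import fmean, pstdev


def burstiness(taus):
    """Burstiness parameter B = (sigma - mu) / (sigma + mu).

    `taus` is a sequence of inter-event times (e.g. in seconds).
    Assumption: sigma is the population standard deviation.
    """
    if len(taus) < 2:
        raise ValueError("need at least two inter-event times")
    mu = fmean(taus)
    sigma = pstdev(taus)
    if sigma + mu == 0:
        raise ValueError("all inter-event times are zero")
    return (sigma - mu) / (sigma + mu)


def power_law_exponent_mle(taus, tau_min):
    """Continuous maximum-likelihood estimate of alpha in P(tau) ~ tau^(-alpha).

    Only values tau >= tau_min are used; `tau_min` is a hypothetical
    lower cutoff that the caller must choose.
    """
    tail = [t for t in taus if t >= tau_min]
    log_sum = sum(math.log(t / tau_min) for t in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above tau_min")
    return 1.0 + len(tail) / log_sum
```

With this definition B tends to -1 for a perfectly regular signal (σ = 0), is 0 for a Poisson process (σ = μ) and approaches 1 when σ is much larger than μ, as in strongly bursty activity.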
Goh and Barabási (2008) also defined the memory coefficient, M, which measures the tendency of short (long) inter-event times to be followed by short (long) ones. In this work we report for the first time both values for the Wikipedia dataset, showing similarities and differences between all users and the super-editors in the editions of WP written in four different languages. With this picture in mind, we next show the probability distribution function of the inter-event times (in seconds) averaged over such super-editors. Then we show the averaged monthly activity and, finally, the averaged cumulative monthly activity for all of them, separately for the four WPs studied. Our data sample for the WP editors consists of four separate WP dumps (http://wwm.phy.bme.hu/): the one written in English (EN-WP), the Spanish one (ES-WP), the French one (FR-WP) and the Portuguese one (PT-WP), all covering a period of about 10 years ending in January 2010. The accessible data contain the whole editing history for both pages and editors. For each entry, the “light dump” gives the WP page name, the edit time stamp and the identifier of the editor who made the change. We dis
Wiki: Technical Report WS-16-17
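The following sketch illustrates how per-editor inter-event times could be obtained from records of the kind described above (page name, edit time stamp, editor identifier), and how the memory coefficient M can then be computed as the Pearson correlation between consecutive inter-event times, following the definition of Goh and Barabási (2008). The record layout as a tuple, the assumption that time stamps are already expressed in seconds, and the function names are hypothetical; the actual dump format may differ.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def inter_event_times(records):
    """Group edits by editor and return the gaps between consecutive edits.

    `records` is an iterable of (page, timestamp_seconds, editor) tuples.
    Assumption: time stamps were already converted to seconds.
    """
    stamps_by_editor = defaultdict(list)
    for _page, timestamp, editor in records:
        stamps_by_editor[editor].append(timestamp)
    gaps = {}
    for editor, stamps in stamps_by_editor.items():
        stamps.sort()  # edits of one editor across all pages, in time order
        gaps[editor] = [b - a for a, b in zip(stamps, stamps[1:])]
    return gaps


def memory_coefficient(taus):
    """Memory coefficient M: correlation of each gap with the next one."""
    first, second = taus[:-1], taus[1:]
    if len(first) < 2:
        raise ValueError("need at least three inter-event times")
    m1, m2 = fmean(first), fmean(second)
    s1, s2 = pstdev(first), pstdev(second)
    if s1 == 0 or s2 == 0:
        raise ValueError("M is undefined for constant inter-event times")
    cov = fmean([(a - m1) * (b - m2) for a, b in zip(first, second)])
    return cov / (s1 * s2)
```

M is positive when short gaps tend to follow short ones and long gaps follow long ones, negative when short and long gaps alternate, and close to zero when consecutive gaps are uncorrelated.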

Cite this paper

@inproceedings{Gandica2016TheWO, title={The Workshops of the Tenth International AAAI Conference on Web and Social Media}, author={Y{\'e}rali Gandica and Renaud Lambiotte and Timoteo Carletti}, year={2016} }