The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context

@article{Hecht2010TheTO,
  title={The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context},
  author={Brent J. Hecht and Darren Gergle},
  journal={Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
  year={2010}
}
  • Brent J. Hecht, D. Gergle
  • Published 10 April 2010
  • Computer Science
  • Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of… 

Citations

Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks
TLDR
The article develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests it on 35 languages and their corresponding Wikipedias, going beyond existing approaches by examining their structural and semantic aspects both intra- and intertextually.
Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders
TLDR
This work investigates supporting communities by generating summaries for Wikipedia articles in underserved languages, taking structured data as input.
The_Tower_of_Babel.jpg: Diversity of Visual Encyclopedic Knowledge Across Wikipedia Language Editions
TLDR
The diversity of visual encyclopedic knowledge across 25 language editions of Wikipedia is assessed, and the similarities and differences in visual encyclopedic knowledge across language editions are measured.
Analysis of Editors' Languages in Wikidata
TLDR
This paper investigates the language distribution of Wikidata's editors and how it relates to Wikidata's content and the users' label editing, giving insight into the community in a way that can help support users working on multilingual projects.
The mining and application of diverse cultural perspectives in user-generated content
TLDR
It is demonstrated that UGC reflects the cultural diversity of its contributors to a previously unidentified extent, and that this diversity has important implications for Web users and existing UGC-based technologies.
Cross-language Wikipedia Editing of Okinawa, Japan
This article analyzes users who edit Wikipedia articles about Okinawa, Japan, in English and Japanese. It finds that these users are among the most active and dedicated users in their primary languages…
Omnipedia: bridging the wikipedia language gap
TLDR
A study of Omnipedia that characterizes how people interact with information using a multilingual lens found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts.
Examining Wikipedia across linguistic and temporal borders
TLDR
A framework is provided for associating temporal page views in Wikipedia with specific geographic profiles; it is used to examine the exchange of information between English-speaking and Chinese-speaking localities, and initial findings on the role of language and culture in diffusion are reported.
Cultural Configuration of Wikipedia: measuring Autoreferentiality in Different Languages
TLDR
A method is outlined for collecting the articles that constitute this content and analysing them along several dimensions, obtaining an index that represents the degree of autoreferentiality of the encyclopedia.
Wikipedia Beyond the English Language Edition
TLDR
The findings show that the same power plays used in EN exist in both FA and ZH but the frequency of their usage differs across the editions, suggesting that editors in different language communities value contrasting types of policies to compete for power while discussing and editing articles.

References

Showing 1-10 of 39 references
Enriching the crosslingual link structure of Wikipedia - A classification-based approach
TLDR
This paper presents a classification-based approach with the goal of inferring new cross-language links in Wikipedia and shows that this approach has a recall of 70% with a precision of 94% for the task of learning cross-language links on a test dataset.
Learning to link with wikipedia
TLDR
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles; the approach performs very well, with recall and precision of almost 75%.
Information arbitrage across multi-lingual Wikipedia
TLDR
Analyzing four large language domains (English, Spanish, French, and German), this work presents Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages.
Cultural bias in Wikipedia content on famous persons
TLDR
The extent to which content and perspectives vary across cultures is examined by comparing articles about famous persons in the Polish and English editions of Wikipedia, revealing systematic differences related to the different cultures, histories, and values of Poland and the United States.
Intelligence in Wikipedia
The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed-initiative interface designed to encourage…
Wikipedia-based Semantic Interpretation for Natural Language Processing
TLDR
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts; ESA represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
Finding Similar Sentences across Multiple Languages in Wikipedia
TLDR
Whether the Wikipedia corpus is amenable to multilingual analysis aimed at generating parallel corpora is investigated, and two simple heuristics for identifying similar text across multiple languages in Wikipedia are presented.
Power of the Few vs. Wisdom of the Crowd: Wikipedia and the Rise of the Bourgeoisie
TLDR
Although Wikipedia was driven by the influence of “elite” users early on, more recently there has been a dramatic shift in workload to the “common” user, and a similar shift is observed in del.icio.us, a very different type of social collaborative knowledge system.
Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge
TLDR
This paper introduces a method that relies on the information extracted from Wikipedia, by exploiting the interlanguage links available between Wikipedia versions in multiple languages, that performs well, with a performance comparable to monolingual measures of relatedness.
Mopping up: modeling wikipedia promotion decisions
This paper presents a model of the behavior of candidates for promotion to administrator status in Wikipedia. It uses a policy capture framework to highlight similarities and differences in the…