Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus

@article{Mehdi2017ExcavatingTM,
  title={Excavating the mother lode of human-generated text: A systematic review of research that uses the {Wikipedia} corpus},
  author={Mohamad Mehdi and Chitu Okoli and Mostafa Mesgari and Finn {\AA}rup Nielsen and Arto Lanam{\"a}ki},
  journal={Inf. Process. Manag.},
  year={2017},
  volume={53},
  pages={505--529}
}
Use of Wikipedia categories on information retrieval research: a brief review
TLDR
This paper adopts a systematic literature review approach, in order to identify different approaches and uses of Wikipedia categories in information retrieval research, and shows that in many cases research approaches applied and results obtained can be integrated into a comprehensive and inclusive concept of information retrieval.
Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries
TLDR
A formal model of controversy is introduced as the basis of computational approaches to detecting controversial concepts, and a classification-based method for automatic detection of controversial articles and categories in Wikipedia is proposed.
Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data
The Wikipedia category system was designed to enable browsing and navigation of Wikipedia. It is also a useful resource for knowledge organisation and document indexing, especially using automatic ...
Open semantic analysis: The case of word level semantics in Danish
TLDR
Data-driven models for Danish semantic relatedness, word intrusion and sentiment prediction are described and it is found that logistic regression and large random forests perform well with semantic representations.
Evaluation of Naive Bayes and Support Vector Machines for Wikipedia
TLDR
This work compares and illustrates the effectiveness of two standard classifiers in the text classification literature, Naive Bayes and Support Vector Machines, on the full English Wikipedia corpus for six different categories, and shows that SVM (linear kernel) performs exceptionally well across all categories.
Wikipedia categories in research: towards a qualitative review of uses and applications
TLDR
It is concluded that the Wikipedia system of categories offers a valid classification scheme for the different approaches taken to study knowledge organization in multiple contexts.

References

Mining Meaning from Wikipedia
A knowledge-based search engine powered by Wikipedia
TLDR
Koru is a new search interface that offers effective domain-independent knowledge-based information retrieval that exhibits an understanding of the topics of both queries and documents, and is capable of lending assistance to almost every query issued to it.
Wikipedia-based Semantic Interpretation for Natural Language Processing
TLDR
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
TLDR
This paper presents two application programming interfaces for Wikipedia and Wiktionary which are especially designed for mining the rich lexical semantic information dispersed in the knowledge bases, and provide efficient and structured access to the available knowledge.
Learning for information extraction: from named entity recognition and disambiguation to relation extraction
TLDR
This research uses Wikipedia as a repository of named entities and proposes a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy.
Wikitology: a novel hybrid knowledge base derived from Wikipedia
TLDR
The value of the derived knowledge base is demonstrated by developing problem-specific intelligent approaches that exploit Wikitology for a diverse set of use cases: document concept prediction, cross-document co-reference resolution, entity linking to KB entities as defined in the Text Analysis Conference Knowledge Base Population Track 2009, and interpreting tables.
Semantic Wikipedia
TLDR
This paper provides an extension to be integrated in Wikipedia, that allows the typing of links between articles and the specification of typed data inside the articles in an easy-to-use manner, and presents the design, implementation, and possible uses of this extension.
Expert-Built and Collaboratively Constructed Lexical Semantic Resources
TLDR
A comprehensive overview of the lexical semantic knowledge therein is provided, and work on orchestrating different resources in order to combine their strengths and explore their use in major NLP applications is reviewed.
Learning to link with Wikipedia
TLDR
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles; the resulting approach performs very well, with recall and precision of almost 75%.
Using Wikipedia knowledge to improve text classification
TLDR
Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.