Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus
@article{Mehdi2017ExcavatingTM,
  title={Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus},
  author={Mohamad Mehdi and Chitu Okoli and Mostafa Mesgari and Finn {\AA}rup Nielsen and Arto Lanam{\"a}ki},
  journal={Inf. Process. Manag.},
  year={2017},
  volume={53},
  pages={505--529}
}
22 Citations
Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data
- Computer Science
- Journal of Information Science
- 2020
The results suggest that at least the taxonomy derived from the Wikipedia category system is not a valid instrument for manual subject matter labelling of open-domain text corpora.
Use of Wikipedia categories on information retrieval research: a brief review
- Computer Science
- CERI
- 2018
This paper adopts a systematic literature review approach, in order to identify different approaches and uses of Wikipedia categories in information retrieval research, and shows that in many cases research approaches applied and results obtained can be integrated into a comprehensive and inclusive concept of information retrieval.
How can wikipedia be used to support the process of automatically building multilingual domain modules? a case study
- Computer Science
- Inf. Process. Manag.
- 2020
Exploring the Domain of Information “Users”: Semantic Analysis of Wikipedia Articles
- Computer Science
- 2020
The findings reveal that Wikipedia covers various topics of the information users domain, including information search behavior, information retrieval, human-computer interaction, user experience, and human factors, among others.
How Wikipedia disease information evolve over time? An analysis of disease-based articles changes
- Computer Science
- Inf. Process. Manag.
- 2020
Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries
- Computer Science
- Inf. Process. Manag.
- 2018
Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries
- Computer Science
- 2018
A formal model of controversy is introduced as the basis of computational approaches to detecting controversial concepts, and a classification-based method for automatic detection of controversial articles and categories in Wikipedia is proposed.
Computing semantic similarity based on novel models of semantic representation using Wikipedia
- Computer Science
- Inf. Process. Manag.
- 2018
Open semantic analysis: The case of word level semantics in Danish
- Computer Science
- 2017
Data-driven models for Danish semantic relatedness, word intrusion, and sentiment prediction are described, and it is found that logistic regression and large random forests perform well with semantic representations.
Evaluation of Naive Bayes and Support Vector Machines for Wikipedia
- Computer Science
- Appl. Artif. Intell.
- 2017
This work compares and illustrates the effectiveness of two standard classifiers in the text classification literature, Naive Bayes and Support Vector Machines, on the full English Wikipedia corpus for six different categories, and shows that SVM (linear kernel) performs exceptionally across all categories.
References
A knowledge-based search engine powered by wikipedia
- Computer Science
- CIKM '07
- 2007
Koru is a new search interface that offers effective domain-independent knowledge-based information retrieval that exhibits an understanding of the topics of both queries and documents, and is capable of lending assistance to almost every query issued to it.
Wikipedia-based Semantic Interpretation for Natural Language Processing
- Computer Science
- J. Artif. Intell. Res.
- 2009
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
- Computer Science
- LREC
- 2008
This paper presents two application programming interfaces for Wikipedia and Wiktionary which are especially designed for mining the rich lexical semantic information dispersed in the knowledge bases, and provide efficient and structured access to the available knowledge.
Learning for information extraction: from named entity recognition and disambiguation to relation extraction
- Computer Science
- 2007
This research uses Wikipedia as a repository of named entities and proposes a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy.
Wikitology: a novel hybrid knowledge base derived from wikipedia
- Computer Science
- 2010
The value of the derived knowledge base is demonstrated by developing problem-specific intelligent approaches that exploit Wikitology for a diverse set of use cases: document concept prediction, cross-document co-reference resolution, entity linking to KB entities as defined in the Text Analysis Conference Knowledge Base Population Track 2009, and interpreting tables.
Expert-Built and Collaboratively Constructed Lexical Semantic Resources
- Computer Science, Linguistics
- Lang. Linguistics Compass
- 2010
A comprehensive overview of the lexical semantic knowledge therein is provided, along with a review of work on orchestrating different resources to combine their strengths and explore their use in major NLP applications.
Learning to link with wikipedia
- Computer Science
- CIKM '08
- 2008
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles; the resulting approach performs very well, with recall and precision of almost 75%.
Using Wikipedia knowledge to improve text classification
- Computer Science
- Knowledge and Information Systems
- 2008
Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.
Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis
- Computer Science
- IJCAI
- 2007
This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia, and that yields substantial improvements in the correlation of computed relatedness scores with human judgments.