Corpus ID: 16627727

FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia

@inproceedings{Ferschke2012FlawFinderAM,
  title={FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia},
  author={Oliver Ferschke and Iryna Gurevych and Marc Rittberger},
  booktitle={Conference and Labs of the Evaluation Forum},
  year={2012}
}
With over 23 million articles in 285 languages, Wikipedia is the largest free knowledge base on the web. Due to its open nature, everybody is allowed to access and edit the contents of this huge encyclopedia. As a downside of this open access policy, quality assessment of the content becomes a critical issue and is hardly manageable without computational assistance. In this paper, we present FlawFinder, a modular system for automatically predicting quality flaws in unseen Wikipedia articles. It… 
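As a rough illustration of what such a modular, per-flaw predictor entails, the sketch below trains one independent binary text classifier per flaw on articles tagged with the corresponding cleanup template. Everything here (the feature extraction, the classifier choice, and the train_flaw_models/predict_flaws helpers) is a hypothetical assumption for exposition, not the authors' implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Five frequent flaws named in follow-up work below; FlawFinder itself
# targets a broader set.
FLAWS = ["No footnotes", "Notability", "Primary Sources", "Refimprove", "Wikify"]

def train_flaw_models(articles, labels_per_flaw):
    """Train one independent binary classifier per flaw.

    articles: list of article plain texts.
    labels_per_flaw: dict mapping flaw name -> list of 0/1 labels
                     (1 = the article carries the flaw's cleanup template).
    """
    models = {}
    for flaw in FLAWS:
        model = make_pipeline(
            TfidfVectorizer(max_features=50_000),
            LogisticRegression(max_iter=1000),
        )
        models[flaw] = model.fit(articles, labels_per_flaw[flaw])
    return models

def predict_flaws(models, text, threshold=0.5):
    """Return each flaw whose predicted probability reaches the threshold."""
    scores = {f: m.predict_proba([text])[0, 1] for f, m in models.items()}
    return {f: s for f, s in scores.items() if s >= threshold}
```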

Citations

Predicting Information Quality Flaws in Wikipedia by Using Classical and Deep Learning Approaches

This work tackles the problem of automatically predicting five of the ten most frequent quality flaws in Wikipedia, namely No footnotes, Notability, Primary Sources, Refimprove, and Wikify, and shows that under-bagged decision trees with different aggregation rules perform best, improving the existing benchmarks for four of the five flaws.
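Under-bagging counters the strong class imbalance of flaw corpora by training each tree on all minority-class examples plus an equally sized random sample of the majority class. The following is a minimal sketch of that general technique with two aggregation rules (majority vote vs. probability averaging); it is an assumption about the method family, not the paper's exact setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def underbag_fit(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on all minority-class examples
    plus an equal-sized random sample of the majority class."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    trees = []
    for _ in range(n_trees):
        sample = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sample])
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def underbag_predict(trees, X, rule="vote"):
    """Aggregate tree outputs by majority vote or probability averaging."""
    if rule == "vote":
        votes = np.mean([t.predict(X) for t in trees], axis=0)
        return (votes >= 0.5).astype(int)
    proba = np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
    return (proba >= 0.5).astype(int)
```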

Predicting quality flaws in user-generated content: the case of wikipedia

A quality flaw model is developed and a dedicated machine learning approach is employed to predict Wikipedia's most important quality flaws. The work argues that common binary or multiclass classification approaches are ineffective for predicting quality flaws and hence casts quality flaw prediction as a one-class classification problem.
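In one-class classification, the model is trained only on examples of the target class (here, articles known to exhibit a flaw) and must decide whether an unseen article belongs to that class. A minimal sketch, assuming TF-IDF features and a one-class SVM (both illustrative choices, not necessarily the paper's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Training data: only articles known to exhibit the target flaw.
flawed = ["text of an article tagged with the flaw ...",
          "text of another tagged article ..."]

vec = TfidfVectorizer()
model = OneClassSVM(kernel="linear", nu=0.1).fit(vec.fit_transform(flawed))

def has_flaw(text):
    # OneClassSVM returns +1 for inliers (flaw class) and -1 for outliers.
    return model.predict(vec.transform([text]))[0] == 1
```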

Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia

The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition and evaluates the performance of three quality flaw classifiers.

The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia

It is argued that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results.
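One way to control for such topic bias, sketched below under the assumption that topic labels (e.g., from Wikipedia categories) are available, is to draw negative examples so that they match the topic distribution of a flaw's positive examples; the helper name is hypothetical:

```python
import random
from collections import Counter, defaultdict

def sample_topic_matched_negatives(positives, candidates, rng=random.Random(0)):
    """positives, candidates: lists of (article_id, topic) pairs.
    Returns negative article ids matching the positives' topic distribution."""
    need = Counter(topic for _, topic in positives)
    by_topic = defaultdict(list)
    for article, topic in candidates:
        by_topic[topic].append(article)
    negatives = []
    for topic, n in need.items():
        pool = by_topic.get(topic, [])
        negatives += rng.sample(pool, min(n, len(pool)))
    return negatives
```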

Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia

This study presents and classifies measures that can be extracted from Wikipedia articles for the purpose of automatic quality assessment in different languages, and also describes extraction methods for the various sources of these measures.

Towards Information Quality Assurance in Spanish Wikipedia

A breakdown of Spanish Wikipedia's quality flaw structure is presented, and experiments on three different corpora are carried out to automatically assess information quality in Spanish Wikipedia, where featured article (FA) identification is evaluated as a binary classification task.

WikiLyzer: Interactive Information Quality Assessment in Wikipedia

Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, …

On the Assessment of Information Quality in Spanish Wikipedia

A first breakdown of Spanish Wikipedia's quality flaw structure is presented, together with a study to automatically assess information quality in Spanish Wikipedia in which featured article (FA) identification is evaluated as a binary classification task.

On the Use of Reliable-Negatives Selection Strategies in the PU Learning Approach for Quality Flaws Prediction in Wikipedia

This paper revisits the winning approach of 2012 and elaborates on neglected aspects in order to provide evidence for the usefulness of sampling in PU learning, showing how different sampling strategies affect flaw prediction effectiveness.
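The classic two-step PU-learning scheme behind such reliable-negatives strategies can be sketched as follows: first score the unlabeled articles with a classifier that treats them all as negatives, then keep the least positive-looking fraction as reliable negatives and retrain on positives vs. that set. The classifier choice, keep_frac parameter, and helper names are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def reliable_negatives(X_pos, X_unl, keep_frac=0.3):
    """Step 1: treat all unlabeled rows as negatives, score them, and keep
    the least positive-looking fraction as reliable negatives.
    X_pos, X_unl: dense, nonnegative count-feature matrices."""
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    scores = MultinomialNB().fit(X, y).predict_proba(X_unl)[:, 1]
    n_keep = max(1, int(keep_frac * len(X_unl)))
    return X_unl[np.argsort(scores)[:n_keep]]

def pu_train(X_pos, X_unl, keep_frac=0.3):
    """Step 2: retrain on positives vs. the reliable-negative set only."""
    X_neg = reliable_negatives(X_pos, X_unl, keep_frac)
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    return MultinomialNB().fit(X, y)
```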

References

Showing 1–10 of 30 references

A breakdown of quality flaws in Wikipedia

The online encyclopedia Wikipedia is a successful example of the increasing popularity of user-generated content on the Web. Despite its success, Wikipedia is often criticized for containing …

Predicting quality flaws in user-generated content: the case of wikipedia

A quality flaw model is developed and a dedicated machine learning approach is employed to predict Wikipedia's most important quality flaws. The work argues that common binary or multiclass classification approaches are ineffective for predicting quality flaws and hence casts quality flaw prediction as a one-class classification problem.

On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia

The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles.

Identifying featured articles in wikipedia: writing style matters

A machine learning approach is presented that exploits an article's character trigram distribution, targeting writing style rather than meta features such as the edit history; the approach is robust, straightforward to implement, and outperforms existing solutions.
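A minimal sketch of such a character-trigram style classifier, assuming TF-IDF-weighted trigrams and a linear SVM (the exact weighting and learner in the paper may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

style_clf = make_pipeline(
    # Character trigrams capture writing style (function words, punctuation
    # habits) rather than topic-specific vocabulary.
    TfidfVectorizer(analyzer="char", ngram_range=(3, 3), max_features=50_000),
    LinearSVC(),
)
# style_clf.fit(article_texts, is_featured)   # is_featured: 1 = featured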

Automatic quality assessment of content created collaboratively by web communities: a case study of wikipedia

This work explores a significant number of quality indicators, some of them proposed by the authors and used here for the first time, studies their capability to assess the quality of Wikipedia articles, and explores machine learning techniques to combine these quality indicators into a single assessment judgment.

Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia’s Edit History

An open-source toolkit is presented that allows past states of Wikipedia to be reconstructed and the edit history of Wikipedia articles to be accessed efficiently; its language-independent design makes it possible to process any language represented in Wikipedia.

Measuring article quality in wikipedia: models and evaluation

This paper proposes three article quality measurement models that make use of interaction data between articles and their contributors derived from the article edit history, including a model that combines the partial reviewership of contributors as they edit various portions of the articles.

Statistical measure of quality in Wikipedia

This study models the evolution of content quality in Wikipedia articles in order to estimate the fraction of time during which articles retain high-quality status, and assesses the quality of Wikipedia's featured and non-featured articles.

Mining the Factors Affecting the Quality of Wikipedia Articles

The findings indicate that information quality is mainly affected by completeness, that being well written is a basic requirement in the initial stage, and that the reputation of authors or editors is not so important in Wikipedia because of its horizontal structure.

Probabilistic Quality Assessment Based on Article's Revision History

The method can accurately and objectively capture a web article's quality by comparing the article's state sequence with the patterns of pre-classified documents in a probabilistic sense.
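One simple probabilistic reading of this idea, offered purely as an assumption about the technique (the paper's actual model may differ): estimate a first-order Markov transition matrix over quality states per document class from pre-classified revision sequences, then assign a new article's state sequence to the class under which it has the highest likelihood:

```python
import numpy as np

def transition_matrix(sequences, n_states, alpha=1.0):
    """Estimate a row-stochastic transition matrix from integer-coded
    state sequences, with Laplace smoothing alpha."""
    counts = np.full((n_states, n_states), alpha)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def classify(seq, matrices):
    """Return the class whose transition matrix gives seq the highest
    log-likelihood. matrices: dict class -> transition matrix."""
    def loglik(T):
        return sum(np.log(T[a, b]) for a, b in zip(seq, seq[1:]))
    return max(matrices, key=lambda c: loglik(matrices[c]))

# Hypothetical usage with 3 quality states (0=low, 1=medium, 2=high):
# matrices = {"high": transition_matrix(high_seqs, 3),
#             "low":  transition_matrix(low_seqs, 3)}
# classify([0, 1, 2, 2], matrices)
```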