Design and implementation of the Sweble Wikitext parser: unlocking the structured data of Wikipedia

@inproceedings{Dohrn2011DesignAI,
  title={Design and implementation of the Sweble Wikitext parser: unlocking the structured data of Wikipedia},
  author={Hannes Dohrn and Dirk Riehle},
  booktitle={Int. Sym. Wikis},
  year={2011}
}
The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki's content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This…

Citations

Publications citing this paper.

Design and implementation of wiki content transformations and refactorings


DBkWik: A Consolidated Knowledge Graph from Thousands of Wikis

  • 2018 IEEE International Conference on Big Knowledge (ICBK)
  • 2018

CMC Corpora in DeReKo


