TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text

Abstract

Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every stage. Additionally, it includes Twitter-specific data import and metadata handling. This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text. An evaluation against some state-of-the-art systems is also presented.

Semantic Scholar estimates that this publication has 121 citations based on the available data.

Cite this paper

@inproceedings{Bontcheva2013TwitIEAO,
  title={TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text},
  author={Kalina Bontcheva and Leon Derczynski and Adam Funk and Mark A. Greenwood and Diana Maynard and Niraj Aswani},
  booktitle={RANLP},
  year={2013}
}