Named Entity Recognition in Tweets: An Experimental Study

Abstract

People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types.
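
The F1 figures quoted above combine precision and recall into a single score. As a rough illustration of how such a score is typically computed for named-entity recognition, here is a minimal sketch. It assumes exact-match, entity-level scoring over (start, end, type) spans; the abstract does not specify the paper's exact evaluation protocol, and the function name, span representation, type labels, and toy data below are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def f1_score(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Entity-level F1: the harmonic mean of precision and recall.

    An entity counts as correct only if both its span and its type
    match a gold entity exactly (an assumption of this sketch).
    """
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations for one tweet, not data from the paper.
gold = {(0, 1, "PERSON"), (4, 5, "COMPANY"), (7, 7, "LOCATION")}
predicted = {(0, 1, "PERSON"), (4, 5, "PRODUCT")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.2f}")
```

In this toy case one of two predictions is correct (precision 1/2) and one of three gold entities is found (recall 1/3). The second prediction has the right span but the wrong type, so it counts as an error under exact matching.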
