Corpus ID: 12861120

Named Entity Recognition in Tweets: An Experimental Study

@inproceedings{Ritter2011NamedER,
  title={Named Entity Recognition in Tweets: An Experimental Study},
  author={Alan Ritter and Sam Clark and Mausam and Oren Etzioni},
  booktitle={EMNLP},
  year={2011}
}
People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. [...] Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten…
Learning to recognise named entities in tweets by exploiting weakly labelled data
TLDR
This work proposes an unsupervised learning approach that uses deep neural networks and leverages a knowledge base to bootstrap sparse entity types with weakly labelled data. It obtains robust performance, ranking third among all shared task participants according to the official evaluation on a gold-standard named entity-annotated corpus of 3,856 tweets.
Bidirectional LSTM with a Context Input Window for Named Entity Recognition in Tweets
TLDR
A new gold-standard corpus of tweets annotated for Person, Location, and Organization (PLO) is presented and multiple NER experiments are performed using a variety of Long Short-Term Memory (LSTM) based models without resorting to any handcrafted rules.
ResToRinG CaPitaLiZaTion in #TweeTs
The rapid proliferation of microblogs such as Twitter has resulted in a vast quantity of written text becoming available that contains interesting information for NLP tasks. However, the noise level…
NER from Tweets: SRI-JU System @MSM 2013
TLDR
This article reports the author's participation in the Concept Extraction Challenge, Making Sense of Microposts (#MSM2013), for which three different system runs were submitted.
ASU: An Experimental Study on Applying Deep Learning in Twitter Named Entity Recognition.
TLDR
This paper describes the ASU system submitted in the COLING W-NUT 2016 Twitter Named Entity Recognition (NER) task and shows detailed experimentation results on the effectiveness of word embeddings, Brown clusters, part-of-speech tags, shape features, gazetteers and local context for the tweet input vector representation to the LSTM model.
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
TLDR
The Broad Twitter Corpus (BTC) is introduced, which is not only significantly bigger, but sampled across different regions, temporal periods, and types of Twitter users, and measures the entity drift observed in the dataset.
Named Entity Recognition and Disambiguation in Tweets Master
Social media has grown exponentially over the past few years. Users are generating far more unstructured content than ever before. Successful companies are also very active in social media analysing…
The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition
TLDR
This work emphasizes the effectiveness of representations on Twitter NER, and demonstrates that their inclusion can improve performance by up to 20 F1, and establishes a new state-of-the-art on two common test sets.
Joint Inference of Named Entity Recognition and Normalization for Tweets
TLDR
A novel graphical model is proposed to simultaneously conduct NER and NEN on multiple tweets to address the problem of named entity normalization for tweets, which introduces a binary random variable for each pair of words with the same lemma across similar tweets.
Analysis of named entity recognition and linking for tweets
TLDR
This work describes a new Twitter entity disambiguation dataset, and conducts an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.

References

Recognizing Named Entities in Tweets
TLDR
This work proposes to combine a K-Nearest Neighbors classifier with a linear Conditional Random Fields model under a semi-supervised learning framework to tackle the challenges of Named Entities Recognition for tweets.
Improving automated lexical and discourse analysis of online chat dialog
TLDR
A chat corpus is built, initially tagged with lexical and discourse information that could be used to develop stochastic NLP applications that perform tasks such as conversation thread topic detection, author profiling, entity identification, and social network analysis.
Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
TLDR
This paper targets out-of-vocabulary words in short text messages and proposes a method for identifying and normalising ill-formed words, which achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
Unsupervised Models for Named Entity Classification
TLDR
It is shown that the use of unlabeled data can reduce the requirements for supervision to just 7 simple "seed" rules, gaining leverage from natural redundancy in the data.
Locating Complex Named Entities in Web Text
TLDR
This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text and shows that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus.
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Annotating Named Entities in Twitter Data with Crowdsourcing
We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally…
Distant supervision for relation extraction without labeled data
TLDR
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
Event Discovery in Social Media Feeds
TLDR
A graphical model is developed that addresses record extraction from social streams such as Twitter by learning a latent set of records and a record-message alignment simultaneously, resulting in a set of canonical records that are consistent with aligned messages.
Lexical and Discourse Analysis of Online Chat Dialog
TLDR
The purpose of this research is to build a chat corpus, tagged with lexical (token part-of-speech labels), syntactic (post parse tree), and discourse (post classification) information that can be used to develop more complex, statistical-based NLP applications that perform tasks such as author profiling, entity identification, and social network analysis.