Corpus ID: 13402028

Semi-supervised Named Entity Recognition in noisy-text

@inproceedings{Mishra2016SemisupervisedNE,
  title={Semi-supervised Named Entity Recognition in noisy-text},
  author={Shubhanshu Mishra and Jana Diesner},
  booktitle={NUT@COLING},
  year={2016}
}
Many of the existing Named Entity Recognition (NER) solutions are built based on news corpus data with proper syntax. [...]

Key Method: The models described in this paper are based on linear-chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data. The considered features include word clusters and pre-trained distributed word representations, updated gazetteer features, and global context predictions. The latter feature allows…
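The key method above names two concrete mechanisms: the BIEOU tag encoding and random feature dropout for up-sampling the training data. The minimal Python sketch below illustrates both in isolation; the function names, the per-token feature dictionaries, and the dropout rate are illustrative assumptions, not the authors' implementation.

    import random

    def bio_to_bieou(tags):
        """Convert a BIO tag sequence to BIEOU (a.k.a. BILOU).

        Single-token entities become U-*, the last token of a multi-token
        entity becomes E-*; O, leading B-, and inner I- tags are kept.
        """
        bieou = []
        for i, tag in enumerate(tags):
            nxt = tags[i + 1] if i + 1 < len(tags) else "O"
            if tag == "O":
                bieou.append("O")
            elif tag.startswith("B-"):
                # Entity continues only if the next tag is I- of the same type.
                bieou.append(tag if nxt == "I-" + tag[2:] else "U-" + tag[2:])
            elif tag.startswith("I-"):
                bieou.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
            else:
                bieou.append(tag)
        return bieou

    def feature_dropout(sentence_features, drop_rate=0.3, copies=2, seed=0):
        """Up-sample a sentence by emitting noisy copies with random features removed.

        `sentence_features` is a list of per-token feature dicts; each copy keeps
        every feature independently with probability (1 - drop_rate).
        """
        rng = random.Random(seed)
        augmented = [sentence_features]  # keep the original, un-dropped copy
        for _ in range(copies):
            noisy = [
                {k: v for k, v in feats.items() if rng.random() > drop_rate}
                for feats in sentence_features
            ]
            augmented.append(noisy)
        return augmented

    if __name__ == "__main__":
        print(bio_to_bieou(["B-PER", "I-PER", "O", "B-LOC"]))
        # ['B-PER', 'E-PER', 'O', 'U-LOC']
        toks = [{"word.lower": "new", "cluster": "1101"},
                {"word.lower": "york", "cluster": "1101"}]
        print(len(feature_dropout(toks)))  # 3 training instances from one sentence

In a CRF pipeline of this kind, the augmented per-token feature dictionaries would be paired with the BIEOU tag sequences and passed to a linear-chain CRF trainer.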
Citations

Raw-to-End Name Entity Recognition in Social Media
TLDR
Neural-Char-CRF is introduced, a raw-to-end framework that is more robust to pre-processing errors and demonstrates its potential to be tokenization-free.
Improving named entity recognition in noisy user-generated text with local distance neighbor feature
TLDR
This paper presents Local Distance Neighbor (LDN), a novel feature that substitutes for the gazetteer and enables the model to obtain state-of-the-art results, and demonstrates that the proposal can be useful for Law Enforcement Agencies in mining textual information in Tor hidden services.
Collective Named Entity Recognition in User Comments via Parameterized Label Propagation
TLDR
This work proposes a novel semi-supervised inference algorithm named parameterized label propagation, which significantly outperforms all other non-collective NER baselines on the Yahoo! News dataset, where comments and articles within a thread share similar context.
Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data
TLDR
This paper explores ways to use language identification and translation to recognize named entities in code-switched tweets; simple features (without multilingual features) combined with a Conditional Random Field (CRF) classifier achieved the best results.
A Multi-task Approach for Named Entity Recognition in Social Media Data
TLDR
A novel multi-task approach is proposed that pairs a more general secondary task of Named Entity (NE) segmentation with the primary task of fine-grained NE categorization, learning higher-order feature representations from word and character sequences along with basic part-of-speech tags and gazetteer information.
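As a rough illustration of this kind of multi-task setup, the sketch below shares one encoder between a fine-grained NE categorization head and a secondary NE segmentation head and sums their losses. The BiLSTM encoder, the layer sizes, and the omission of character, POS, and gazetteer inputs are simplifying assumptions, not the architecture of the cited paper.

    import torch
    import torch.nn as nn

    class MultiTaskNER(nn.Module):
        """Shared encoder with two tagging heads: fine-grained NE categories
        (primary task) and coarse NE segmentation (secondary task)."""

        def __init__(self, vocab_size, n_categories, emb_dim=100, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.segment_head = nn.Linear(2 * hidden, 3)             # B/I/O segmentation tags
            self.category_head = nn.Linear(2 * hidden, n_categories)  # fine-grained NE tags

        def forward(self, token_ids):
            h, _ = self.encoder(self.embed(token_ids))
            return self.category_head(h), self.segment_head(h)

    # Joint training: sum the cross-entropy losses of both heads.
    model = MultiTaskNER(vocab_size=5000, n_categories=21)
    tokens = torch.randint(1, 5000, (4, 12))        # batch of 4 sentences, 12 tokens each
    cat_gold = torch.randint(0, 21, (4, 12))
    seg_gold = torch.randint(0, 3, (4, 12))
    cat_logits, seg_logits = model(tokens)
    loss = nn.functional.cross_entropy(cat_logits.reshape(-1, 21), cat_gold.reshape(-1)) \
         + nn.functional.cross_entropy(seg_logits.reshape(-1, 3), seg_gold.reshape(-1))
    loss.backward()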
Improved Deep Persian Named Entity Recognition
  • M. Bokaei, M. Mahmoudi
  • Computer Science
  • 2018 9th International Symposium on Telecommunications (IST)
  • 2018
TLDR
This work uses the only publicly available corpus (ArmanPersoNER) to train a model in which features are extracted using recurrent and convolutional neural networks and the best tag sequence is found for the input word sequence.
Assessing Demographic Bias in Named Entity Recognition
TLDR
This work assesses the bias in various Named Entity Recognition systems for English across different demographic groups with synthetically generated corpora to shed light on potential biases in automated KB generation due to systematic exclusion of named entities belonging to certain demographics.
Entity linking and name disambiguation using SVM in Chinese micro-blogs
TLDR
This work provides valuable insights into entity disambiguation in Chinese micro-blogs, and an improved label disambiguation algorithm is proposed.
Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets
TLDR
This work studies the effectiveness of multi-dataset-multi-task learning for training neural models on four sequence tagging tasks over Twitter data: part-of-speech tagging, chunking, super-sense tagging, and named entity recognition.
TwiCS: Lightweight Entity Mention Detection in Targeted Twitter Streams
TLDR
This paper proposes an approach to EMD/ED designed from the ground up around the constraints specific to streaming environments, and implements TwiCS, a computationally light two-phase process that achieves an average effectiveness improvement of 14.6% while maintaining at least 2.64 times higher throughput.

References

Showing 1-10 of 36 references
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
TLDR
This work has shown that conditionally trained models, such as conditional maximum-entropy models, handle the inter-dependent features of greedy sequence modeling in NLP well.
Semi-Supervised Learning for Natural Language
TLDR
This thesis focuses on two segmentation tasks, named-entity recognition and Chinese word segmentation, and shows that features derived from unlabeled data substantially improve performance, both in terms of reducing the amount of labeled data needed to reach a given performance level and in terms of reducing the error for a fixed amount of labeled data.
Analysis of named entity recognition and linking for tweets
TLDR
This work describes a new Twitter entity disambiguation dataset, and conducts an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition
TLDR
This work emphasizes the effectiveness of word representations for Twitter NER, demonstrates that their inclusion can improve performance by up to 20 F1 points, and establishes a new state of the art on two common test sets.
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) word embeddings on both NER and chunking, and finds that each of the three word representations improves the accuracy of the baselines.
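Such representations are typically injected into a feature-based tagger as extra per-token features. The helper below sketches one common recipe, Brown cluster bit-string prefixes plus raw embedding dimensions; the lookup tables, prefix lengths, and feature names are illustrative assumptions.

    def word_representation_features(word, brown_clusters, embeddings, prefix_lengths=(4, 6, 10)):
        """Build extra features for one token from unsupervised word representations.

        `brown_clusters` maps word -> bit-string cluster path (e.g. '110100');
        `embeddings` maps word -> list of floats. Both tables are assumed inputs.
        """
        feats = {}
        path = brown_clusters.get(word.lower())
        if path is not None:
            for p in prefix_lengths:             # hierarchical prefixes of the cluster path
                feats[f"brown_prefix_{p}"] = path[:p]
        vec = embeddings.get(word.lower())
        if vec is not None:
            for i, value in enumerate(vec):      # one real-valued feature per dimension
                feats[f"emb_{i}"] = value
        return feats

    # Example with tiny toy tables:
    clusters = {"london": "110100", "paris": "110101"}
    vectors = {"london": [0.12, -0.40, 0.33]}
    print(word_representation_features("London", clusters, vectors))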
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity…
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
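For reference, the weighted least-squares objective fitted over the word co-occurrence matrix X can be written as follows (notation as in the GloVe paper: w_i and w̃_j are word and context vectors, b_i and b̃_j their biases, and f a weighting function that caps very frequent co-occurrences):

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
    \qquad
    f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}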
Design Challenges and Misconceptions in Named Entity Recognition
TLDR
Some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system are analyzed, and several solutions to these challenges are developed.
USFD: Twitter NER with Drift Compensation and Linked Data
TLDR
A pilot NER system for Twitter is described, comprising the USFD system entry to the W-NUT 2015 NER shared task, and the goal is to correctly label entities in a tweet dataset, using an inventory of ten types.
Phrase Clustering for Discriminative Learning
We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality…