Corpus ID: 33148539

Improving the utility of social media with Natural Language Processing

@inproceedings{Han2014ImprovingTU,
  title={Improving the utility of social media with Natural Language Processing},
  author={Bo Han},
  year={2014}
}
  • Bo Han
  • Published 2014
  • Computer Science
Social media has been an attractive target for many natural language processing (NLP) tasks and applications in recent years. However, the unprecedented volume of data and the non-standard language register cause problems for off-the-shelf NLP tools. This thesis investigates the broad question of how NLP-based text processing can improve the utility (i.e., the effectiveness and efficiency) of social media data. In particular, text normalisation and geolocation prediction are closely examined in…
Predicting real estate market trends and value using pre-processing and sentiment text mining analysis
TLDR
The main aim behind text mining is to convert a large corpus of text into numbers by applying influential mining techniques to extract meaningful knowledge patterns from text sources through the identification and exploration of fascinating patterns.
Transferring Informal Text in Arabic as Low Resource Languages: State-of-the-Art and Future Research Directions
TLDR
Arabic and its dialects are treated as low-resource languages for the transfer of non-standard text using normalization and translation approaches, because of the lack of sufficient parallel datasets.
A pragmatic guide to geoparsing evaluation
TLDR
A new framework describing the task, metrics and data used to compare state-of-the-art systems and proposing a fine-grained Pragmatic Taxonomy of Toponyms with implications for Named Entity Recognition (NER) and beyond is introduced.
Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions
TLDR
This paper presents the Wikipedia Cultural Diversity dataset, a dataset that contains a classification of the articles that represent its associated cultural context, i.e. all concepts and entities related to the language and to the territories where it is spoken.
A Pragmatic Guide to Geoparsing Evaluation: Toponyms, Named Entity Recognition and Pragmatics
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made…
Détection des mots non-standards dans les tweets avec des réseaux de neurones (Detecting non-standard words in tweets with neural networks)
TLDR
Detecting the words that need correcting is the preliminary step in normalising non-standard texts such as tweets.
Lexical Normalization for Code-switched Data and its Effect on POS Tagging
TLDR
This paper proposes three normalization models specifically designed to handle code-switched data which are evaluated for two language pairs: Indonesian-English and Turkish-German, and introduces novel normalization layers and their corresponding language ID and POS tags for the dataset.

References

SHOWING 1-10 OF 108 REFERENCES
A Broad-Coverage Normalization System for Social Media Language
TLDR
A cognitively-driven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity is proposed.
A Beam-Search Decoder for Normalization of Social Media Text with Application to Machine Translation
TLDR
A novel beam-search decoder is proposed to effectively integrate various normalization operations and shows statistically significant improvements over two strong baselines in both normalization and translation tasks, for both Chinese and English.
Syntactic Normalization of Twitter Messages
The use of computer mediated communication such as emailing, microblogs, Short Messaging System (SMS), and chat rooms has created corpora which contain incredibly noisy text. Tweets, messages sent by…
Named Entity Recognition in Tweets: An Experimental Study
TLDR
The novel T-ner system doubles F1 score compared with the Stanford NER system, and leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.
Adaptive Parser-Centric Text Normalization
TLDR
This paper takes a parser-centric view of normalization that aims to convert raw informal text into grammatically correct text, and demonstrates that this approach outperforms not only the state-of-the-art word-to-word normalization techniques, but also manual word-to-word annotations.
Using paraphrases for improving first story detection in news and Twitter
TLDR
A novel way of integrating paraphrases with locality sensitive hashing (LSH) is shown in order to obtain an efficient FSD system that can scale to very large datasets and achieves state-of-the-art results on the first story detection task.
Normalizing Microtext
TLDR
This work proposes a normalization approach based on the source channel model, which incorporates four factors, namely an orthographic factor, a phonetic factor, a contextual factor and acronym expansion, which can normalize Twitter messages reasonably well and outperforms existing algorithms on a public SMS data set.
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
TLDR
This work systematically evaluates the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy on Twitter and achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks.
The where in the tweet
TLDR
This paper attempts to predict the POI tag of a tweet based on its textual content and time of posting, and uses web pages retrieved by search engines as an additional source of evidence to tackle the sparsity of tweets tagged with POIs.
Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language
TLDR
Twitter’s language is surprisingly more conservative and less informal than SMS and online chat, and Twitter users appear to be developing linguistically unique styles.