Empath: Understanding Topic Signals in Large-Scale Text

@article{Fast2016EmpathUT,
  title={Empath: Understanding Topic Signals in Large-Scale Text},
  author={Ethan Fast and Binbin Chen and Michael S. Bernstein},
  journal={Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems},
  year={2016}
}
Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize… 
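The abstract describes generating a category from a handful of seed terms by finding related words in a learned embedding. Below is a minimal sketch of that generation step. It assumes word vectors are already available. The tiny hand-made vectors, the mean-of-seeds centroid, and the `expand_category` helper are hypothetical illustrations, not Empath's actual model or code. The validation step mentioned in the abstract is not shown.

```python
import math

# Hypothetical toy embeddings; a real system would learn vectors from a large corpus.
EMBEDDINGS = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "kick": [0.7, 0.3, 0.1],
    "smile": [0.0, 0.9, 0.3],
    "bake": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds, embeddings, top_k=2):
    """Hypothetical helper: rank non-seed words by cosine similarity to the mean seed vector."""
    vectors = [embeddings[s] for s in seeds]
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    candidates = [w for w in embeddings if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(centroid, embeddings[w]), reverse=True)
    return ranked[:top_k]


print(expand_category(["bleed", "punch"], EMBEDDINGS))  # ['stab', 'kick']
```

Averaging the seeds is one simple way to represent the category. With real embeddings the result depends heavily on which seed words are chosen.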
Lexicons on Demand: Neural Word Embeddings for Large-Scale Text Analysis
TLDR
Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like “bleed” and “punch” to generate the category violence) and is highly correlated with similar categories in LIWC.
Affect Lexicon Induction For the Github Subculture Using Distributed Word Representations
TLDR
A new approach to inspecting cultural differences at the level of sentiment and comparing a subculture with the general social environment is presented. It shows that individuals from the two environments understand sentiment-related words and phrases differently but agree on nouns and adjectives.
Crowdsourced Detection of Emotionally Manipulative Language
TLDR
This work introduces an approach, anchor comparison, that leverages workers' ability to identify and remove instances of EML in text to create a paraphrased "anchor text", which is then used as a comparison point to classify EML in the original content.
Lexicon Building using Word Embedding
TLDR
A bipolar concept model and support for specifying irrelevant words are introduced, and quantitative evaluation shows that the bipolar lexicon generated with these methods is comparable to human-generated ones.
Application of Transfer Learning for Automatic Triage of Social Media Posts
TLDR
It is shown that transfer learning is an effective strategy for predicting risk with relatively little labeled data, and that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding
TLDR
A bipolar concept model and support for specifying irrelevant words are introduced, and quantitative evaluation shows that the bipolar lexicon generated with the ConceptVector methods is comparable to human-generated ones.
Inquire: Large-scale Early Insight Discovery for Qualitative Research
TLDR
Inquire, a tool designed to enable qualitative exploration of utterances in social media and large-scale texts, shows how queries become part of the inductive process, enabling researchers to try multiple ideas while gaining intuition and discovering less-obvious insights.
Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study (Preprint)
TLDR
It is found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
The Automatic Analysis of Emotion in Political Speech Based on Transcripts
TLDR
It is found that transcripts capture sentiment, but not emotional arousal, and dictionaries created using word embeddings are sensitive to the choice of seed words and to training corpus size.
LOCO: The 88-million-word language of conspiracy corpus
TLDR
A subset of the most representative conspiracy documents is derived. Compared with other conspiracy documents, these display prototypical and exaggerated conspiratorial language and are shared more frequently on Facebook.

References

Showing 1-10 of 52 references
Crowdsourcing a Word–Emotion Association Lexicon
TLDR
It is shown how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively.
VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text
TLDR
When the authors' parsimonious rule-based model is used to assess the sentiment of tweets, VADER is found to outperform individual human raters and to generalize more favorably across contexts than any of their benchmarks.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
The work introduces the Recursive Neural Tensor Network and a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality.
SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining
TLDR
SENTIWORDNET is a lexical resource in which each WORDNET synset is associated with three numerical scores, Obj, Pos and Neg, describing how objective, positive, and negative the terms contained in the synset are.
You're happy, I'm happy: diffusion of mood expression on twitter
TLDR
It is observed that moods of high valence and low to moderate activation propagate the most, and that moods in postings with high self-attentional focus, posted by socially interactive women, and with fewer links diffuse the most.
We feel fine and searching the emotional web
We present We Feel Fine, an emotional search engine and web-based artwork whose mission is to collect the world's emotions to help people better understand themselves and others. We Feel Fine…
Linguistic Regularities in Continuous Space Word Representations
TLDR
It is found that the vector-space word representations implicitly learned by the input-layer weights are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.
Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena
TLDR
It is speculated that large-scale analyses of mood can provide a solid platform to model collective emotive trends in terms of their predictive value with regard to existing social as well as economic indicators.
Predicting Depression via Social Media
TLDR
It is found that social media contains useful signals for characterizing the onset of depression in individuals, as measured through decrease in social activity, raised negative affect, highly clustered ego-networks, heightened relational and medicinal concerns, and greater expression of religious involvement.
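The SentiWordNet entry in the reference list describes three scores attached to each synset. Below is a sketch of that record. It assumes the three scores sum to 1, so Obj can be derived from Pos and Neg. The `SynsetScores` class, its `dominant` method, and the example scores are hypothetical and are not values taken from SentiWordNet itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Hypothetical record for one synset: Pos and Neg are stored, Obj is derived."""

    synset: str
    pos: float
    neg: float

    def __post_init__(self):
        # Assumption: each score lies in [0, 1] and Pos + Neg + Obj = 1.
        if not (0.0 <= self.pos <= 1.0 and 0.0 <= self.neg <= 1.0 and self.pos + self.neg <= 1.0):
            raise ValueError("Pos and Neg must lie in [0, 1] and sum to at most 1")

    @property
    def obj(self) -> float:
        # Objectivity is whatever the positive and negative scores leave over.
        return 1.0 - (self.pos + self.neg)

    def dominant(self) -> str:
        """Return the label ("Pos", "Neg" or "Obj") with the highest score."""
        scores = {"Pos": self.pos, "Neg": self.neg, "Obj": self.obj}
        return max(scores, key=scores.get)


example = SynsetScores("example.a.01", pos=0.75, neg=0.0)  # illustrative scores only
print(example.obj, example.dominant())  # 0.25 Pos
```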
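The "Linguistic Regularities in Continuous Space Word Representations" entry describes each relationship as a relation-specific vector offset. The sketch below answers the analogy query "a is to b as c is to ?" by computing vec(b) - vec(a) + vec(c) and returning the nearest remaining word by cosine similarity. It uses hypothetical two-dimensional vectors rather than learned embeddings, and the `analogy` helper and toy vocabulary are illustrative assumptions.

```python
import math

# Hypothetical 2-D vectors chosen so the man -> woman offset matches king -> queen.
VECTORS = {
    "king": [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man": [0.1, 0.8],
    "woman": [0.1, 0.2],
    "throne": [0.8, 0.6],
    "apple": [0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors):
    """Hypothetical helper: nearest word to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman", VECTORS))  # queen
```

The three query words are excluded from the candidates. Otherwise the nearest vector to the target is often one of the inputs themselves.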