LEXpander: applying colexification networks to automated lexicon expansion

Anna Natale and David Garcia
Recent approaches to text analysis from social media and other corpora rely on word lists to detect topics, measure meaning, or to select relevant documents. These lists are often generated by applying computational lexicon expansion methods to small, manually curated sets of root words. Despite the wide use of this approach, we still lack an exhaustive comparative analysis of the performance of lexicon expansion methods and how they can be improved with additional linguistic data. In this work… 
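Word lists used for document selection are typically applied by counting how many of their words occur in a text. The sketch below is minimal and rests on assumptions of ours. It uses a simple regex tokenizer, a relative-frequency score and a "at least one hit" selection threshold, none of which the abstract specifies. The function names and the example lexicon are hypothetical.

```python
import re
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/apostrophes as tokens
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text: str, lexicon: set[str]) -> float:
    """Fraction of tokens in `text` that belong to `lexicon` (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in lexicon)
    return hits / len(tokens)


def select_documents(docs: Iterable[str], lexicon: set[str], min_hits: int = 1) -> list[str]:
    """Keep documents that contain at least `min_hits` lexicon words."""
    selected = []
    for doc in docs:
        hits = sum(1 for t in tokenize(doc) if t in lexicon)
        if hits >= min_hits:
            selected.append(doc)
    return selected


lexicon = {"flood", "storm", "hurricane"}  # hypothetical topic lexicon
docs = ["A storm is coming tonight", "The market rallied today"]
print(select_documents(docs, lexicon))  # ['A storm is coming tonight']
print(lexicon_score("storm and flood warnings", lexicon))  # 0.5
```

An expanded lexicon would simply replace `lexicon` here. The expansion step exists to raise the recall of exactly this kind of matching.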

New conceptions of truth foster misinformation in online public political discourse

The spread of online misinformation is increasingly perceived as a major problem for societal cohesion and democracy [1, 2]. Much attention has focused on the role of social media as a vector of



Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention

This work treats LIWC lexicon expansion as a hierarchical classification problem. A Sequence-to-Sequence model classifies words into the lexicon's categories, and an attention mechanism over sememe information captures the exact meanings of a word, which yields a more precise and comprehensive expanded lexicon.
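Attention over sememes can be illustrated as a softmax-weighted sum of sememe embeddings. This is a minimal sketch under two assumptions. First, it uses plain dot-product scores. Second, the `query` vector stands in for whatever state the paper's decoder actually provides. The summary above does not give the exact attention formulation, and all vectors here are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sememe_attention(query: list[float], sememe_vectors: list[list[float]]):
    """Return (attention weights, context vector) for one word's sememes.

    Scores are dot products between the query and each sememe embedding
    (an assumption); the context is the weight-averaged sememe embedding.
    """
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in sememe_vectors]
    weights = softmax(scores)
    dim = len(sememe_vectors[0])
    context = [sum(w * vec[i] for w, vec in zip(weights, sememe_vectors)) for i in range(dim)]
    return weights, context


# Hypothetical 2-d embeddings for two sememes of an ambiguous word
weights, context = sememe_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(weights, context)
```

The sememe most aligned with the query receives the largest weight. This is how attention lets the model emphasize the sense relevant to the current prediction.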

Lexifield: a system for the automatic building of lexicons by semantic expansion of short word lists

Lexifield is a fully automatic, language-independent system for building domain-specific lexicons from a short list of terms that defines the domain. It achieves better precision and recall on reference lists extracted from manually created resources such as Roget’s Thesaurus.
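Evaluating an expanded lexicon against a reference list comes down to set overlap. Precision is the share of generated words that appear in the reference. Recall is the share of reference words that were generated. The sketch below assumes both lexicons are plain sets of lowercase words. The word lists are illustrative and are not taken from Roget’s Thesaurus.

```python
def precision_recall_f1(expanded: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Compare an expanded lexicon against a reference word list."""
    true_pos = len(expanded & reference)
    precision = true_pos / len(expanded) if expanded else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


expanded = {"happy", "glad", "joyful", "sad"}  # illustrative system output
reference = {"happy", "glad", "joyful", "cheerful", "content"}  # illustrative reference list
print(precision_recall_f1(expanded, reference))
```

Here 3 of the 4 generated words are correct (precision 3/4), and 3 of the 5 reference words are found (recall 3/5).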

Empath: Understanding Topic Signals in Large-Scale Text

Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms. It draws connotations between words and phrases by training a neural embedding on more than 1.8 billion words of modern fiction.
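Generating a category from seed terms can be approximated as a nearest-neighbor search around the seeds in an embedding space. This minimal sketch averages the seed vectors into a centroid and ranks the other words by cosine similarity to it. That ranking rule is our assumption, and Empath's own model may rank candidates differently. The 3-dimensional embedding table is a hand-made toy.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds: list[str], embeddings: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k non-seed words closest to the centroid of the known seed vectors."""
    known = [embeddings[s] for s in seeds if s in embeddings]
    if not known:
        return []
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    candidates = [w for w in embeddings if w not in seeds]
    candidates.sort(key=lambda w: cosine(centroid, embeddings[w]), reverse=True)
    return candidates[:k]


toy = {  # hypothetical embeddings, not trained vectors
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "puppy": [0.85, 0.15, 0.05],
    "car": [0.0, 0.1, 0.9],
    "truck": [0.05, 0.0, 0.95],
}
print(expand_category(["dog", "cat"], toy, k=1))  # ['puppy']
```

Empath also validates generated categories. In this sketch the natural place for that step is filtering the ranked candidates before they are accepted.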

Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts

The paper proposes a sentiment lexicon expansion method based on Word2vec and fastText word embeddings. It combines this with a rule-based sentiment analysis method that uses the expanded lexicons, together with lists of conjunctions and negation words, to predict the sentiment expressed in Tamil texts.
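A rule-based predictor of this kind combines a polarity lexicon with negation and conjunction rules. The sketch below is minimal, and its rules are illustrative assumptions rather than the paper's. A negation flips the polarity of the next sentiment-bearing word. When a contrastive conjunction appears, only the clause after the last one counts. English placeholder words stand in for Tamil, and the word lists are hypothetical.

```python
NEGATIONS = {"not", "no", "never"}  # hypothetical negation-word list
CONTRASTIVE = {"but", "however"}  # hypothetical contrastive conjunctions


def rule_based_sentiment(tokens: list[str], lexicon: dict[str, int]) -> str:
    """Classify a tokenized sentence as positive, negative, or neutral."""
    # If a contrastive conjunction occurs, only the clause after the last one counts
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CONTRASTIVE:
            tokens = tokens[i + 1:]
            break
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True  # flip the polarity of the next sentiment-bearing word
            continue
        polarity = lexicon.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


lexicon = {"good": 1, "great": 1, "bad": -1}  # stands in for an expanded lexicon
print(rule_based_sentiment("the food was not good".split(), lexicon))  # negative
print(rule_based_sentiment("the plot was bad but the music was great".split(), lexicon))  # positive
```

Lexicon expansion matters here because any word missing from `lexicon` contributes nothing to the score.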

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

CLICS tackles interconnected, interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and showcases best practices for preparing data for cross-linguistic research.
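Two concepts are colexified in a language when that language expresses both with the same word form. Pooling such cases across languages yields a weighted concept network. The minimal sketch below builds that network from (language, form, concept) triples, weighting each edge by the number of distinct languages, and then expands a seed set by its network neighbors. The triples are invented, and the weighting, the expansion rule and the function names are assumptions for illustration. Real CLICS data is distributed in its own format and processed with its own tooling.

```python
from collections import defaultdict
from itertools import combinations


def colexification_graph(entries):
    """entries: iterable of (language, form, concept) triples.

    Returns {(concept_a, concept_b): number of languages colexifying them}.
    """
    concepts_by_form = defaultdict(set)
    for language, form, concept in entries:
        concepts_by_form[(language, form)].add(concept)

    languages_per_edge = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            languages_per_edge[(a, b)].add(language)  # a set, so each language counts once
    return {edge: len(langs) for edge, langs in languages_per_edge.items()}


def expand_concepts(seeds, graph, min_languages=1):
    """Seeds plus every concept colexified with a seed in >= min_languages languages."""
    expanded = set(seeds)
    for (a, b), n_langs in graph.items():
        if n_langs < min_languages:
            continue
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


data = [  # invented example data
    ("lang1", "x", "TREE"), ("lang1", "x", "WOOD"),
    ("lang2", "y", "TREE"), ("lang2", "y", "WOOD"),
    ("lang2", "z", "ARM"), ("lang2", "z", "HAND"),
]
graph = colexification_graph(data)
print(graph)  # {('TREE', 'WOOD'): 2, ('ARM', 'HAND'): 1}
print(expand_concepts({"TREE"}, graph))
```

Raising `min_languages` keeps only colexifications attested in many languages. This trades recall for precision when expanding a word list.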

Computer‐Assisted Keyword and Document Set Discovery from Unstructured Text

The authors develop a computer-assisted (as opposed to fully automated or human-only) statistical approach that suggests keywords from available text without requiring structured data as input, which yields a widely applicable algorithm.
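In a computer-assisted workflow, the program proposes candidate keywords and a human decides which to keep. The minimal sketch below uses a swapped-in scoring rule, not the authors' algorithm. It ranks words by the smoothed log-ratio of their relative frequency in a target document set versus a reference set. The function name, the whitespace tokenization and the example documents are hypothetical.

```python
import math
from collections import Counter


def suggest_keywords(target_docs, reference_docs, k=5, alpha=1.0):
    """Rank words by the smoothed log-ratio of their relative frequency in the
    target set versus the reference set (alpha must be > 0)."""
    target = Counter(w for doc in target_docs for w in doc.lower().split())
    reference = Counter(w for doc in reference_docs for w in doc.lower().split())
    vocab = set(target) | set(reference)
    n_t, n_r, v = sum(target.values()), sum(reference.values()), len(vocab)

    def score(word):
        # Additive smoothing keeps words absent from one set from producing log(0)
        p_t = (target[word] + alpha) / (n_t + alpha * v)
        p_r = (reference[word] + alpha) / (n_r + alpha * v)
        return math.log(p_t) - math.log(p_r)

    # Sort by descending score, breaking ties alphabetically for reproducibility
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


target_docs = ["protest in the square", "protest march downtown"]
reference_docs = ["the weather in the square", "market news downtown"]
print(suggest_keywords(target_docs, reference_docs, k=2))  # ['protest', 'march']
```

The ranked list is meant for human review, which matches the "computer-assisted" framing. The program never decides on its own which keywords enter the final set.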

Enriching Word Vectors with Subword Information

A new approach based on the skipgram model, in which each word is represented as a bag of character n-grams: a vector representation is associated with each n-gram, and a word is represented as the sum of these vectors. The approach achieves state-of-the-art performance on word similarity and analogy tasks.
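The subword representation can be made concrete. The word is wrapped in boundary symbols `<` and `>`, and its character n-grams are extracted (the paper uses n from 3 to 6). A vector for the whole word is added, and all the vectors are summed. In this minimal sketch the per-key vectors are deterministic random stand-ins for trained parameters. The real system also hashes n-grams into a fixed number of buckets, which the sketch omits. The helper names are hypothetical.

```python
import random


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary symbols."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(wrapped) - n + 1)]


def vector_for(key: str, dim: int) -> list[float]:
    # Stand-in for a trained embedding: a reproducible random vector per key
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word: str, dim: int = 4) -> list[float]:
    """Sum of the word's n-gram vectors plus a separate vector for the whole word."""
    keys = ["ngram:" + g for g in char_ngrams(word)] + ["word:<" + word + ">"]
    total = [0.0] * dim
    for key in keys:
        for i, x in enumerate(vector_for(key, dim)):
            total[i] += x
    return total


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("wherever"))  # works for unseen words, since they share n-grams
```

Because vectors are built from n-grams, a word never seen in training still receives a representation. This property is what makes the method attractive for lexicon expansion.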

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

This paper proposes a new evaluation framework, TweetEval, consisting of seven heterogeneous Twitter-specific classification tasks. It shows the effectiveness of starting from existing pre-trained generic language models and continuing to train them on Twitter corpora.

OdeNet: Compiling a GermanWordNet from other Resources

The Princeton WordNet for English has been used worldwide in NLP projects for many years, and the development of a wordnet for German also belongs in this context.

WordNet: A Lexical Database for English

WordNet is an online lexical database designed for use under program control, and it provides a more effective combination of traditional lexicographic information and modern computing.