Enhanced word embeddings using multi-semantic representation through lexical chains

Terry Ruas, Charles Henrique Porto Ferreira, William I. Grosky, Fabrício Olivetti de França, Debora Maria Rossi de Medeiros


Math-word embedding in math search and semantic extraction

This paper explores math embedding by testing it in several different scenarios and shows that it holds much promise for similarity, analogy, and search tasks; however, the need for more robust math embedding approaches is observed.

Incorporating Word Sense Disambiguation in Neural Language Models

Two supervised (pre-)training methods are presented that incorporate gloss definitions from lexical resources to leverage Word Sense Disambiguation capabilities in neural language models and exceed state-of-the-art techniques on the SemEval and Senseval datasets.

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

The approach of aspect-based document embeddings mitigates potential risks arising from implicit biases by making them explicit and can, for example, be used for more diverse and explainable recommendations.

FSPRM: A Feature Subsequence Based Probability Representation Model for Chinese Word Embedding

A Feature Subsequence based Probability Representation Model (FSPRM) is proposed for learning Chinese word embeddings, in which the morphological and phonetic features of Chinese characters are integrated and their relevance is considered by designing a feature subsequence.

Identifying Machine-Paraphrased Plagiarism

The detection of machine-paraphrased text is evaluated using pre-trained word embedding models combined with state-of-the-art neural language models, and the automated classification is shown to alleviate shortcomings of widely used text-matching systems such as Turnitin and PlagScan.

Word-Embedding-Based Traffic Document Classification Model for Detecting Emerging Risks Using Sentiment Similarity Weight

Through word imputation using an established similarity dictionary and by widening the limited utilization range, the proposed method overcomes the disadvantage of sentiment dictionaries and enables the detection of emerging risks.

Fake or not? Automated detection of COVID-19 misinformation and disinformation in social networks and digital media

This work aggregated several COVID-19 misinformation datasets and compared differences between learning models from individual datasets versus one that was aggregated, and evaluated the impact of using several word- and sentence-embedding models and transformers on the performance of classification models.

Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection

It is shown that tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones, and a broad spectrum of datasets and models is evaluated to benefit future research in developing misinformation detection systems.



Multi-sense embeddings through a word sense disambiguation process

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

This work proposes a new model which learns word and sense embeddings jointly and exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and senses.

Towards Lexical Chains for Knowledge-Graph-based Word Embeddings

This work exploits lexical-chain-based templates over a knowledge graph to generate pseudo-corpora with controlled linguistic value and shows that, on the one hand, incorporating many-relation lexical chains improves results, while, on the other hand, unrestricted-length chains remain difficult to handle because of their huge quantity.

Embeddings for Word Sense Disambiguation: An Evaluation Study

This work proposes different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and performs a deep analysis of how different parameters affect performance.

Semantic Feature Structure Extraction From Documents Based on Extended Lexical Chains

This paper explores the degree of cohesion among a document’s words using lexical chains, built over WordNet as a lexical database, as a semantic representation of its meaning, and develops a text document representation that can be used for semantic document retrieval.
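The chaining idea behind this line of work can be illustrated with a minimal greedy sketch: each word joins an existing chain if it is sufficiently related to a member of that chain, otherwise it starts a new chain. The `related()` function below is a toy stand-in for a WordNet-based similarity measure; the relatedness pairs are assumptions for illustration only.

```python
# Greedy lexical chaining sketch. TOY_RELATED is a toy set of relatedness
# pairs standing in for WordNet-derived similarity (the cited work uses
# WordNet as its lexical database).
TOY_RELATED = {
    frozenset({"car", "vehicle"}),
    frozenset({"vehicle", "truck"}),
    frozenset({"dog", "animal"}),
}

def related(a, b):
    """Toy relatedness check: identical words or a known related pair."""
    return a == b or frozenset({a, b}) in TOY_RELATED

def build_chains(words):
    """Greedily group document words into lexical chains."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, member) for member in chain):
                chain.append(w)  # word is cohesive with this chain
                break
        else:
            chains.append([w])  # no related chain found: start a new one
    return chains

print(build_chains(["car", "dog", "vehicle", "truck", "animal"]))
# → [['car', 'vehicle', 'truck'], ['dog', 'animal']]
```

Longer, denser chains then signal topically central vocabulary, which is what makes them usable as a document-level semantic representation.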

Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation

This work proposes a method that benefits from the semantic similarity in lexical chains to improve SMT output by integrating it into a document-level decoder, and relies on word embeddings to build the lexical chains, contrary to the traditional approach that uses lexical resources.

De-Conflated Semantic Representations

This work proposes a technique that tackles semantic representation problems by de-conflating the representations of words based on deep knowledge derived from a semantic network; its advantages include high coverage and the ability to generate accurate representations even for infrequent word senses.

Bag of meta-words: A novel method to represent document for the sentiment classification

Enriching Word Vectors with Subword Information

A new approach based on the skip-gram model is presented in which each word is represented as a bag of character n-grams, the word vector being the sum of these n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
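The subword composition described above can be sketched compactly: extract character n-grams from a boundary-padded word and sum their vectors. The vector dimensions and the plain dictionary lookup here are illustrative assumptions (the actual model learns the n-gram vectors during skip-gram training and hashes n-grams into a fixed table).

```python
# Sketch of the subword idea: a word vector is the sum of the vectors of
# its character n-grams, so out-of-vocabulary words still get vectors by
# sharing n-grams with seen words. Random vectors stand in for trained ones.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word padded with boundary markers '<' and '>'."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

class SubwordEmbedder:
    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.ngram_vectors = {}  # n-gram -> vector (random stand-ins here)

    def _vec(self, gram):
        if gram not in self.ngram_vectors:
            self.ngram_vectors[gram] = self.rng.normal(size=self.dim)
        return self.ngram_vectors[gram]

    def embed(self, word):
        """Word vector = sum of its character n-gram vectors."""
        return sum((self._vec(g) for g in char_ngrams(word)),
                   start=np.zeros(self.dim))

emb = SubwordEmbedder()
v = emb.embed("where")  # works even for words never seen before
```

Because "wherever" shares n-grams such as "<wh" and "her" with "where", their vectors end up related without either word needing its own trained entry.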

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.