EVE: explainable vector based embedding technique using Wikipedia

Muhammad Atif Qureshi and Derek Greene. Journal of Intelligent Information Systems.
We present an unsupervised explainable vector embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a concept using human-readable labels, making it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed model, we consider its usefulness in…
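
The core idea can be sketched in a few lines (a minimal illustration with toy, hypothetical category labels and weights, not the paper's actual construction from the Wikipedia category and link graphs): an EVE-style vector is a sparse mapping from human-readable Wikipedia labels to association weights, so the dimensions that drive a similarity score double as its explanation.

```python
from math import sqrt

# Toy EVE-style vectors (hypothetical data): dimensions are human-readable
# Wikipedia category/article labels, values are association weights.
apple = {"Fruits": 0.9, "Plants": 0.6, "Technology companies": 0.1}
pear = {"Fruits": 0.8, "Plants": 0.7}

def cosine(u, v):
    # Cosine similarity over the union of the two sparse vectors' dimensions.
    dims = set(u) | set(v)
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in dims)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv)

def explain(u, v, k=2):
    # The dimensions contributing most to the dot product serve as the
    # human-readable explanation of the similarity.
    contrib = {d: u.get(d, 0.0) * v.get(d, 0.0) for d in set(u) & set(v)}
    return sorted(contrib, key=contrib.get, reverse=True)[:k]

print(explain(apple, pear))  # -> ['Fruits', 'Plants']
```

Because every dimension carries a label, the same machinery that scores similarity also names the reasons for it.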

Explainable Recommendation based on Wikipedia Concept Vectors

An explainable recommendation system for novels and authors, called Lit@EVE, is based on Wikipedia concept vectors; it generates an ordered list of suggested items and shows the most informative features as human-readable labels, thereby making the recommendations explainable.

Comparison of Embedding Techniques for Topic Modeling Coherence Measures

This work evaluates the difference between two popular word embedding algorithms and their variants, using two distinct external reference corpora, to discover if these underlying choices have a substantial impact on the resulting coherence scores.

Comparing general and specialized word embeddings for biomedical named entity recognition

Three well-known NER algorithms are evaluated on two corpora using two classic word embeddings as unique features: GloVe Common Crawl (general) and Pyysalo PM + PMC (domain-specific). The classic general word embedding performed better on the DrugBank corpus, despite having less word coverage and a weaker internal semantic relationship.

Lex2vec: making Explainable Word Embedding via Distant Supervision

An algorithm called Lex2vec is proposed that exploits lexical resources to inject information into word embeddings and to name the embedding dimensions by means of distant supervision; the optimal parameters for extracting informative labels are also evaluated.

Towards Explainable Semantic Text Matching

An approach similar to local interpretable model-agnostic explanations (LIME), called eXplainable Semantic Text Matching (XSTM), is proposed to better understand the behavior of text similarity measures such as TF-IDF and word embeddings.

Explaining AI-based Decision Support Systems using Concept Localization Maps

Concept Localization Maps (CLMs) are introduced, a novel approach towards explainable image classifiers employed as decision support systems (DSS) using visual input modalities; CLMs show great promise for easing acceptance of DSS in practice.

Understanding Legal Documents: Classification of Rhetorical Role of Sentences Using Deep Learning and Natural Language Processing

A deep learning model is proposed that breaks down legal documents and classifies the rhetorical types of sentences; it will automate the processing of legal documents, decreasing, and ultimately eliminating, the backlog that currently exists throughout various legal systems.

Improving Semantic Search in the German Legal Domain with Word Embeddings

This thesis investigates the use of word embeddings (word2vec, FastText, GloVe) to improve semantic search in legal documents and shows that a natural language search can be used as a complementary search method to traditional keyword search.

Intelligent System for Semantically Similar Sentences Identification and Generation Based on Machine Learning Methods

All the main technical aspects of solving the task of generating semantically similar sentences are described, along with proposed solutions for the development of the algorithmic, functional, and software components of an application for the identification and generation of semantically similar sentences.

A Latent Variable Model Approach to PMI-based Word Embeddings

A new generative model is proposed, a dynamic version of the log-linear topic model of Mnih and Hinton (2007), which uses the prior to compute closed-form expressions for word statistics; latent word vectors are shown to be fairly uniformly dispersed in space.

Word Embedding based Generalized Language Model for Information Retrieval

A generalized language model is constructed in which the mutual independence between a pair of words (say t and t') no longer holds, and the vector embeddings of the words are used to derive the transformation probabilities between words.

Knowledge Graph and Text Jointly Embedding

Large scale experiments on Freebase and a Wikipedia/NY Times corpus show that jointly embedding brings promising improvement in the accuracy of predicting facts, compared to separately embedding knowledge graphs and text.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model is presented that combines the advantages of the two major model families in the literature, global matrix factorization and local context-window methods, and produces a vector space with meaningful substructure.
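
GloVe's weighted least-squares objective, J = Σᵢⱼ f(Xᵢⱼ)(wᵢ·w̃ⱼ + bᵢ + b̃ⱼ − log Xᵢⱼ)², can be sketched with plain SGD on a toy co-occurrence matrix (hypothetical counts and learning rate; a minimal sketch, not the paper's AdaGrad-based implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric co-occurrence counts X[i, j] for a 4-word vocabulary
# (hypothetical data; real GloVe counts come from a large corpus).
X = np.array([[0., 4., 2., 0.],
              [4., 0., 3., 1.],
              [2., 3., 0., 5.],
              [0., 1., 5., 0.]])

V, d = X.shape[0], 5
W = rng.normal(0, 0.1, (V, d))    # word vectors
Wc = rng.normal(0, 0.1, (V, d))   # context vectors
b, bc = np.zeros(V), np.zeros(V)  # word / context biases

def f(x, x_max=10.0, alpha=0.75):
    # GloVe weighting function: down-weights rare pairs, caps frequent ones.
    return (x / x_max) ** alpha if x < x_max else 1.0

def loss():
    return sum(f(X[i, j]) * (W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])) ** 2
               for i in range(V) for j in range(V) if X[i, j] > 0)

before = loss()
lr = 0.05
for _ in range(200):              # plain SGD over nonzero cells
    for i in range(V):
        for j in range(V):
            if X[i, j] == 0:      # log X_ij is undefined; GloVe skips zeros
                continue
            diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
            g = f(X[i, j]) * diff
            wi = W[i].copy()      # use pre-update values for both gradients
            W[i] -= lr * g * Wc[j]
            Wc[j] -= lr * g * wi
            b[i] -= lr * g
            bc[j] -= lr * g
after = loss()
```

The model only ever touches nonzero entries of X, which is what makes training tractable on real corpora where the co-occurrence matrix is extremely sparse.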

RC-NET: A General Framework for Incorporating Knowledge into Word Representations

This paper builds relational knowledge and categorical knowledge into two separate regularization functions and combines both with the original objective function of the skip-gram model to obtain word representations enhanced by the knowledge graph.

Vector Embedding of Wikipedia Concepts and Entities

This paper uses deep learning to embed Wikipedia concepts and entities, evaluated on concept analogy and concept similarity tasks, and shows that the proposed approaches achieve performance comparable to, and in some cases higher than, state-of-the-art methods.

Structured Embedding via Pairwise Relations and Long-Range Interactions in Knowledge Base

This paper introduces SePLi, which uses path ranking to capture the long-range interactions of a knowledge graph while preserving its pairwise relations, and achieves better embedding performance.

Distributional Memory: A General Framework for Corpus-Based Semantics

The Distributional Memory approach is shown to be tenable despite the constraints imposed by its multi-purpose nature, and performs competitively against task-specific algorithms recently reported in the literature for the same tasks, as well as against several state-of-the-art methods.

Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis

This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia that results in substantial improvements in correlation of computed relatedness scores with human judgments.
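
The ESA idea can be sketched with a toy corpus (hypothetical article texts standing in for Wikipedia): each word maps to a TF-IDF-weighted vector over concept articles, a text's vector is the sum of its words' concept vectors, and relatedness is cosine similarity in that concept space.

```python
import math
from collections import Counter

# Toy "Wikipedia" of three concept articles (hypothetical texts).
articles = {
    "Banking": "bank money loan interest account deposit",
    "River": "river bank water flow stream fish",
    "Computer": "computer software hardware program code",
}

# Term frequencies per concept article and document frequencies per word.
tf = {c: Counter(text.split()) for c, text in articles.items()}
df = Counter()
for counts in tf.values():
    df.update(set(counts))
N = len(articles)

def concept_vector(text):
    # ESA: a text is the tf-idf-weighted sum of its words' concept vectors.
    vec = Counter()
    for w in text.split():
        for c, counts in tf.items():
            if w in counts:
                vec[c] += counts[w] * math.log(N / df[w])
    return vec

def relatedness(a, b):
    # Cosine similarity between the two texts' concept-space vectors.
    va, vb = concept_vector(a), concept_vector(b)
    dot = sum(va[c] * vb[c] for c in va.keys() & vb.keys())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

On this toy data, "money loan" and "deposit account" are related because both project onto the Banking concept, while "money loan" and "water fish" share no concepts; the ambiguous word "bank" projects onto both Banking and River.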

Integrating and Evaluating Neural Word Embeddings in Information Retrieval

This paper uses neural word embeddings within the well-known translation language model for information retrieval, which captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance.
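
A minimal sketch of this idea (toy, hand-crafted embeddings; the paper's actual model also mixes in collection-level smoothing): translation probabilities between words are derived from embedding similarity and plugged into a query-likelihood language model, so a document can match a query term it never contains.

```python
import numpy as np
from collections import Counter

# Hand-crafted 2-d embeddings (hypothetical; real systems use trained vectors).
vocab = ["cat", "feline", "dog", "car"]
E = np.array([[1.0, 0.1],
              [0.9, 0.2],
              [0.2, 1.0],
              [-0.8, 0.3]])
E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalise rows
idx = {w: i for i, w in enumerate(vocab)}

def p_translate(t, u):
    # p(t | u): softmax over the vocabulary of cosine similarity with u.
    sims = E @ E[idx[u]]
    probs = np.exp(sims) / np.exp(sims).sum()
    return probs[idx[t]]

def score(query, doc):
    # Query likelihood where each query term t may be "translated" from any
    # document term u: p(t | d) = sum_u p(t | u) * p_ml(u | d).
    counts = Counter(doc)
    logp = 0.0
    for t in query:
        pt = sum(p_translate(t, u) * c / len(doc) for u, c in counts.items())
        logp += np.log(pt)
    return logp
```

Under these toy embeddings, a document containing only "feline" scores higher for the query "cat" than one containing only "car", even though neither contains the query term itself.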