Corpus ID: 52091589

Features of word similarity

Arthur M. Jacobs, Annette Kinder
In this theoretical note we compare different types of computational models of word similarity and association in their ability to predict a set of about 900 ratings. Using regression and predictive modeling tools (neural net, decision tree), the performance of a total of 28 models using different combinations of surface and semantic word features is evaluated. The results present evidence for the hypothesis that word similarity ratings are based on more than only semantic relatedness…


Computing the Affective-Aesthetic Potential of Literary Texts

The SentiArt tool is established as a promising candidate for lexical sentiment analyses at both the micro- and macrolevels, i.e., short and long literary materials.

Bridging the theoretical gap between semantic representation models without the pressure of a ranking: some lessons learnt from LSA

A critical review of latent semantic analysis (LSA) that clarifies some of the misunderstandings regarding LSA and other space models and proposes applying the long experience with LSA to other models, especially predictive models such as word2vec.

Computational Models of Readers' Apperceptive Mass

Recent progress in machine-learning-based distributed semantic models (DSMs) offers new ways to simulate the apperceptive mass (AM; Kintsch, 1980) of reader groups or individual readers and to

Entity Thematic Similarity Measurement for Personal Explainable Searching Services in the Edge Environment

This article proposes a novel semantic augmentation method with a double attention mechanism that refers to a dynamic representation learning process that maps an entity to a real number vector in semantic space and shows excellent performance on the task of entity thematic similarity.

Neuroimaging of valence decisions in children and adults

Software similarity measurements using UML diagrams: A systematic literature review

The study reviews and identifies similarity measurements of UML artifacts, with class diagram, sequence diagram, statechart diagram, and use case diagram being UML diagrams that are widely used as research objects for measuring similarity.

Activity Diagram Similarity Measurement: A Different Approach

Preliminary results show that semantic and structural similarity are good parameters for measuring the similarity of activity diagrams in software reuse.



Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

It is suggested that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods.

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the

Checking and bootstrapping lexical norms by means of word similarity indexes

A technique for estimating lexical norms based on the latent semantic analysis of a corpus that can be used to check human ratings to identify words for which the rating is very different from the corpus-based estimate.

Evaluating WordNet-based Measures of Lexical Semantic Relatedness

An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik; the paper also explains why distributional similarity is not an adequate proxy for lexical semantic relatedness.

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.

How useful are corpus-based methods for extrapolating psycholinguistic variables?

A systematic comparison of two extrapolation techniques, k-nearest neighbours and random forest, in combination with semantic spaces built using latent semantic analysis, a topic model, a hyperspace analogue to language (HAL)-like model, and a skip-gram model finds that at least some of the extrapolation methods may introduce artefacts to the data and produce results that could lead to different conclusions than would be reached based on the human ratings.

The word frequency effect: a review of recent developments and implications for the choice of frequency estimates in German.

It is found that the commonly used Celex frequencies are the least powerful in predicting lexical decision times in the German language.

Similarity of Semantic Relations

LRA extends the VSM approach in three ways: the patterns are derived automatically from the corpus, the Singular Value Decomposition (SVD) is used to smooth the frequency data, and automatically generated synonyms are used to explore variations of the word pairs.