An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence
@inproceedings{Rohde2005AnIM, title={An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence}, author={Douglas L. T. Rohde and David C. Plaut}, year={2005} }
The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora (Landauer & Dumais, 1997; Burgess & Lund…
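The co-occurrence-vector approach described in the abstract can be illustrated with a minimal sketch (this is a generic toy, not the paper's actual model): count how often each word appears near each other word within a small window, treat each word's count row as a vector, and compare vectors with cosine similarity. The corpus, window size, and words below are all made up for illustration.

```python
# Toy co-occurrence word vectors with cosine similarity.
# Corpus and window size are illustrative, not from the paper.
from collections import defaultdict
import math

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

window = 2
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][sentence[j]] += 1  # count neighbors in the window

vocab = sorted({w for s in corpus for w in s})

def vector(word):
    """Row of co-occurrence counts for `word` over the full vocabulary."""
    return [counts[word][c] for c in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# "cat" and "dog" share contexts ("sat", "on", "the"), so their
# similarity is positive.
print(cosine(vector("cat"), vector("dog")))
```

Real models of this family (HAL, and the COALS model this paper proposes) differ in how counts are weighted and normalized, but the matrix-then-similarity pipeline is the common core.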
157 Citations
Estimating the average need of semantic knowledge from distributional semantic models
- Psychology, Biology · Memory & cognition
- 2017
It is argued that CBOW learns word meanings according to Anderson's concept of needs probability, and that it can account for nearly all of the variation in lexical access measures typically attributable to word frequency and contextual diversity.
The principals of meaning: Extracting semantic dimensions from co-occurrence models of semantics
- Computer Science · Psychonomic bulletin & review
- 2016
It is shown that the skip-gram model accounts for unique variance in behavioral measures of lexical access above and beyond that accounted for by affective and lexical measures, raising the possibility that word frequency predicts behavioral measures of lexical access because word use is organized by semantics.
The Role of Negative Information in Distributional Semantic Learning
- Computer Science · Cogn. Sci.
- 2019
The role of negative information in developing a semantic representation is assessed, showing that its power does not depend on a prediction mechanism, and that negative information can be efficiently integrated into classic count-based semantic models using parameter-free analytical transformations.
Performance impact of stop lists and morphological decomposition on word–word corpus-based semantic space models
- Linguistics · Behavior research methods
- 2015
From this study, morphological decomposition appears to significantly improve performance in word–word co-occurrence semantic space models, providing some support for the claim that sublexical information—specifically, word morphology—plays a role in lexical semantic processing.
A hybrid method based on WordNet and Wikipedia for computing semantic relatedness between texts
- Computer Science · The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012)
- 2012
This work combines two well-known knowledge bases, WordNet and Wikipedia, to provide a more complete data source for computing semantic relatedness with higher accuracy.
A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity
- Computer Science · IEEE Transactions on Knowledge and Data Engineering
- 2015
This paper introduces a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure, and proposes an efficient and effective approach for semantic similarity using a large scale semantic network.
Organizing the space and behavior of semantic models
- Computer Science · CogSci
- 2014
A general framework for organizing the space of semantic models is proposed and it is illustrated how this framework can be used to understand model comparisons in terms of individual manipulations along sub-processes.
Semantic Similarity from Natural Language and Ontology Analysis
- Computer Science · Semantic Similarity from Natural Language and Ontology Analysis
- 2015
This book proposes an in-depth characterization of existing proposals for semantic similarity estimation by discussing their features, the assumptions on which they are based and empirical results regarding their performance in particular applications, and provides a detailed discussion on the foundations of semantic measures.
Comparing Predictive and Co-occurrence Based Models of Lexical Semantics Trained on Child-directed Speech
- Psychology, Computer Science · CogSci
- 2016
It is found that models that perform some form of abstraction outperform those that do not, and that co-occurrence-based abstraction models performed best; however, different models excel at different categories, providing evidence for complementary learning systems.
Supervised word sense disambiguation using semantic diffusion kernel
- Computer Science · Eng. Appl. Artif. Intell.
- 2014
References
Showing 1–10 of 49 references
An introduction to latent semantic analysis
- Linguistics
- 1998
The adequacy of LSA's reflection of human knowledge has been established in a variety of ways, for example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word‐word and passage‐word lexical priming data.
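The LSA pipeline summarized above can be sketched as a truncated SVD of a term-document count matrix: factor the matrix, keep the top-k latent dimensions, and compare terms in the reduced space. The matrix, counts, and rank k below are illustrative toys, not data from any of the cited studies.

```python
# Hedged sketch of LSA: truncated SVD over a tiny term-document
# count matrix (values are made up for illustration).
import numpy as np

terms = ["cat", "dog", "car", "engine"]
# Rows = terms, columns = four tiny "documents".
X = np.array([
    [2, 1, 0, 0],   # cat
    [1, 2, 0, 0],   # dog
    [0, 0, 3, 1],   # car
    [0, 0, 1, 3],   # engine
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                          # keep the top-k latent dimensions
term_vecs = U[:, :k] * s[:k]   # term coordinates in the latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

i = dict(zip(terms, range(len(terms))))
# Terms sharing documents end up close; unrelated terms end up
# near-orthogonal in the latent space.
print(cos(term_vecs[i["cat"]], term_vecs[i["dog"]]))
print(cos(term_vecs[i["cat"]], term_vecs[i["car"]]))
```

In real LSA the counts are typically log- and entropy-weighted before the SVD, and k is in the hundreds; the truncation is what lets similarity emerge between words that never co-occur directly.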
Latent Semantic Analysis Approaches to Categorization
- Psychology
- 1997
Latent Semantic Analysis creates high dimensional vectors for concepts in semantic memory through statistical analysis of a large representative corpus of text rather than subjective feature sets linked to object names, and multivariate analyses of similarity matrices show more cohesive structure for natural kinds than for artifacts.
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
- Computer Science · ROCLING/IJCLCLP
- 1997
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the…
Using Measures of Semantic Relatedness for Word Sense Disambiguation
- Computer Science · CICLing
- 2003
This paper generalizes the Adapted Lesk Algorithm to a method of word sense disambiguation based on semantic relatedness, and finds that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.
Producing high-dimensional semantic spaces from lexical co-occurrence
- Computer Science
- 1996
A procedure that processes a corpus of text and produces, for each word, a numeric vector capturing information about its meaning; these vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
- Computer Science
- 1997
A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Modelling Parsing Constraints with High-dimensional Context Space
- Computer Science
- 1997
It is proposed that HAL's high-dimensional context space can be used to provide a basic categorisation of semantic and grammatical concepts, model certain aspects of morphological ambiguity in verbs, and provide an account of semantic context effects in syntactic processing.
Roget's thesaurus and semantic similarity
- Computer Science · RANLP
- 2003
A system that measures semantic similarity using a computerized 1987 Roget's Thesaurus is presented and evaluated on a few typical tests, comparing the results with those produced by WordNet-based similarity measures.
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
- Computer Science · ECML
- 2001
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise…
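The pointwise mutual information score underlying PMI-IR can be sketched from document-frequency counts: the log ratio of how often two words appear together to how often they would co-occur by chance. The hit counts and word pair below are hypothetical stand-ins for real search engine query counts, not figures from the paper.

```python
# Illustrative PMI from hypothetical document-frequency counts
# (PMI-IR obtains such counts by querying a search engine).
import math

N = 1_000_000          # assumed total number of indexed documents
hits = {               # hypothetical hit counts
    "gem": 5_000,
    "jewel": 4_000,
    ("gem", "jewel"): 600,   # documents containing both words
}

def pmi(w1, w2):
    """log2 of observed co-occurrence rate over the rate expected by chance."""
    p_joint = hits[(w1, w2)] / N
    p1, p2 = hits[w1] / N, hits[w2] / N
    return math.log2(p_joint / (p1 * p2))

# Positive PMI: the pair co-occurs far more often than independence
# would predict, suggesting related meanings.
print(pmi("gem", "jewel"))
```

On a synonym test like TOEFL, PMI-IR picks, for each problem word, the candidate with the highest such score.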
The Measurement of Textual Coherence with Latent Semantic Analysis.
- Computer Science
- 1998
Reanalyzing sets of texts from two studies that manipulated the coherence of texts and assessed readers' comprehension indicates that the method is able to predict the effect of text coherence on comprehension and is more effective than simple term‐term overlap measures.