• Corpus ID: 9576532

Modified Makagonov's Method for Testing Word Similarity and its Application to Constructing Word Frequency Lists

Xavier Blanco, Mikhail Alexandrov, and Alexander Gelbukh. Research on Computing Science.
By (morphologically) similar wordforms we understand wordforms (strings) that have the same base meaning (roughly, the same root), such as sadly and sadden. The task of deciding whether two given strings are similar in this sense has numerous applications in text processing, e.g., in information retrieval, where stemming is usually employed as an intermediate step. Makagonov has suggested a weakly supervised approach for testing word similarity, based on empirical formulae comparing the…
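
As an illustration only, a string-level similarity test of this general kind can be sketched as follows. The abstract is truncated before it states the formula, so everything specific here is an assumption: the criterion (the share of non-matching trailing letters compared against a threshold that grows with the common-prefix length), the function names, and the coefficients `intercept` and `slope` are hypothetical placeholders, not the authors' trained values.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def relative_difference(a: str, b: str) -> float:
    """Share of letters, over both words, lying outside the common prefix."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    differing = total - 2 * common_prefix_len(a, b)
    return differing / total


def looks_similar(a: str, b: str, intercept: float = 0.0, slope: float = 0.16) -> bool:
    """Hypothetical test: accept the pair when the relative difference does not
    exceed a linear threshold in the common-prefix length.
    intercept and slope are made-up values chosen for this example only."""
    if not a or not b:
        return False
    threshold = intercept + slope * common_prefix_len(a, b)
    return relative_difference(a, b) <= threshold


if __name__ == "__main__":
    for pair in [("sadly", "sadden"), ("sad", "sadness"), ("cat", "car")]:
        print(pair, looks_similar(*pair))
```

In a weakly supervised setting, coefficients like these would be fitted on a small set of labelled word pairs for the language at hand rather than fixed by hand as above.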

Elaborating Formulae for Testing Word Similarity in Inflective Languages
We consider similar words as words having the same base meaning (sad, sadness, sadly, etc.). Identification of such words is an important procedure for many IR applications, especially for
Constructing Empirical Models for Automatic Dialog Parameterization
The paper shows how to avoid difficulties when dealing with politeness, competence, satisfaction, and other similar characteristics of clients using empirical formulae based on lexical-grammatical properties of a text.
Natural Language Processing: Perspective of CIC-IPN
  • Alexander Gelbukh
  • Computer Science
    2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
  • 2013
Part of the results of fifteen years of research of the Natural Language Processing (NLP) Laboratory of CIC-IPN are outlined, such as resolving ambiguities and constructing dictionaries.
Natural Language Processing: Perspective of CIC-IPN Keynote
This group’s work is concentrated on the internal tasks of this technology, such as resolving ambiguities and constructing.
Locating Regression Bugs
The CodePsychologist is presented, a tool that helps the programmer locate the source code segments that caused a given regression bug. It goes beyond current tools, which identify all the lines of code that changed since the feature in question worked properly.


Empirical Formula for Testing Word Similarity and Its Application for Constructing a Word Frequency List
This work proposes a heuristic approximate method for identifying strings resulting from morphological variation of the same base meaning based on an empirical formula for testing the similarity of two words using large morphological dictionaries.
Testing Word Similarity: Language Independent Approach with Examples from Romance
This paper considers a set of models (formulae) of a given class, selects the best ones using training and test samples, and demonstrates how to construct such formulae for a given language using an inductive method of model self-organization.
An algorithm for suffix stripping
An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL and performs slightly better than a much more elaborate system with which it has been compared.
Keyword Extraction for Text Characterization
An efficient and robust, language- and domain-independent approach based on small word parts (quadgrams) is proposed; it can be improved by re-examining and re-ranking keywords using edit distance and an algorithm based on the relativistic addition of velocities.
Morphological Analysis of Inflective Languages through Generation
A method that avoids the use of rules that specify what stems can be generated from a given one by generating and verifying the hypotheses about possible grammatical forms is suggested.
Approach to Construction of Automatic Morphological Analysis Systems for Inflective Languages with Little Effort
Development of morphological analysis systems for inflective languages is a tedious and laborious task. We suggest an approach for development of such systems that permits to spend less time and
Mathematical methods of statistics
In this text about statistical mathematical theory, Harald Cramer joins two major lines of development in the field: while British and American statisticians were developing the science of
Formulae for Testing Word Similarity trained on examples
  • In: Corpus Linguistics-2004, Proc. of linguistics seminar of Sankt-Petersburg University, Russia
  • 2004
Modern Information Retrieval
  • 1999