Alexander F. Gelbukh

For most English words, dictionaries give various senses: e.g., “bank” can stand for a financial institution, shore, set, etc. Automatic selection of the sense intended in a given text is of crucial importance in many applications of text processing, such as information retrieval or machine translation: e.g., “(my account in the) bank” is to be translated into…
The use of conceptual graphs for the representation of text contents in information retrieval is discussed. A method for measuring the similarity between two texts represented as conceptual graphs is presented. The method is based on well-known strategies of text comparison, such as the Dice coefficient, with new elements introduced due to the bipartite nature…
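As a point of reference for the Dice coefficient mentioned above, here is a minimal sketch of its standard set-based form, 2|A ∩ B| / (|A| + |B|), applied to plain word sets. It is not the conceptual-graph method of the paper, whose details are truncated here; the function name `dice_coefficient` and the whitespace tokenization are assumptions made for illustration.

```python
def dice_coefficient(a, b):
    """Standard Dice coefficient between two collections, treated as sets.

    Illustrative only: operates on plain items (e.g. words), not on
    conceptual graphs.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen here: two empty sets are considered identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


if __name__ == "__main__":
    first = "the cat sat".split()
    second = "the cat ran".split()
    # Two shared words out of 3 + 3 total: 2 * 2 / 6
    print(dice_coefficient(first, second))
```

The value is 1.0 for identical sets and 0.0 for disjoint ones, which is what makes it convenient as a normalized similarity score.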
We present an approach for the construction of text similarity functions using a parameterized resemblance coefficient in combination with a softened cardinality function called soft cardinality. Our approach provides a consistent and recursive model, varying levels of granularity from sentences to characters. Therefore, our model was used to compare…
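To make the idea of a softened cardinality concrete, the sketch below uses the commonly cited form in which each element contributes the reciprocal of its summed similarity to all elements of the collection, with a softness exponent p. The truncated abstract does not give the formula, so this form, the names `soft_cardinality`, `exact_match`, and `always_similar`, and the choice of similarity functions are all assumptions, not the authors' implementation.

```python
def soft_cardinality(elements, sim, p=1.0):
    """Soft cardinality sketch: sum over a of 1 / sum over b of sim(a, b) ** p.

    Assumes sim(x, x) == 1 and 0 <= sim(x, y) <= 1, so every denominator
    is at least 1. Hypothetical helper, not taken from the paper.
    """
    items = list(elements)
    return sum(1.0 / sum(sim(a, b) ** p for b in items) for a in items)


def exact_match(x, y):
    # Crisp similarity: recovers the classical cardinality of distinct items.
    return 1.0 if x == y else 0.0


def always_similar(x, y):
    # Degenerate similarity: every element is treated as the same item.
    return 1.0


if __name__ == "__main__":
    words = ["bank", "shore", "set"]
    print(soft_cardinality(words, exact_match))
    print(soft_cardinality(words, always_similar))
```

With a crisp similarity the result equals the ordinary count of distinct elements, and as elements become more similar to one another the value shrinks toward 1, which is the behavior a softened cardinality is meant to capture.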
The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask at PAN 2014 plagiarism detection…
Emotions play a key role in natural language understanding and sensemaking. Pure machine learning usually fails to recognize and interpret emotions in text. The need for knowledge bases that give access to semantics and sentics (the conceptual and affective information) associated with natural language is growing exponentially in the context of big social…
SenticNet is currently one of the most comprehensive freely available semantic resources for opinion mining. However, it only provides numerical polarity scores, while more detailed sentiment-related information for its concepts is often desirable. Another important resource for opinion mining and sentiment analysis is WordNet-Affect, which in turn lacks…
Nowadays, most documents are produced in digital format, in which they can be easily accessed and copied. Document copy detection is a very important tool for protecting the author’s copyright. We present PPChecker, a document copy detection system based on plagiarism pattern checking. PPChecker calculates the amount of data copied from the original…