Semantic Clustering of Russian Web Search Results: Possibilities and Problems

Andrey Kutuzov
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs from large Russian corpora and use them to cluster search results according to the senses of the query. We compare different clustering methods and different source corpora, and describe models for applying distributional semantics to large-scale linguistic data.
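As a rough illustration of the pipeline the abstract outlines (co-occurrence graph, sense induction, clustering of results), here is a minimal Python sketch. It is not the paper's method: senses are approximated by connected components of the graph, which is a deliberate simplification, and the English toy snippets, the function names and the `min_weight` parameter are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(snippets, target):
    """Count how often two context words share a snippet with the target."""
    weights = defaultdict(int)
    for tokens in snippets:
        context = sorted(set(tokens) - {target})
        for a, b in combinations(context, 2):
            weights[(a, b)] += 1
    return weights


def induce_senses(weights, min_weight=1):
    """Treat each connected component of the filtered graph as one sense.

    Assumption: a stand-in for a real graph-based induction algorithm.
    """
    neighbours = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)
    senses, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        senses.append(component)
    return senses


def assign_snippets(snippets, senses):
    """Label each snippet with the sense sharing the most words with it."""
    labels = []
    for tokens in snippets:
        overlaps = [len(set(tokens) & sense) for sense in senses]
        labels.append(overlaps.index(max(overlaps)))  # ties go to the first sense
    return labels


# Hypothetical search snippets for the ambiguous query "bank".
snippets = [
    ["bank", "river", "water", "shore"],
    ["bank", "river", "shore", "fishing"],
    ["bank", "money", "loan", "account"],
    ["bank", "money", "account", "credit"],
    ["bank", "water", "river"],
    ["bank", "loan", "credit"],
]
graph = build_cooccurrence_graph(snippets, target="bank")
senses = induce_senses(graph)
labels = assign_snippets(snippets, senses)
print(senses)
print(labels)
```

On real data the graph would be built from a large corpus rather than from the snippets themselves, and a noisier graph would need a higher `min_weight` or a more robust induction step to keep the senses apart.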
Word Sense Induction for Russian: Deep Study and Comparison with Dictionaries
The assumption that senses are mutually disjoint and have clear boundaries has been drawn into doubt by several linguists and psychologists. The problem of word sense granularity is widely discussed…
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (or term embeddings) - term vector space models as a…


Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Key to the approach is to first acquire the various senses of an ambiguous query and then cluster the search results based on their semantic similarity to the induced word senses; the approach outperforms both Web clustering and search engines.
A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches
A quick tour of how to start doing research in this exciting field of NLP, with suggestions of the hottest topics to focus on.
HyperLex: lexical cartography for information retrieval
An algorithm called HyperLex is described that automatically determines word uses in a textbase without recourse to a dictionary; it makes use of the specific properties of word co-occurrence graphs, which are shown to have “small world” properties.
Dictionary word sense distinctions: An enquiry into their nature
The two studies described here look into their grounds for making distinctions, developing a classification scheme to describe the commonly occurring distinction types and a view of the ontological status of dictionary word senses.
Frequency of Use and the Organization of Language
This volume collects three decades of articles by distinguished linguist Joan Bybee. Her articles essentially argue for the importance of frequency of use as a factor in the analysis and explanation of…
Translating Collocations for Bilingual Lexicons: A Statistical Approach
A program named Champollion is described which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations, to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains.
Distributional Structure
This paper discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.
FreeLing 3.0: Towards Wider Multilinguality
The general architecture of the library is described, the major changes and improvements included in FreeLing version 3.0 are presented, and some relevant industrial projects in which it has been used are summarized.
Collective dynamics of ‘small-world’ networks
Simple models of networks that can be tuned through this middle ground: regular networks ‘rewired’ to introduce increasing amounts of disorder are explored, finding that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Towards wider multilinguality
  • Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12),
  • 2012