Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems

Chris Biemann
We introduce Chinese Whispers, a randomized graph-clustering algorithm that runs in time linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation. The approach exploits the fact that the small-world property holds for many graphs in NLP.
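
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the label-propagation scheme commonly described under the name Chinese Whispers: every node starts in its own class, and nodes are then visited in random order, each adopting the class with the largest total edge weight among its neighbours. The function name `chinese_whispers`, the `(u, v, weight)` edge format, the `iterations` and `seed` parameters, and the tie rule (keep the current label if it is among the best, otherwise pick at random) are assumptions made for this illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples.

    Illustrative sketch only; parameter names and the tie rule are assumptions.
    Returns a dict mapping each node to its class label.
    """
    rng = random.Random(seed)

    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Every node starts in its own class.
    labels = {node: node for node in neighbors}
    nodes = list(neighbors)

    for _ in range(iterations):
        rng.shuffle(nodes)  # randomized processing order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbourhood.
            scores = defaultdict(float)
            for nb, w in neighbors[node].items():
                scores[labels[nb]] += w
            best = max(scores.values())
            candidates = [lab for lab, s in scores.items() if s == best]
            # Assumed tie rule: keep the current label when it is among the best.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:  # stop early once no label changes
            break
    return labels


# Usage: two disjoint triangles end up in two separate classes.
triangle_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("x", "y", 1.0), ("y", "z", 1.0), ("x", "z", 1.0),
]
clusters = chinese_whispers(triangle_edges)
```

Each pass touches every edge a constant number of times, which is consistent with the per-iteration cost being linear in the number of edges. The number of passes needed in practice depends on the graph.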

Fine-tuning Chinese Whispers algorithm for a Slavonic language POS tagging task and its evaluation
Chris Biemann’s robust Chinese Whispers graph clustering algorithm, which works in the Structure Discovery paradigm, is applied to a Slavonic language (Polish), focusing on fine-tuning the parameters and finding an evaluation method for a POS tagging application that aims at a very small (coarse-grained) tagset.
Word Sense Induction & Disambiguation Using Hierarchical Random Graphs
The inferred hierarchical structures are applied to the problem of word sense disambiguation, where it is shown that the method performs significantly better than traditional graph-based methods and agglomerative clustering, yielding improvements over state-of-the-art WSD systems based on sense induction.
MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction
This paper introduces a linear-time graph-based soft clustering algorithm, suited to tasks such as Word Sense Induction (WSI), where the number of classes is unknown and where class distributions may be skewed.
ISCAS: A System for Chinese Word Sense Induction Based on K-means Algorithm
This paper presents an unsupervised method for automatic Chinese word sense induction that clusters similar words according to the contexts in which they occur, using singular value decomposition.
Graph-Based Induction of Word Senses in Croatian
This paper addresses the WSI task for the Croatian language with a word clustering approach based on co-occurrence graphs, and makes available two induced sense inventories of the 10,000 most frequent Croatian words, both obtained using the Markov Clustering algorithm.
WoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification
The main mission of WoSIT is to provide a framework for the extrinsic evaluation of WSI algorithms, also within end-user applications such as Web search result clustering and diversification.
Graph-based approaches to word sense induction
A novel parameter-free soft clustering algorithm that runs in time linear in the number of edges in the input graph, and novel generalisations of the clustering coefficient to the weighted case, are applied.
Unsupervised Parts-of-Speech Induction for Bengali
A study of the word interaction networks of Bengali in the framework of complex networks reveals interesting insights into the morpho-syntax of the language, whereas clustering helps in the induction of the natural word classes leading to a principled way of designing POS tagsets.
Graph-Based Methods for Natural Language Processing and Understanding—A Survey and Analysis
This survey and analysis presents the functional components, performance, and maturity of graph-based methods for natural language processing and natural language understanding and their potential
Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction
This is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses, and it compares different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword.


Hierarchical Clustering of Words and Application to NLP Tasks
A data-driven method for hierarchical clustering of words and clusters of multiword compounds, which can avoid the data sparseness problem which is ubiquitous in corpus statistics.
Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation
This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation observation by using triplets of words instead of pairs, which enables automatic parameter optimization of the WSI algorithm.
Language-Independent Methods for Compiling Monolingual Lexical Data
A flexible, portable and language-independent infrastructure for setting up large monolingual language corpora and the extraction and usage of sentence-based word collocations is discussed in detail.
Disentangling from Babylonian Confusion - Unsupervised Language Identification
Evaluation on 7-lingual corpora and bilingual corpora shows that the quality of classification is comparable to supervised approaches and that the method works almost error-free from 100 sentences per language onward.
On the Nature of Structure and Its Identification
A new and lucid structure measure, the so-called weighted partial connectivity, Λ, whose maximization defines a graph's structure, is introduced; it results in a new splitting theorem concerning the well-known minimum cut splitting measure.
An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation
A novel graph theoretic approach for data clustering is presented and its application to the image segmentation problem is demonstrated. The approach yields an optimal solution equivalent to that obtained by partitioning the complete equivalent tree and is able to handle very large graphs with several hundred thousand vertices.
On the NP-Completeness of Some Graph Cluster Measures
It is proved that the decision problems associated with the optimization tasks of finding clusters that are optimal with respect to these fitness measures are NP-complete.
A cluster algorithm for graphs
The MCL algorithm and process, convergence towards equilibrium states, interpretation of the states as clusterings, and implementation and scalability are described.
Small worlds: the dynamics of networks between order and randomness
Jie Wu, Computer Science, 2002
Everyone knows the small-world phenomenon: soon after meeting a stranger, we are surprised to discover that we have a mutual friend, or we are connected through a short chain of acquaintances. In his
This paper presents a state-of-the-art review of network models developed in several fields, in order to help modellers choose relevant models for their problem and test the so-called “social network” effect.