Enhancement of Short Text Clustering by Iterative Classification

Authors: Md. Rashadul Hasan Rakib, N. Zeh, Magdalena Jankowska, E. Milios
Published in: Natural Language Processing and Information Systems, pp. 105-117
Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering…
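The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated before it says how outliers are detected or which classifier is used, so the sketch assumes that outliers are the texts least similar to their own cluster centroid and that the classifier is a nearest-centroid rule over bag-of-words counts. The names `iterative_classification`, `outlier_fraction`, and `max_iter` are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words counts for one short text (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of count vectors; the scale does not affect cosine similarity."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def iterative_classification(texts, labels, outlier_fraction=0.3, max_iter=10):
    """Reassign outliers with a classifier trained on non-outliers until stable.

    `labels` is the output of any initial clustering. Outlier detection and
    the nearest-centroid classifier are assumptions made for this sketch.
    """
    vectors = [vectorize(t) for t in texts]
    labels = list(labels)
    n_out = int(len(texts) * outlier_fraction)
    for _ in range(max_iter):
        clusters = sorted(set(labels))
        cents = {c: centroid([v for v, l in zip(vectors, labels) if l == c])
                 for c in clusters}
        # Assumed outlier rule: lowest similarity to the current cluster centroid.
        scores = [cosine(v, cents[l]) for v, l in zip(vectors, labels)]
        outliers = set(sorted(range(len(texts)), key=lambda i: scores[i])[:n_out])
        # "Train" the classifier on non-outliers only, using their current labels.
        train = {}
        for c in clusters:
            members = [vectors[i] for i in range(len(texts))
                       if labels[i] == c and i not in outliers]
            if members:
                train[c] = centroid(members)
        # Classify (reassign) each outlier to the most similar trained centroid.
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = max(train, key=lambda c: cosine(vectors[i], train[c]))
        if new_labels == labels:  # cluster assignment has stabilized
            break
        labels = new_labels
    return labels
```

Any stronger classifier could be swapped in for the centroid rule; the structure of the loop (detect outliers, train on the rest, reassign, repeat until nothing changes) is the part taken from the abstract.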
Efficient clustering of short text streams using online-offline clustering
Short text stream clustering is an important but challenging task, since a massive amount of text is generated from different sources such as micro-blogging, question-answering, and social news…
Supporting Clustering with Contrastive Learning
This work proposes Supporting Clustering with Contrastive Learning (SCCL), a novel framework that leverages contrastive learning to promote better separation in distance-based clustering, and demonstrates the effectiveness of SCCL in combining the strengths of bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances.
Short Text Clustering with Transformers
It is shown that sentence vector representations from Transformers in conjunction with different clustering methods can be successfully applied to address the task of short text clustering, and the algorithm of enhancement of clustering via iterative classification can further improve initial clustering performance with different classifiers, including those based on pre-trained Transformer language models.
Effects on Time and Quality of Short Text Clustering during Real-Time Presentations
Clustering algorithms for short texts can confidently be used in real-time presentations; text size, number of phrases, and number of clusters predict inertia, with the lowest inertia observed for short texts.
Pairwise Supervised Contrastive Learning of Sentence Representations
  • Dejiao Zhang, Shang-Wen Li, +4 authors Bing Xiang
  • Computer Science
  • 2021
Many recent successes in sentence representation learning have been achieved by simply fine-tuning on the Natural Language Inference (NLI) datasets with triplet loss or siamese loss. Nevertheless,…


A model-based approach for text clustering with outlier detection
This paper proposes a collapsed Gibbs Sampling algorithm for the Dirichlet Process Multinomial Mixture model for text clustering (abbreviated GSDPMM), which does not need the number of clusters to be specified in advance and can cope with the high-dimensional problem of text clustering.
A Self-Training Approach for Short Text Clustering
A method is proposed that learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network.
Improving Short Text Clustering by Similarity Matrix Sparsification
Two sparsification methods (the proposed Similarity Distribution based method and k-nearest neighbors), which aim to retain a prescribed number of similarity elements per text, improve the hierarchical clustering quality of short texts for various text similarities.
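As a rough illustration of the k-nearest-neighbors variant mentioned in this summary, the sketch below zeroes out all but the largest similarities in each row of a similarity matrix. It is an assumption-laden sketch rather than the paper's procedure: the function name `knn_sparsify` is hypothetical, and the choice to keep an entry when it is among the top `k` of either text (so that the matrix stays symmetric) is an assumption, which means a row may retain more than `k` entries.

```python
def knn_sparsify(sim, k):
    """Keep each text's k largest off-diagonal similarities; zero the rest.

    `sim` is a square, symmetric list-of-lists similarity matrix.
    An entry (i, j) survives if j is among i's top-k neighbors or i is
    among j's top-k neighbors (assumed symmetrization rule).
    """
    n = len(sim)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k most similar other texts for row i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:k]
        for j in neighbors:
            keep[i][j] = True
    return [
        [sim[i][j] if i == j or keep[i][j] or keep[j][i] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
```

The sparsified matrix can then be passed to a hierarchical clustering routine in place of the dense one.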
Self-Taught Convolutional Neural Networks for Short Text Clustering
A flexible Self-Taught Convolutional neural network framework for Short Text Clustering is proposed, which can successfully incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner.
Corpus-based topic diffusion for short text clustering
A novel corpus-based enrichment approach for short text clustering that uses a set of conjugate definitions to characterize the structures of topics and words, and proposes a virtual generative procedure for short texts that can effectively address the sparseness problem.
Discovering Topic Representative Terms for Short Text Clustering
In this paper, a novel topic representative term discovery (TRTD) method is provided that achieves better accuracy and efficiency in short text clustering than the state-of-the-art methods.
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
A general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections; the framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text.
Clustering of semantically enriched short texts
Two approaches to clustering small sets of very short texts are presented, one based on neural-based distributional models and the other based on external knowledge resources, which are tested on SnSRC and other knowledge-poor algorithms.
A Survey of Text Clustering Algorithms
This chapter will study the key challenges of the clustering problem, as it applies to the text domain, and discuss the key methods used for text clustering, and their relative advantages.
An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit
This study evaluates several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit, and shows that clustering techniques applied to neural embedding feature representations delivered the best performance over all data sets using appropriate extrinsic evaluation measures.