Intent Discovery Through Unsupervised Semantic Text Clustering

Authors: Padmasundari and Srinivas Bangalore
Conversational systems need to understand spoken language to be able to converse with a human in a meaningful, coherent manner. […] We explore a range of text representations and clustering methods, and validate clustering stability through quantitative metrics such as the Adjusted Rand Index (ARI). A final alignment of the clusters to semantic intents is determined through consensus labelling. Our experiments on public datasets demonstrate the effectiveness of our approach…
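The two evaluation steps named in the abstract, comparing clusterings with ARI and aligning clusters to intents by consensus, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `adjusted_rand_index` and `consensus_labels` are hypothetical, the toy cluster IDs and intent names are made up for illustration, and consensus labelling is assumed here to mean a majority vote over the few utterances in each cluster that carry a human annotation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions agree trivially

    # Contingency counts: how many items share each (cluster_a, cluster_b) pair.
    cell_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2          # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


def consensus_labels(cluster_ids, annotations):
    """Assumed consensus rule: majority vote among annotated cluster members.

    `annotations` holds an intent name for annotated utterances and None
    for the rest. Clusters with no annotated member are left unmapped.
    """
    votes = {}
    for cluster_id, intent in zip(cluster_ids, annotations):
        if intent is not None:
            votes.setdefault(cluster_id, Counter())[intent] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


# Two runs that find the same partition under different cluster IDs.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]
stability = adjusted_rand_index(run_1, run_2)

# Toy alignment of clusters to intents from a handful of annotations.
mapping = consensus_labels(
    [0, 0, 0, 1, 1],
    ["check_balance", None, "check_balance", "transfer_money", None],
)
```

ARI is invariant to how cluster IDs are numbered, which is why it suits comparing runs of different clustering methods or different text representations: two runs that recover the same grouping score 1.0 even when their IDs differ, and agreement at chance level scores about 0.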

Discovering New Intents with Deep Aligned Clustering
This work proposes an effective method (Deep Aligned Clustering) to discover new intents with the aid of limited known intent data and proposes an alignment strategy to tackle the label inconsistency during clustering assignments.
New Intent Discovery with Pre-training and Contrastive Learning
This paper proposes a multi-task pre-training strategy to leverage rich unlabeled data along with external labeled data for representation learning, and designs a new contrastive loss to exploit self-supervisory signals in unlabeled data for clustering.
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Constrained deep adaptive clustering with cluster refinement (CDAC+) is proposed, an end-to-end clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process.
Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning
This work proposes a novel disentangled knowledge transfer method via a unified multi-head contrastive learning framework that aims to bridge the gap between IND pre-training and OOD clustering.


ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally
An effective solution is developed that recasts the problem formulation in a way radically different from traditional or semi-supervised clustering, and its superior performance is demonstrated on publicly available datasets.
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model is presented that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Short and Sparse Text Topic Modeling via Self-Aggregation
A novel model integrating topic modeling with short text aggregation during topic inference is presented, founded on general topical affinity of texts rather than particular heuristics, making the model readily applicable to various short texts.
Sentence Clustering Using Continuous Vector Space Representation
Empirical evidence is provided that representing words as continuous vectors obtained with the word2vec toolkit can lead to better clusters.
A Survey of Topic Modeling in Text Mining
Different models, such as topic over time (TOT), dynamic topic models (DTM), multiscale topic tomography, dynamic topic correlation detection, detecting topic evolution in scientific literature, etc. are discussed.
Hierarchical Clustering Algorithms for Document Datasets
The experimental evaluation shows that, contrary to the common belief, partitional algorithms always lead to better solutions than agglomerative algorithms, making them ideal for clustering large document collections due not only to their relatively low computational requirements but also to their higher clustering quality.
A Comparison of Document Clustering Techniques
This paper compares the two main approaches to document clustering, agglomerative hierarchical clustering and K-means, and indicates that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches that were tested, for a variety of cluster evaluation metrics.
A Survey of Text Clustering Algorithms
This chapter will study the key challenges of the clustering problem, as it applies to the text domain, and discuss the key methods used for text clustering, and their relative advantages.