• Corpus ID: 246015729

Dialog Intent Induction via Density-based Deep Clustering Ensemble

Jiashu Pu, Guandan Chen, Yongzhu Chang, Xiao-Xi Mao
Existing task-oriented chatbots rely heavily on spoken language understanding (SLU) systems to determine the intent of a user’s utterance and other key information for fulfilling specific tasks. In real-life applications, it is crucial to occasionally induce novel dialog intents from the conversation logs to improve the user experience. In this paper, we propose the Density-based Deep Clustering Ensemble (DDCE) method for dialog intent induction. Compared to existing K-means based methods, our…


Dialog Intent Induction with Deep Multi-View Clustering
This work introduces the dialog intent induction task and proposes alternating-view k-means (AV-KMEANS) for joint multi-view learning and clustering analysis, which can induce better dialog intent clusters than state-of-the-art unsupervised representation learning methods and standard multi-view clustering approaches.
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Constrained deep adaptive clustering with cluster refinement (CDAC+) is proposed, an end-to-end clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process.
Intent Mining from past conversations for Conversational Agent
This paper presents an intent discovery framework that can mine a vast amount of conversational logs and generate labeled data sets for training intent models. It also introduces an extension to the DBSCAN algorithm, the density-based clustering algorithm ITER-DBSCAN, for unbalanced data clustering.
Benchmarking Natural Language Understanding Services for building Conversational Agents
The results show that on intent classification Watson significantly outperforms the other platforms (Dialogflow, LUIS, and Rasa), though these also perform well. Interestingly, on entity type recognition, Watson performs significantly worse due to its low precision.
A Self-Training Approach for Short Text Clustering
A method is proposed that learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network.
Semi-supervised Clustering for Short Text via Deep Representation Learning
A novel objective is designed to combine the representation learning process and the k-means clustering process, and it is optimized iteratively with both labeled and unlabeled data, through three steps, until convergence.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT (SBERT) is presented, a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
A new dataset is introduced that includes out-of-scope queries, i.e., queries that do not fall into any of the system’s supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
On the Sentence Embeddings from Pre-trained Language Models
This paper proposes to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective and achieves significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks.