Latent Space Learning for Enhanced Short Text Classification

Abstract

Interest in short text classification and analysis has grown recently. However, conventional machine learning and text mining algorithms are poorly suited to short texts because such texts are brief and sparse. In this paper, we put forward a novel representation model for short text classification. The proposed model learns a compact latent space for modeling short texts based on semantic similarity among words in the training corpus. We first capture semantic relationships between words using Word2Vec. A sparse autoencoder is then applied to learn a compact latent space for representing short texts. With the learned space, reliable features can be estimated using a least-squares technique. To evaluate the proposed method, we conduct experiments on two classification tasks: sentiment classification and news title classification. Experimental results on two real-world datasets demonstrate that the proposed method produces more stable features, and better short-text classification, than state-of-the-art latent feature representations.
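
The abstract does not give the exact formulation of the least-squares step, so the following is only a minimal sketch under stated assumptions: the learned latent space is taken to be a k x d matrix `basis` (for example, decoder weights of a sparse autoencoder), each short text is represented by the average of its word vectors, and features are the coefficients that best reconstruct that average from the basis. The names `estimate_latent_features`, `average_word_vectors`, `basis`, and `ridge` are hypothetical, and random vectors stand in for trained Word2Vec embeddings and for the learned space.

```python
import numpy as np


def average_word_vectors(tokens, embeddings, dim):
    """Represent a short text as the mean of its known word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def estimate_latent_features(doc_vectors, basis, ridge=1e-6):
    """Least-squares features Z minimizing ||X - Z @ basis||^2 + ridge * ||Z||^2.

    doc_vectors: (n, d) text representations; basis: (k, d) latent space.
    Returns Z with shape (n, k). The small ridge term is an assumption
    added here for numerical stability.
    """
    k = basis.shape[0]
    gram = basis @ basis.T + ridge * np.eye(k)           # (k, k)
    return np.linalg.solve(gram, basis @ doc_vectors.T).T


rng = np.random.default_rng(0)
dim, k = 8, 3

# Stand-in for Word2Vec embeddings (assumption: random vectors, not trained).
vocab = ["good", "bad", "movie", "market", "stocks"]
embeddings = {w: rng.normal(size=dim) for w in vocab}

# Stand-in for the latent space learned by the sparse autoencoder.
basis = rng.normal(size=(k, dim))

texts = [["good", "movie"], ["bad", "market", "stocks"]]
X = np.stack([average_word_vectors(t, embeddings, dim) for t in texts])
Z = estimate_latent_features(X, basis)  # one k-dimensional feature row per text
```

Each row of `Z` could then be passed to any standard classifier. In a real pipeline, `embeddings` would come from a trained Word2Vec model and `basis` from the trained autoencoder.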

DOI: 10.1145/3023924.3023933

Cite this paper

@inproceedings{Pipanmaekaporn2016LatentSL, title={Latent Space Learning for Enhanced Short Text Classification}, author={Luepol Pipanmaekaporn and Suwatchai Kamonsantoroj}, booktitle={ICCIS '16}, year={2016} }