Gender prediction on a real life blog data set using LSI and KNN

Abstract

Gender prediction on social media data sets is usually treated as a text classification problem and can be solved with machine learning methods such as the K-nearest neighbor (KNN) algorithm. However, KNN is computationally costly because it is a lazy learner: it defers all computation to prediction time, and it performs poorly when the feature space is high-dimensional. Dimension reduction methods are therefore integrated with KNN to reduce computation time. In this paper we propose an approach that combines Latent Semantic Indexing (LSI) with KNN to predict author gender from a real-life collection of posts taken from actual blog pages. Experimental results demonstrate its effectiveness in processing large-scale, high-dimensional data.
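To make the LSI-plus-KNN idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw term counts as features, a truncated SVD computed with NumPy as the LSI step, and cosine-similarity majority voting as the KNN step. The toy documents, the placeholder labels `class_a` and `class_b` (standing in for gender labels), and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

import numpy as np


def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}


def doc_term_matrix(docs, vocab):
    """Rows are documents, columns are terms, entries are raw counts."""
    X = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            if w in vocab:  # out-of-vocabulary words are ignored
                X[r, vocab[w]] += 1.0
    return X


def fit_lsi(X, k):
    """LSI step: keep the top-k right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # shape (n_terms, k)


def knn_predict(train_Z, train_y, query_Z, n_neighbors=3):
    """KNN step: majority vote among the most cosine-similar neighbors."""
    def unit(A):
        norms = np.linalg.norm(A, axis=1, keepdims=True)
        return A / np.where(norms == 0, 1.0, norms)

    sims = unit(query_Z) @ unit(train_Z).T
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:n_neighbors]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Hypothetical toy corpus; the labels are placeholders for gender labels.
train_docs = [
    "football match goal team",
    "team goal score match",
    "football team score",
    "recipe oven bake cake",
    "cake bake sugar recipe",
    "oven sugar cake",
]
train_y = ["class_a"] * 3 + ["class_b"] * 3
query_docs = ["goal score football", "bake sugar oven"]

vocab = build_vocab(train_docs)
X_train = doc_term_matrix(train_docs, vocab)
V_k = fit_lsi(X_train, k=2)

# Project training and query documents into the k-dimensional latent space.
Z_train = X_train @ V_k
Z_query = doc_term_matrix(query_docs, vocab) @ V_k

predictions = knn_predict(Z_train, train_y, Z_query, n_neighbors=3)
print(predictions)
```

The neighbor search here runs over 2 latent dimensions instead of the 10 original term dimensions. On a real blog corpus the vocabulary is far larger, which is where the reduction would be expected to matter most.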

Cite this paper

@article{Chen2017GenderPO,
  title={Gender prediction on a real life blog data set using LSI and KNN},
  author={Jianle Chen and Tianqi Xiao and Jie Sheng and Ankur Teredesai},
  journal={2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC)},
  year={2017},
  pages={1-6}
}