CoSelect: Feature Selection with Instance Selection for Social Media Data

Abstract

Feature selection is widely used to prepare high-dimensional data for effective data mining. The attribute-value data assumed by traditional feature selection differs from social media data, although both can be large-scale. Social media data is inherently linked rather than independent and identically distributed (i.i.d.). It is also noisy, and its quality can vary drastically. These unique properties present both challenges and opportunities for feature selection. Motivated by these differences, we propose a novel feature selection framework, CoSelect, for social media data. In particular, CoSelect can exploit link information by applying social correlation theories, incorporate instance selection into feature selection, and select relevant instances and features simultaneously. Experimental results on real-world social media datasets demonstrate the effectiveness of the proposed framework and its potential for mining social media data.

DOI: 10.1137/1.9781611972832.77

Cite this paper

@inproceedings{Tang2013CoSelectFS,
  title={CoSelect: Feature Selection with Instance Selection for Social Media Data},
  author={Jiliang Tang and Huan Liu},
  booktitle={SDM},
  year={2013}
}