Feature Selection for Social Media Data

@article{Tang2014FeatureSF,
  title={Feature Selection for Social Media Data},
  author={Jiliang Tang and Huan Liu},
  journal={ACM Trans. Knowl. Discov. Data},
  year={2014},
  volume={8},
  pages={19:1--19:27}
}
Feature selection is widely used in preparing high-dimensional data for effective data mining. [...] Key result: We design and conduct experiments on datasets from real-world social media Web sites, and the empirical results demonstrate that the proposed framework can significantly improve the performance of feature selection.
A Social-aware online short-text feature selection technique for social media
Short-text feature construction and selection in social media data: a survey
TLDR
This paper surveys feature selection techniques for dealing with short texts in both offline and online settings, and open issues and research opportunities for performing online feature selection over social media data are discussed.
The good, the bad, and the ugly: uncovering novel research opportunities in social media mining
TLDR
In the endeavor of employing the good to tame the bad with the help of the ugly, this work deepens the understanding of ever-growing and continuously evolving data and creates innovative solutions through interdisciplinary and collaborative research in data science.
Investigating Classification Techniques with Feature Selection For Intention Mining From Twitter Feed
TLDR
This paper investigates the problem of selecting features that affect extracting user's intention from Twitter feeds based on text mining techniques, and presents two techniques of feature selection followed by classification.
Computing Distrust in Social Media
TLDR
Trust plays a crucial role in helping online users collect relevant and reliable information, and has been proven to be an effective way to mitigate information overload and credibility problems.
Recent advances in feature selection and its applications
TLDR
This review paper presents a selection of challenges which are of particular current interests, such as feature selection for high-dimensional small sample size data, large-scale data, and secure feature selection, as well as some representative applications of feature selection.
Feature Selection: Multi-source and Multi-view Data Limitations, Capabilities and Potentials
TLDR
Several emerging FS techniques for multi-source and multi-view data are reviewed to underscore the uses and limitations of these heterogeneous methods, summarising their capabilities and potentials to inform key areas of future research, especially in numerous applications.
Mining Public Opinion on Ride-Hailing Service Providers using Aspect-Based Sentiment Analysis
TLDR
This study analyzes customers’ opinions on Twitter of three ride-hailing service providers and combines the text mining approach with aspect-based sentiment analysis to identify topics in customer opinions and their sentiments to improve customer satisfaction and loyalty.
Feature Selection for Pattern Recognition: Upcoming Challenges
TLDR
The present chapter exposes the gap in feature selection research on handling chronologically linked data, and gives suggestions on how to perform or pursue an approach to chronologically linked data feature selection.
...

References

SHOWING 1-10 OF 51 REFERENCES
Integrating Social Media Data for Community Detection
TLDR
This work presents a joint optimization framework to integrate multiple data sources for community detection, elaborates the need for and challenges of multi-source integration of heterogeneous data types, and provides a principled way of multi-source community detection.
Discovering Overlapping Groups in Social Media
TLDR
A novel co-clustering framework is proposed, which takes advantage of networking information between users and tags in social media, to discover these overlapping communities.
Scalable learning of collective behavior based on sparse social dimensions
TLDR
This work proposes an edge-centric clustering scheme to extract sparse social dimensions that can efficiently handle networks of millions of actors while demonstrating prediction performance comparable to that of other non-scalable methods.
Exploring Social-Historical Ties on Location-Based Social Networks
TLDR
A social-historical model is proposed to explore user’s check-in behavior on location-based social networks and shows how social and historical ties can help location prediction.
Using Transactional Information to Predict Link Strength in Online Social Networks
TLDR
This work develops a supervised learning approach to predict link strength from transactional information as a link prediction task and compares the utility of attribute-based, topological, and transactional features.
MetaFac: community discovery via relational hypergraph factorization
TLDR
The proposed MetaFac (MetaGraph Factorization), a framework that extracts community structures from various social contexts and interactions, outperforms baseline methods by an order of magnitude and is able to extract meaningful communities based on the social media contexts.
Multi-Source Feature Selection via Geometry-Dependent Covariance Analysis
TLDR
This work investigates how the information contained in multiple data sources can be employed to effectively derive intrinsic relationships that can help select more meaningful (or domain relevant) features in multi-source feature selection.
Laplacian Score for Feature Selection
TLDR
This paper proposes a "filter" method for feature selection which is independent of any learning algorithm, based on the observation that, in many real world classification problems, data from the same class are often close to each other.
Label and Link Prediction in Relational Data
TLDR
This work presents a flexible framework that builds on (conditional) Markov networks and successfully addresses both tasks by capturing complex dependencies in the data, and achieves significantly better performance than flat classification.
Modeling relationship strength in online social networks
TLDR
This work develops an unsupervised model to estimate relationship strength from interaction activity and user similarity and evaluates it on real-world data from Facebook and LinkedIn, showing that the estimated link weights result in higher autocorrelation and lead to improved classification accuracy.
...