Feature Selection for Classification
This survey identifies the future research areas in feature selection, introduces newcomers to this field, and paves the way for practitioners who search for suitable methods for solving domain-specific real-world applications.
Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose
Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.
Exploring temporal effects for location recommendation on location-based social networks
A novel location recommendation framework is introduced, based on the temporal properties of user movement observed from a real-world LBSN dataset, which exhibits the significance of temporal patterns in explaining user behavior, and demonstrates their power to improve location recommendation performance.
A Probabilistic Approach to Feature Selection - A Filter Solution
The theoretical analysis and the experimental study show that the proposed probabilistic approach is simple to implement and guaranteed to be optimal if resources permit.
Consistency-based search in feature selection
  • M. Dash, Huan Liu
  • Mathematics, Computer Science
    Artif. Intell.
  • 1 December 2003
An empirical study is conducted to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.
Identifying the influential bloggers in a community
The challenges of identifying influential bloggers are discussed, what constitutes an influential blogger is investigated, a preliminary model attempting to quantify an influential blogger is presented, and the way is paved for building a robust model that can find various types of influentials.
Connecting users across social media sites: a behavioral-modeling approach
This study formally defines the cross-media user identification problem, introduces a methodology (MOBIUS) for finding a mapping among identities of individuals across social media sites, and shows that MOBIUS is effective in identifying users across social media sites.
Chi2: feature selection and discretization of numeric attributes
  • Huan Liu, R. Setiono
  • Computer Science
    Proceedings of 7th IEEE International Conference…
  • 5 November 1995
Chi2 is a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data, and achieves feature selection via discretization.
Social Media Mining: An Introduction
Social Media Mining introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining.
gSCorr: modeling geo-social correlations for new check-ins on location-based social networks
This paper proposes a geo-social correlation model to capture social correlations on LBSNs considering social networks and geographical distance, and demonstrates that this approach properly models the social correlations of a user's new check-ins by considering various correlation strengths and correlation measures.