Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose
Data collected through Twitter's sampled Streaming API is compared with data collected from the full, albeit costly, Firehose stream, which includes every published tweet, to help researchers and practitioners understand the implications of relying on the Streaming API.
A Survey on Bias and Fairness in Machine Learning
This survey investigates real-world applications that have exhibited bias in various ways and creates a taxonomy of the fairness definitions that machine learning researchers have proposed to avoid existing bias in AI systems.
Feature Selection
This survey revisits feature selection research from a data perspective, reviews representative feature selection algorithms for conventional, structured, heterogeneous, and streaming data, and categorizes them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based.
A new approach to bot detection: Striking the balance between precision and recall
A model that increases recall in bot detection, allowing researchers to remove more bots, is proposed; the detection algorithm is shown to remove more bots from a dataset than current approaches.
Advancing Feature Selection Research - ASU Feature Selection Repository
A feature selection repository is presented that collects the most popular algorithms developed in feature selection research and serves as a platform for facilitating their application, comparison, and joint study.
Twitter Data Analytics
This brief is designed to give researchers, practitioners, project managers, and graduate students an entry point to jump-start their Twitter endeavors, and it also serves as a convenient reference for readers experienced in Twitter data analysis.
Tampering with Twitter’s Sample API
It is demonstrated that, because of the way Twitter's sampling mechanism works, these samples can be deliberately influenced, including the extent and content of any topic, and consequently that the analyses of researchers, journalists, and market and political analysts who trust these data sources can be manipulated.
Advancing feature selection research
Feature selection, an essential step in successful data mining applications, can effectively reduce data dimensionality by removing irrelevant (and redundant) features.
Misinformation in Social Media: Definition, Manipulation, and Detection
A definition of misinformation in social media is introduced, the difference between misinformation detection and classic supervised learning is examined, the characteristics of individual misinformation detection methods are explained, and commentary on their advantages and pitfalls is provided.
Finding Eyewitness Tweets During Crises
Disaster response agencies incorporate social media as a source of fast-breaking information to understand the needs of people affected by the many crises that occur around the world. These agencies