Corpus ID: 204800577

Context-Driven Data Mining through Bias Removal and Data Incompleteness Mitigation

Feras A. Batarseh and Ajay Kulkarni
The results of data mining endeavors are largely driven by data quality. Throughout these deployments, serious show-stopper problems remain unresolved, such as data collection ambiguities, data imbalance, hidden biases in data, the lack of domain information, and data incompleteness. This paper is based on the premise that context can aid in mitigating these issues. In a traditional data science lifecycle, context is not considered. Context-driven Data Science Lifecycle (C-DSL); the main…
Public Policymaking for International Agricultural Trade using Association Rules and Ensemble Machine Learning
Novel methods that predict and associate internationally traded food and agricultural commodities are presented, and Ensemble Machine Learning methods are developed to provide improved agricultural trade predictions, the implications of outlier events, and quantitative pointers for policy makers.
Clustering by periodontitis-associated factors - a novel application to NHANES data.
Clustering of NHANES demographic, systemic health, and socioeconomic data effectively identifies characteristics that are statistically significantly related to periodontitis status and hence detects subpopulations at high risk for periodontitis without costly clinical examinations.


Data Cleaning: Overview and Emerging Challenges
This work presents a taxonomy of the data cleaning literature and discusses recent work that casts such approaches into a statistical estimation framework, including using Machine Learning to improve the efficiency and accuracy of data cleaning and considering the effects of data cleaning on statistical analysis.
Data preparation for data mining
The importance of data preparation in data analysis is shown, some research achievements in the area of data preparedness are introduced, and some future directions of research and development are suggested.
Techniques to deal with missing data
  • J. Sessa, Dabeeruddin Syed
  • Engineering
  • 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA)
  • 2016
Data is available to us in humongous amounts in the real world, but none of it is of practical use if not converted to useful information. However, the knowledge discovery is hindered because the…
Missing data analysis: making it work in the real world.
  • J. Graham
  • Psychology, Medicine
  • Annual review of psychology
  • 2009
This review presents a practical summary of the missing data literature, including a sketch of missing data theory and descriptions of normal-model multiple imputation (MI) and maximum likelihood methods, and strategies for reducing attrition bias.
Context-Driven Testing on the Cloud
This chapter introduces a context scheme deployed within a software engineering lifecycle through a testing method that utilizes a context-based philosophy for testing systems implemented on the cloud.
The Management of Context-Sensitive Features: A Review of Strategies
Five heuristic strategies for handling context-sensitive features in supervised machine learning from examples are reviewed, and it appears that the framework includes all of the techniques that can be found in the published literature on context-sensitive learning.
Exploratory Data Mining and Data Cleaning
is no implied obligation to purchase the software in the future, but that a fee of about 300 euros will likely be charged for upcoming updates. The authors identify their target audience as…
Context-Assisted Test Cases Reduction for Cloud Validation
This paper introduces a validation method called Context-Assisted Test Case Reduction (CATCR), which reduces test cases based on the context of the validation process for systems that are deployed on the cloud.
LOF: identifying density-based local outliers
This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
Context-Sensitive Feature Selection for Lazy Learners
Experiments show that RC almost always improves accuracy with respect to FSS and BSS, and a study using artificial domains confirms the hypothesis that this difference in performance is due to RC's context sensitivity, and suggests conditions where this sensitivity will and will not be an advantage.