• Corpus ID: 26911869

Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection

@article{SosaCabrera2017UnderstandingAV,
  title={Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection},
  author={Gustavo Sosa-Cabrera and Miguel Garc{\'i}a-Torres and Santiago G{\'o}mez and Christian E. Schaerer and Federico Divina},
  journal={ArXiv},
  year={2017},
  volume={abs/1709.08730}
}
In this paper, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure through the use of statistical simulation techniques under various mixes of informative and non-informative randomly generated features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. We discovered a condition that preserves good quality in the MSU under different combinations of these three factors, providing a new useful criterion to help… 
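The simulation setup described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited definition MSU(X_1..X_n) = n/(n-1) · [1 − H(X_1..X_n) / Σ H(X_i)], which is not stated in this excerpt and should be checked against the paper. The function names `entropy`, `msu`, and `random_msu`, and all parameter values, are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def msu(columns):
    """MSU of equal-length discrete columns, under the assumed formula
    n/(n-1) * (1 - H(joint) / sum of marginal entropies)."""
    n = len(columns)
    sum_marginal = sum(entropy(col) for col in columns)
    if n < 2 or sum_marginal == 0:
        return 0.0  # undefined case: return 0 by convention in this sketch
    joint = entropy(list(zip(*columns)))
    return (n / (n - 1)) * (1.0 - joint / sum_marginal)


def random_msu(n_features, cardinality, sample_size, rng):
    """MSU of independent, uniformly random (non-informative) features."""
    cols = [
        [rng.randrange(cardinality) for _ in range(sample_size)]
        for _ in range(n_features)
    ]
    return msu(cols)


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-informative features should score near 0, but a small sample
    # relative to the number of value combinations inflates the estimate.
    for m in (20, 200, 5000):
        print(m, round(random_msu(3, 4, m, rng), 4))
```

With purely random features the true MSU is 0, so any positive value printed here is estimation bias; in this sketch it shrinks as the sample size grows relative to the number of possible value combinations (here 4^3 = 64).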
4 Citations
Research on Hybrid Feature Selection Method Based on Iterative Approximation Markov Blanket
TLDR
Comparative experiments using basic experimental data on traditional Chinese medicine materials and multiple public UCI datasets show that the new method is better at selecting a small number of highly explanatory features than Lasso, XGBoost, and the classic approximate Markov blanket method.
Filtre Tabanlı Nitelik Seçimi ve Topluluk Öğrenme Yaklaşımlarıyla Borsa İstanbul Enerji Endeksi Yön Tahmini
In the study, the daily price change directions of XKMYA (energy), one of the important indexes of Borsa Istanbul, were predicted using financial news published on a financial portal website. In the
Discovering Association Rules Using R. A Case Study on Retail's Database
TLDR
The research’s main objective is to apply data mining techniques to discover association rules from purely commercial transactional data, covering a 10-year study period at a household appliances and furniture retailer.

References

On Biases in Estimating Multi-Valued Attributes
TLDR
The basics of eleven measures for estimating the quality of multivalued attributes are analyzed, and a new function based on the MDL principle, whose value decreases slightly as the number of attribute values increases, is introduced.
Technical Note: Bias in Information-Based Measures in Decision Tree Induction
TLDR
A fresh look is taken at the problem of bias in information-based attribute selection measures used in the induction of decision trees, and it is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
Feature Selection Using Approximate Multivariate Markov Blankets
TLDR
This paper introduces a multivariate approach to the AMb definition, called Approximate Multivariate Markov blanket (AMMb), which takes into account interactions among different features of a given subset, and considers a backward strategy similar to the Fast Correlation Based Filter (FCBF), which incorporates the proposal.
Correlation-based Feature Selection for Machine Learning
TLDR
This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Information Theoretical Analysis of Multivariate Correlation
TLDR
The present paper gives various theorems, according to which Ctot(λ) can be decomposed in terms of the partial correlations existing in subsets of λ, and of quantities derivable therefrom.
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
TLDR
This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals.
C4.5: Programs for Machine Learning
TLDR
A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Multivariate information transmission
  • W. J. McGill
  • Computer Science
    Trans. IRE Prof. Group Inf. Theory
  • 1954
TLDR
It is shown that sample transmitted information provides a simple method for measuring and testing association in multidimensional contingency tables and relations with analysis of variance are pointed out.
C4.5: Programs for Machine Learning (Book Review)
Ross. "C4.5: Programs for machine learning"
  • The Morgan Kaufmann Series in Machine Learning