Large‐scale data mining using genetics‐based machine learning

@article{Bacardit2013LargescaleDM,
  title={Large‐scale data mining using genetics‐based machine learning},
  author={Jaume Bacardit and Xavier Llor{\`a}},
  journal={Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
  year={2013},
  volume={3}
}
  • J. Bacardit, Xavier Llorà
  • Published 2013
  • Computer Science
  • Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
In the last decade, genetics‐based machine learning methods have shown their competence in large‐scale data mining tasks because of the scalability capacity that these techniques have demonstrated. This capacity goes beyond the innate massive parallelism of evolutionary computation methods by the proposal of a variety of mechanisms specifically tailored for machine learning tasks, including knowledge representations that exploit regularities in the datasets, hardware accelerations or data… 

Machine learning applications in genetics and genomics
TLDR
An overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data is provided.
Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets
TLDR
A new representation of the individuals, new genetic operators, and a windowing-based learning scheme are proposed to improve the scalability of genetic-algorithm-based quantitative association rule mining, so that it can handle large-scale datasets without loss of quality in the results.
Machine learning in genetics and genomics
The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning
Hard Data Analytics Problems Make for Better Data Analysis Algorithms: Bioinformatics as an Example
TLDR
This article describes several strategies to tightly integrate knowledge extraction and data mining in order to create a new class of biodata mining algorithms that can natively embrace the complexity of biological data, efficiently generate accurate information in the form of classification/regression models, and extract valuable new knowledge.
Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach
TLDR
A feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets, improving both the classification accuracy and its runtime when dealing with big data problems.
ExSTraCS 2.0: description and evaluation of a scalable learning classifier system
TLDR
Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets.
Application of Parallel Distributed Implementation to Multiobjective Fuzzy Genetics-Based Machine Learning
TLDR
Through computational experiments on large data sets, the effects of parallel distributed implementation on the search performance of the multiobjective fuzzy genetics-based machine learning and its computation time are examined.
A multi-core parallelization strategy for statistical significance testing in learning classifier systems
TLDR
The benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation are examined.
Evolutionary undersampling for imbalanced big data classification
TLDR
A parallel model to enable evolutionary undersampling methods to deal with large-scale problems is designed that relies on a MapReduce scheme that distributes the functioning of these kinds of algorithms in a cluster of computing elements.

References

Knowledge-independent data mining with fine-grained parallel evolutionary algorithms
TLDR
It is demonstrated that EAs can provide a competitive general-purpose data mining scheme for classification tasks without constraining the knowledge representation, and that the time required can be reduced by exploiting the inherently parallel nature of EAs.
Genetic-based machine learning systems are competitive for pattern recognition
TLDR
The state of the art in GBML is reviewed, some of the best representatives of different families are selected, and the accuracy and the interpretability of their models are compared, which can be used as recommendation guidelines on which systems should be employed depending on whether the user prefers to maximize the accuracy or the interpretability of the models.
Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy, and Comparative Study
TLDR
This paper has a double aim: to present a taxonomy of the genetics-based machine learning approaches for rule induction, and to develop an empirical analysis both for standard classification and for classification with imbalanced data sets.
Improving the scalability of rule-based evolutionary learning
TLDR
A new representation motivated by observations that Bioinformatics and Systems Biology often give rise to very large-scale datasets that are noisy, ambiguous and usually described by a large number of attributes is presented, which is up to 2–3 times faster than state-of-the-art evolutionary learning representations designed specifically for efficiency purposes.
Genetics-Based Machine Learning
  • T. Kovacs
  • Computer Science
    Handbook of Natural Computing
  • 2012
TLDR
This is a survey of the field of Genetics-based Machine Learning: the application of evolutionary algorithms to machine learning, with emphasis on their evolutionary aspects.
GP ensembles for large-scale data classification
TLDR
Experiments on several data sets show that, by using a training set of reduced size, better classification accuracy can be obtained at a much lower computational cost.
XCS and GALE: A Comparative Study of Two Learning Classifier Systems on Data Mining
This paper compares the learning performance, in terms of prediction accuracy, of two genetic-based learning systems, XCS and GALE, with six well-known learning algorithms, coming from instance based
Genetic programming in classifying large-scale data: an ensemble method
TLDR
Genetic programming as a base classifier algorithm in building ensembles in the context of large-scale data classification was demonstrated to significantly outperform its counterparts built upon base classifiers that were trained with decision tree and logistic regression.
Evolutionary computation in data mining
TLDR
This paper presents an Evolutionary Modularized Data Mining Mechanism for Financial Distress Forecasts and strategies for Scaling Up Evolutionary Instance Reduction Algorithms for Data Mining.
Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets
TLDR
The utility of coprediction is demonstrated as a powerful analytical tool using publicly available microarray data generated exclusively from Arabidopsis thaliana seeds to compute a functional gene interaction network, termed Seed Co-Prediction Network (SCoPNet), which predicts functional associations between genes acting in the same developmental and signal transduction pathways irrespective of the similarity in their respective gene expression patterns.