Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data

@article{Devi2017ParallelM,
  title={Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data},
  author={V. Susheela Devi and Lakhpat Meena},
  journal={Journal of Artificial Intelligence and Soft Computing Research},
  year={2017},
  volume={7},
  pages={155--169}
}
  • V. Devi, Lakhpat Meena
  • Published 1 July 2017
  • Computer Science
  • Journal of Artificial Intelligence and Soft Computing Research
Abstract
The Modified Condensed Nearest Neighbour (MCNN) algorithm for prototype selection is order-independent, unlike the Condensed Nearest Neighbour (CNN) algorithm. Although MCNN gives better performance, it requires much more time than CNN. To mitigate this, we propose a distributed approach called Parallel MCNN (pMCNN), which cuts down the time drastically while maintaining good performance. We also propose two incremental algorithms using MCNN to carry out prototype selection… 
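The abstract turns on order dependence. As an illustration only, here is a simplified, MCNN-style condensation; the "typical sample" rule (the sample closest to its group's mean), the per-round update, and all names are our assumptions rather than the paper's exact procedure. Every choice depends only on the set of samples, with ties broken by coordinates, so reordering the training data leaves the result unchanged.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (feature vector, class label)


def nearest_label(x: Point, prototypes: Iterable[Sample]) -> int:
    """1-NN label of x; ties broken by coordinates, then label, for determinism."""
    return min(prototypes, key=lambda p: (math.dist(x, p[0]), p[0], p[1]))[1]


def typical(points: List[Point]) -> Point:
    """Hypothetical 'typical sample': the point closest to the group mean.

    math.fsum is correctly rounded, so the mean does not depend on input order.
    """
    n = len(points)
    centroid = tuple(math.fsum(coords) / n for coords in zip(*points))
    return min(points, key=lambda p: (math.dist(p, centroid), p))


def mcnn_like(data: List[Sample]) -> Set[Sample]:
    """Simplified MCNN-style condensation (illustrative assumption).

    Assumes no identical points carry different labels; otherwise the loop
    may never terminate.
    """
    labels = sorted({y for _, y in data})
    # Seed with one typical sample per class.
    protos = {(typical([x for x, y in data if y == c]), c) for c in labels}
    while True:
        wrong = [(x, y) for x, y in data if nearest_label(x, protos) != y]
        if not wrong:
            return protos  # consistent: 1-NN on protos fits all training data
        # Add one typical representative of each class's misclassified samples.
        for c in labels:
            missed = [x for x, y in wrong if y == c]
            if missed:
                protos.add((typical(missed), c))


data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
        ((3.0, 3.0), 1), ((3.0, 4.0), 1), ((4.0, 3.0), 1),
        ((1.8, 1.8), 1), ((1.4, 1.4), 1)]
protos = mcnn_like(data)
```

On this toy data, `mcnn_like(data[::-1])` returns the same set as `mcnn_like(data)`.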


On Ensemble Components Selection in Data Streams Scenario with Gradual Concept-Drift
TLDR
The algorithm proposed in this paper is an enhanced version of the ASE (Automatically Sized Ensemble) algorithm; it guarantees that a new component is added to the ensemble only if it increases accuracy not only for the current data chunk but also for the whole data stream.
Resource-Aware Data Stream Mining Using the Restricted Boltzmann Machine
TLDR
This paper considers the problem of data stream mining with an application of the Restricted Boltzmann Machine (RBM), and tests three strategies for dealing with a buffer overflow in the case of high-speed data streams: load shedding, minibatch resizing, and controlling the number of Gibbs steps in the learning algorithm.
An Instance Selection Algorithm Based on ReliefF
TLDR
A new instance selection algorithm based on ReliefF (a feature selection algorithm) is proposed; it can reduce data at a specified rate and can run in parallel over the instances.
Online GRNN-Based Ensembles for Regression on Evolving Data Streams
TLDR
A novel procedure for regression analysis of non-stationary data streams is presented, and it is demonstrated that the proposed algorithm can track different types of nonstationarity and increases accuracy relative to a single estimator.
On Handling Missing Values in Data Stream Mining Algorithms Based on the Restricted Boltzmann Machine
TLDR
This paper proposes two modifications of the RBM learning algorithms to make them able to handle missing values, and introduces dimension-dependent sizes of minibatches in the stochastic gradient descent method.
On the Parzen Kernel-Based Probability Density Function Learning Procedures Over Time-Varying Streaming Data With Applications to Pattern Classification
TLDR
A recursive variant of the Parzen kernel density estimator (KDE) is proposed to track changes of dynamic density over data streams in a nonstationary environment and it is shown how to choose the bandwidth and learning rate of a recursive KDE in order to ensure weak and strong convergence.
Concept Drift Detection in Streams of Labelled Data Using the Restricted Boltzmann Machine
In this paper, the method of concept drift detection in time-varying data stream mining is considered. The Restricted Boltzmann Machine (RBM) is proposed as a drift detector. The RBMs…
A New Approach to Detection of Changes in Multidimensional Patterns
TLDR
A new approach to abrupt change detection is proposed, based on Parzen kernel estimation of the partial derivatives of multivariate regression functions in the presence of probabilistic noise.
On the Global Convergence of the Parzen-Based Generalized Regression Neural Networks Applied to Streaming Data
TLDR
The mean integrated squared error of the regression estimate is shown to converge under several conditions, and the results illustrate the asymptotic properties of the Parzen-type recursive algorithm and its convergence for a wide spectrum of time-varying noise.
Improvement of the Simplified Silhouette Validity Index
TLDR
A modification of the Simplified Silhouette index is proposed that uses an additional component to improve cluster validity assessment.
...

References

Showing 1-10 of 43 references
Distributed Nearest Neighbor-Based Condensation of Very Large Data Sets
TLDR
This work presents the parallel fast condensed nearest neighbor (PFCNN) rule, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor classification rule. It is the first distributed algorithm for computing a training-set-consistent subset for the nearest neighbor rule.
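Distributing a condensation rule raises the question of how per-partition results are merged. The sketch below shows one generic partition-and-merge pattern under our own assumptions; it is neither the PFCNN protocol nor pMCNN's. It condenses interleaved chunks independently with a CNN-style sweep, takes the union, and runs a final sweep over the full data seeded with that union, because a union of locally consistent subsets need not be consistent globally. Names such as `partitioned_condense` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (feature vector, class label)


def nearest_label(x: Point, store: Sequence[Sample]) -> int:
    """Label of the stored sample closest to x (1-NN rule)."""
    return min(store, key=lambda s: math.dist(x, s[0]))[1]


def condense(data: Sequence[Sample],
             seed: Optional[Sequence[Sample]] = None) -> List[Sample]:
    """CNN-style sweep: add misclassified samples until the store fits data."""
    store = list(seed) if seed else [data[0]]
    changed = True
    while changed:
        changed = False
        for x, y in data:
            if nearest_label(x, store) != y:
                store.append((x, y))
                changed = True
    return store


def partitioned_condense(data: Sequence[Sample], n_parts: int = 4) -> List[Sample]:
    """Hypothetical partition-and-merge scheme (not PFCNN or pMCNN)."""
    chunks = [list(data[i::n_parts]) for i in range(n_parts)]
    chunks = [c for c in chunks if c]  # drop empty chunks when data is small
    # Condense each chunk independently; threads keep the sketch simple.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = list(pool.map(condense, chunks))
    merged = [s for part in partial for s in part]
    # The union may misclassify samples from other chunks, so a final
    # sweep over the full data, seeded with the union, restores consistency.
    return condense(data, seed=merged)


data = [((float(i),), 0 if i < 10 else 1) for i in range(20)]
prototypes = partitioned_condense(data)
```

Threads keep the example self-contained; CPU-bound pure-Python work would need `ProcessPoolExecutor` (with module-level functions) for an actual speedup.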
Fast condensed nearest neighbor rule
TLDR
This work presents a novel algorithm for computing a training set consistent subset for the nearest neighbor decision rule and compares it with state-of-the-art competence preservation algorithms on large multidimensional training sets, showing that it outperforms existing methods in terms of learning speed and learning scaling behavior.
Efficient instance-based learning on data streams
TLDR
This paper considers the problem of classification on data streams, develops an instance-based learning algorithm for that purpose, and suggests that this algorithm has a number of desirable properties that currently existing alternatives do not share, at least not as a whole.
Prototype Selection Algorithms for kNN Classifier: A Survey
TLDR
This paper surveys prototype selection methods and their categorization (taxonomy). Different properties can be observed in the definitions of these methods, but no formal categorization has been established yet.
The Condensed Nearest Neighbor Rule
TLDR
The CNN rule is suggested as a rule that retains the basic approach of the NN rule without imposing its stringent storage requirements, and the notion of a consistent subset of a sample set is defined.
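For comparison, here is a minimal sketch of the CNN rule as it is usually described: seed the store with one sample, then keep sweeping the training set and add every sample the store misclassifies, until a full sweep adds nothing. The result is a consistent subset, but which samples it keeps depends on the order of the data. The function names and toy data are ours, not from the cited paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (feature vector, class label)


def nearest_label(x: Point, store: Sequence[Sample]) -> int:
    """Label of the stored sample closest to x (1-NN rule)."""
    return min(store, key=lambda s: math.dist(x, s[0]))[1]


def hart_cnn(data: Sequence[Sample]) -> List[Sample]:
    """CNN rule; assumes no identical points carry different labels."""
    store = [data[0]]
    changed = True
    while changed:  # repeat sweeps until nothing is added
        changed = False
        for x, y in data:
            if nearest_label(x, store) != y:
                store.append((x, y))  # absorb a misclassified sample
                changed = True
    return store


data = [((0.0,), 0), ((1.0,), 0), ((2.5,), 1), ((3.0,), 1)]
forward = hart_cnn(data)
backward = hart_cnn(data[::-1])
# Both subsets are consistent, yet they keep different samples.
```

Here `forward` keeps the points at 0.0 and 2.5, while `backward` keeps 3.0 and 1.0.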
High performance parallel evolutionary algorithm model based on MapReduce framework
TLDR
To justify the effectiveness of the MR-PEA model, a MapReduce-based parallel gene expression programming algorithm (MR-GEP) for solving symbolic regression is proposed.
A Nearest Prototype Selection Algorithm Using Multi-objective Optimization and Partition
  • Juan Li, Yuping Wang
  • Computer Science
  • 2013 Ninth International Conference on Computational Intelligence and Security
  • 2013
TLDR
The simulation results indicate that the proposed algorithm obtains a smaller reduction ratio and higher classification efficiency than, or at least comparable to, some existing algorithms, which shows that the proposed algorithm is an expedient method for designing nearest neighbor classifiers.
An Adaptive Nearest Neighbor Classification Algorithm for Data Streams
TLDR
Tests performed on both synthetic and real-life data indicate that the new classifier outperforms existing algorithms for data streams in terms of accuracy and computational costs.
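Stream settings like this one rule out repeated passes over all past data. Below is a minimal sketch of one incremental strategy, assumed for illustration and not taken from this or the other papers listed here: keep a prototype set, and for each arriving chunk add only the samples the current set misclassifies, sweeping until that chunk is fitted. Consistency is guaranteed only for the latest chunk, since earlier chunks are discarded, and nothing is ever forgotten, so concept drift is not handled. `update` and `process_stream` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (feature vector, class label)


def nearest_label(x: Point, store: Sequence[Sample]) -> int:
    """Label of the stored sample closest to x (1-NN rule)."""
    return min(store, key=lambda s: math.dist(x, s[0]))[1]


def update(prototypes: List[Sample], chunk: Sequence[Sample]) -> List[Sample]:
    """Hypothetical chunk update: add samples the prototypes misclassify.

    Assumes chunks are non-empty and contain no conflicting duplicates.
    """
    store = list(prototypes) if prototypes else [chunk[0]]
    changed = True
    while changed:  # sweep until the current chunk is fitted
        changed = False
        for x, y in chunk:
            if nearest_label(x, store) != y:
                store.append((x, y))
                changed = True
    return store


def process_stream(chunks: Iterable[Sequence[Sample]]) -> List[Sample]:
    """Fold chunks into a prototype set; raw chunks are not retained."""
    prototypes: List[Sample] = []
    for chunk in chunks:
        prototypes = update(prototypes, chunk)
    return prototypes


stream = [
    [((0.0,), 0), ((1.0,), 0), ((4.0,), 1)],
    [((0.5,), 0), ((3.5,), 1), ((5.0,), 1)],
    [((1.5,), 1), ((1.2,), 0)],
]
prototypes = process_stream(stream)
```

In this toy stream the second chunk adds nothing, because the prototypes from the first chunk already classify it correctly.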
A scalable parallel implementation of evolutionary algorithms for multi-objective optimization on GPUs
TLDR
This paper proposes a parallel GPU-based implementation of NSGA-II with a major focus on non-dominated sorting; it can be easily coupled with the original form of NSGA-II to solve real-world problems using large populations.
...