Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data

  title={Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data},
  author={V. Susheela Devi and Lakhpat Meena},
  journal={Journal of Artificial Intelligence and Soft Computing Research},
  pages={155--169}
  • V. Devi, Lakhpat Meena
  • Published 1 July 2017
  • Computer Science
  • Journal of Artificial Intelligence and Soft Computing Research
Abstract: The Modified Condensed Nearest Neighbour (MCNN) algorithm for prototype selection is order-independent, unlike the Condensed Nearest Neighbour (CNN) algorithm. Though MCNN gives better performance, the time requirement is much higher than for CNN. To mitigate this, we propose a distributed approach called Parallel MCNN (pMCNN) which cuts down the time drastically while maintaining good performance. We have proposed two incremental algorithms using MCNN to carry out prototype selection…
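To make the order-dependence claim concrete, here is a minimal sketch of Hart's classic CNN rule, the baseline that the abstract contrasts with MCNN. It is not MCNN or pMCNN, whose details the abstract does not give. The function names (`nearest_label`, `cnn_select`, `is_consistent`) and the toy one-dimensional data are illustrative assumptions.

```python
import math

def nearest_label(prototypes, x):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    point, label = min(prototypes, key=lambda p: math.dist(p[0], x))
    return label

def cnn_select(samples):
    """Hart's CNN rule: scan (point, label) pairs in the given order and
    add every sample that the current store misclassifies; repeat until
    a full pass adds nothing."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in samples:
            if nearest_label(store, point) != label:
                store.append((point, label))
                changed = True
    return store

def is_consistent(store, samples):
    """True if 1-NN over the store classifies every sample correctly."""
    return all(nearest_label(store, p) == y for p, y in samples)

# Toy data (assumed for illustration): two classes on a line.
data = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
        ((3.0,), "b"), ((4.0,), "b"), ((5.0,), "b")]

forward = cnn_select(data)
backward = cnn_select(list(reversed(data)))

print(sorted(forward))
print(sorted(backward))
```

Both runs yield a consistent subset, yet the selected prototypes differ because the scan order differs. This is the sensitivity to presentation order that an order-independent method such as MCNN is meant to avoid.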

On Ensemble Components Selection in Data Streams Scenario with Gradual Concept-Drift
The algorithm proposed in this paper is an enhanced version of the ASE (Automatically Sized Ensemble) algorithm which guarantees that a new component will be added to the ensemble only if it increases the accuracy not only for the current data chunk but also for the whole data stream.
An Instance Selection Algorithm Based on ReliefF
A new instance selection algorithm is proposed, based on the feature selection algorithm ReliefF; it can reduce data at a specified rate and can run in parallel over the instances.
On Handling Missing Values in Data Stream Mining Algorithms Based on the Restricted Boltzmann Machine
This paper proposes two modifications of the RBM learning algorithms to make them able to handle missing values, and introduces dimension-dependent sizes of minibatches in the stochastic gradient descent method.
On the Parzen Kernel-Based Probability Density Function Learning Procedures Over Time-Varying Streaming Data With Applications to Pattern Classification
A recursive variant of the Parzen kernel density estimator (KDE) is proposed to track changes of dynamic density over data streams in a nonstationary environment and it is shown how to choose the bandwidth and learning rate of a recursive KDE in order to ensure weak and strong convergence.
A New Approach to Detection of Changes in Multidimensional Patterns
A new approach to detecting abrupt changes is proposed, based on Parzen kernel estimation of the partial derivatives of multivariate regression functions in the presence of probabilistic noise.
On the Global Convergence of the Parzen-Based Generalized Regression Neural Networks Applied to Streaming Data
The mean integrated squared error of the regression estimate is shown to converge under several conditions, and the results illustrate asymptotic properties of the Parzen-type recursive algorithm and its convergence for a wide spectrum of time-varying noise.
Estimation of Probability Density Function, Differential Entropy and Other Relative Quantities for Data Streams with Concept Drift
Estimators of the Cauchy-Schwarz divergence and the probability density function divergence are proposed, which are used to measure the differences between two probability density functions.
Parallel Processing of Color Digital Images for Linguistic Description of Their Content
This paper presents different aspects of parallelization of a problem of processing color digital images in order to generate linguistic description of their content. A parallel architecture of an
On the Hermite Series-Based Generalized Regression Neural Networks for Stream Data Mining
A mathematically justified stream data mining algorithm for solving regression problems is developed, based on the Hermite expansions of drifting regression functions, and its global convergence is proved both in probability and with probability one.
Parallel Processing of Images Represented by Linguistic Description in Databases
The problem of image retrieval and classification is addressed using linguistic descriptions stored in databases, together with rough granulation based on the CIE chromaticity color model and a granulation approach.


Distributed Nearest Neighbor-Based Condensation of Very Large Data Sets
This work presents the parallel fast condensed nearest neighbor (PFCNN) rule, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor classification rule; it is the first distributed algorithm for computing a training set consistent subset for the nearest neighbor rule.
Fast condensed nearest neighbor rule
This work presents a novel algorithm for computing a training set consistent subset for the nearest neighbor decision rule, and compares it with state-of-the-art competence preservation algorithms on large multidimensional training sets, showing that it outperforms existing methods in terms of learning speed and scaling behavior.
Efficient instance-based learning on data streams
This paper considers the problem of classification on data streams and develops an instance-based learning algorithm for that purpose. It suggests that this algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives.
Prototype Selection Algorithms for kNN Classifier: A Survey
This paper surveys prototype selection methods and their categorization/taxonomy. Different properties can be observed in the definitions of these methods, but no formal categorization has been established yet.
The Condensed Nearest Neighbor Rule
The CNN rule is suggested as a rule which retains the basic approach of the NN rule without imposing such stringent storage requirements, and the notion of a consistent subset of a sample set is defined.
High performance parallel evolutionary algorithm model based on MapReduce framework
To justify the effectiveness of the MR-PEA model, a MapReduce-based parallel gene expression programming algorithm (MR-GEP) for solving symbolic regression is proposed.
A Nearest Prototype Selection Algorithm Using Multi-objective Optimization and Partition
  • Juan Li, Yuping Wang
  • Computer Science
  • 2013 Ninth International Conference on Computational Intelligence and Security
  • 2013
The simulation results indicate that the proposed algorithm can obtain a smaller reduction ratio and higher classification efficiency, or at least results comparable to those of the existing algorithms it is compared with, which illustrates that the proposed algorithm is an expedient method for designing nearest neighbor classifiers.
An Adaptive Nearest Neighbor Classification Algorithm for Data Streams
Tests performed on both synthetic and real-life data indicate that the new classifier outperforms existing algorithms for data streams in terms of accuracy and computational costs.
A scalable parallel implementation of evolutionary algorithms for multi-objective optimization on GPUs
This paper proposes a parallel GPU-based implementation of NSGA-II with a major focus on non-dominated sorting; it can be easily coupled with the original form of NSGA-II to solve real-world problems using large populations.
An incremental prototype set building technique