Improving classification accuracy by identifying and removing instances that should be misclassified

@inproceedings{Smith2011ImprovingCA,
  title={Improving classification accuracy by identifying and removing instances that should be misclassified},
  author={M. Smith and T. Martinez},
  booktitle={The 2011 International Joint Conference on Neural Networks},
  year={2011},
  pages={2690--2697}
}
  • M. Smith, T. Martinez
  • Published 2011
  • Computer Science
  • The 2011 International Joint Conference on Neural Networks
  • Appropriately handling noise and outliers is an important issue in data mining. In this paper we examine how noise and outliers are handled by learning algorithms. We introduce a filtering method called PRISM that identifies and removes instances that should be misclassified. We refer to the set of removed instances as ISMs (instances that should be misclassified). We examine PRISM and compare it against 3 existing outlier detection methods and 1 noise reduction technique on 48 data sets using…
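The abstract does not say how PRISM decides which instances to remove, so the sketch below does not reproduce PRISM. As a hedged point of comparison, it shows a generic noise-reduction filter of the same "identify and remove" kind: Wilson's Edited Nearest Neighbor (ENN) rule, which drops a training instance when its label disagrees with the majority label of its k nearest neighbours. It is not necessarily one of the methods compared in the paper. The function name `enn_filter`, the toy data, Euclidean distance, and `k=3` are all illustrative assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def enn_filter(X, y, k=3):
    """Hypothetical helper: return indices of instances kept by Wilson's ENN rule.

    Assumes at least two instances. An instance is dropped when the majority
    label of its k nearest neighbours (itself excluded) differs from its own.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Leave-one-out: distances from instance i to every other instance.
        others = [(dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        neighbours = sorted(others, key=lambda pair: pair[0])[:k]
        # Majority vote; on equal counts Counter keeps first-seen order,
        # so the tied label that occurs among the closer neighbours wins.
        vote = Counter(label for _, label in neighbours).most_common(1)[0][0]
        if vote == yi:
            keep.append(i)
    return keep


# Toy data (assumed): two tight clusters, with instance 3 labelled "b"
# even though it lies inside the "a" cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = ["a", "a", "a", "b", "b", "b", "b"]

kept = enn_filter(X, y, k=3)
print(kept)  # [0, 1, 2, 4, 5, 6]
```

All three nearest neighbours of instance 3 are labelled `a`, so the vote disagrees with its label `b` and it is filtered out; every other instance agrees with its neighbourhood and is kept.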

    Citations

    An instance level analysis of data complexity
    Cluster Validation Measures for Label Noise Filtering
    Using Classifier diversity to handle label noise
    Becoming More Robust to Label Noise with Classifier Diversity
    Classification in the Presence of Label Noise: A Survey