PMLB: a large benchmark suite for machine learning evaluation and comparison

  • R. S. Olson, W. La Cava, P. Orzechowski, R. Urbanowicz, J. Moore
  • Published 2017
  • Computer Science, Medicine
  • BioData Mining
  • Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists…
    133 Citations

    • Benchmark AFLOW Data Sets for Machine Learning (3 citations)
    • OpenML Benchmarking Suites (21 citations)
    • Efficient and Robust Model Benchmarks with Item Response Theory and Adaptive Testing
    • Where are we now?: a large benchmark study of recent symbolic regression methods (39 citations)
    • Evolutionary dataset optimisation: learning algorithm quality through evolution (3 citations)
    • OpenML Benchmarking Suites and the OpenML100 (34 citations)