Scikit-learn: Machine Learning in Python

@article{pedregosa2011scikit,
  title={Scikit-learn: Machine Learning in Python},
  author={Fabian Pedregosa and Ga{\"e}l Varoquaux and Alexandre Gramfort and Vincent Michel and Bertrand Thirion and Olivier Grisel and Mathieu Blondel and Peter Prettenhofer and Ron Weiss and Vincent Dubourg and Jake Vanderplas and Alexandre Passos and David Cournapeau and Matthieu Brucher and Matthieu Perrot and {\'E}douard Duchesnay},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings… 
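The ease of use the abstract emphasizes can be illustrated with a minimal workflow (not taken from the paper; the dataset and model choices here are arbitrary, and scikit-learn is assumed to be installed):

```python
# Illustrative sketch: a complete scikit-learn workflow in a few lines.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)             # learn model parameters from training data
accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out data
```

The same fit/score pattern applies unchanged across the library's estimators, which is what makes it approachable for non-specialists.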
Scikit-learn: Machine Learning Without Learning the Machinery
A quick introduction to scikit-learn as well as to machine-learning basics is given.
mlpy: Machine Learning Python
mlpy is an open-source Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library. mlpy provides a wide range of state-of-the-art machine learning methods for
API design for machine learning software: experiences from the scikit-learn project
The simple and elegant interface shared by all learning and processing units in the Scikit-learn library is described and its advantages in terms of composition and reusability are discussed.
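A short sketch (illustrative, not from the paper) of the composability this shared interface enables: because every step exposes the same methods, a chain of steps behaves like a single estimator.

```python
# Sketch: composing a preprocessing step and a classifier into one estimator.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])

# The pipeline itself satisfies the estimator interface: fit() fits the
# scaler, transforms the data, then fits the classifier on the result.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
pred = model.predict([[10.5]])  # scaling is applied automatically
```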
TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning
TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring,
Scikit-Learn: Machine Learning in the Python ecosystem
All objects in scikit-learn share a uniform and limited API consisting of three complementary interfaces: an estimator interface for building and fitting models; a predictor interface for making predictions; and a transformer interface for converting data.
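The three interfaces can be seen side by side in a small sketch (the data and models below are arbitrary, chosen only to exercise each interface):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Estimator interface: every object learns its state via fit().
scaler = StandardScaler().fit(X)
# Transformer interface: transform() maps data to a new representation.
X_scaled = scaler.transform(X)
# Predictor interface: predict() produces outputs for new samples.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_scaled, y)
pred = clf.predict(scaler.transform([[2.9]]))
```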
Pymc-learn: Practical Probabilistic Machine Learning in Python
Pymc-learn is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning, written in a general-purpose high-level language and mimicking the scikit-learn API.
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition.
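The simplest of the methods such toolboxes provide is random oversampling. The sketch below illustrates the idea in plain NumPy; it is a hypothetical helper, not imbalanced-learn's implementation or API.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count. Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)

X = np.arange(10).reshape(-1, 1)      # 8 samples of class 0, 2 of class 1
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y, rng=0)  # both classes now have 8
```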
metric-learn: Metric Learning Algorithms in Python
Metric-learn is an open-source Python package implementing supervised and weakly-supervised distance metric learning algorithms, which allow easy cross-validation, model selection, and pipelining with other machine learning estimators.
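The object such algorithms learn is typically a Mahalanobis distance, parameterized by a linear map L. A sketch of how the learned metric is evaluated (the matrix L here is arbitrary, standing in for one produced by a learning algorithm):

```python
import numpy as np

# d_M(x, y) = ||L(x - y)||_2: Euclidean distance after mapping by L.
def mahalanobis(x, y, L):
    diff = L @ (np.asarray(x) - np.asarray(y))
    return float(np.linalg.norm(diff))

# A hypothetical learned map that stretches the first axis by a factor of 2,
# making differences along that axis count double.
L = np.array([[2.0, 0.0],
              [0.0, 1.0]])

d = mahalanobis([1.0, 0.0], [0.0, 0.0], L)
```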
dislib: Large Scale High Performance Machine Learning in Python
This paper presents and evaluates dislib, a distributed machine learning library built on top of the PyCOMPSs programming model that addresses the issues of other existing libraries. It shows that dislib can be up to 9 times faster, and can process data sets up to 16 times larger, than other popular distributed machine learning libraries such as MLlib.
DEBoost: A Python Library for Weighted Distance Ensembling in Machine Learning
In this paper, we introduce DEBoost, a Python library devoted to weighted distance ensembling of predictions for regression and classification tasks. Its backbone resides on the scikit-learn library


The SHOGUN Machine Learning Toolbox
A machine learning toolbox designed for unified large-scale learning for a broad range of feature types and learning settings, which offers a considerable number of machine learning models such as support vector machines, hidden Markov models, multiple kernel learning, linear discriminant analysis, and more.
LIBLINEAR: A Library for Large Linear Classification
LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library
Modular Toolkit for Data Processing (MDP): A Python Data Processing Framework
The modular toolkit for Data Processing is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.
LIBSVM: A library for support vector machines
Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Guest Editor's Introduction: Python: Batteries Included
  • P. Dubois, Computing in Science & Engineering, 2007
The Python motto "batteries included" is meant to convey the idea that Python comes with everything you need.
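A small sketch of what "batteries included" means in practice: common data tasks handled entirely by the standard library, with no third-party packages. The CSV content below is made up for illustration.

```python
# Parse CSV data and compute simple statistics using only the stdlib.
import csv
import io
import statistics
from collections import Counter

data = io.StringIO("name,score\na,3\nb,5\na,4\n")
rows = list(csv.DictReader(data))
scores = [int(r["score"]) for r in rows]

mean_score = statistics.mean(scores)            # average of all scores
name_counts = Counter(r["name"] for r in rows)  # occurrences per name
```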
Result Analysis of the NIPS 2003 Feature Selection Challenge
The NIPS 2003 workshops included a feature selection competition organized by the authors, which took place over a period of 13 weeks, attracted 78 research groups, and saw a variety of feature selection methods used.
PyMVPA: a Python Toolbox for Multivariate Pattern Analysis of fMRI Data
A Python-based, cross-platform, open-source software toolbox, called PyMVPA, for applying classifier-based analysis techniques to fMRI datasets; it makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages.
The NumPy Array: A Structure for Efficient Numerical Computation
As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
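Each of the three techniques can be demonstrated in a few lines (an illustrative sketch, not code from the paper):

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

# 1. Vectorizing: one expression dispatches the whole loop to compiled code.
y_vec = x * 2.0 + 1.0

# 2. Avoiding copies: basic slicing returns a view sharing the same buffer.
view = x[::2]
assert view.base is x  # no data was copied

# 3. Minimizing operations: in-place ufuncs reuse the output buffer
# instead of allocating a temporary for each intermediate result.
np.multiply(x, 2.0, out=x)
np.add(x, 1.0, out=x)  # x now equals y_vec, computed without temporaries
```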
Regularization Paths for Generalized Linear Models via Coordinate Descent.
In comparative timings, the new algorithms are considerably faster than competing methods, can handle large problems, and deal efficiently with sparse features.
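The core of the algorithm family this paper describes is cyclic coordinate descent with soft-thresholding. A compact NumPy sketch for the lasso follows; it illustrates the update rule only and is not the glmnet implementation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the lasso:
    minimize (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ w + X[:, j] * w[j]  # partial residual excluding j
            rho = X[:, j] @ r_j / n
            # soft-thresholding update for coordinate j
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([1.0, -2.0])
w_ols = lasso_cd(X, y, lam=0.0, n_iter=500)  # lam=0 reduces to least squares
w_big = lasso_cd(X, y, lam=100.0)            # heavy penalty shrinks w to zero
```

Sweeping `lam` from large to small, warm-starting each solve from the previous one, traces out the regularization path the title refers to.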