Corpus ID: 10659969

Scikit-learn: Machine Learning in Python

  title={Scikit-learn: Machine Learning in Python},
  author={Fabian Pedregosa and Ga{\"e}l Varoquaux and Alexandre Gramfort and Vincent Michel and Bertrand Thirion and Olivier Grisel and Mathieu Blondel and Gilles Louppe and Peter Prettenhofer and Ron Weiss and Ron J. Weiss and J. Vanderplas and Alexandre Passos and David Cournapeau and Matthieu Brucher and Matthieu Perrot and Edouard Duchesnay},
  journal={J. Mach. Learn. Res.},
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings…
Scikit-learn: Machine Learning Without Learning the Machinery
A quick introduction to scikit-learn, as well as to machine-learning basics, is given.
mlpy: Machine Learning Python
mlpy is a Python open-source machine learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for…
API design for machine learning software: experiences from the scikit-learn project
The simple and elegant interface shared by all learning and processing units in the Scikit-learn library is described and its advantages in terms of composition and reusability are discussed.
TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning
TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring,…
Scikit-Learn: Machine Learning in the Python ecosystem
All objects in scikit-learn share a uniform and limited API consisting of three complementary interfaces: an estimator interface for building and fitting models; a predictor interface for making…
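The uniform interface described in this entry can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses the bundled iris dataset, `StandardScaler`, and `LogisticRegression` purely as convenient examples; these choices are not taken from the paper.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Small bundled dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Estimator interface: every object learns its state from data via fit().
scaler = StandardScaler().fit(X)

# Objects that modify data also expose transform().
X_scaled = scaler.transform(X)

# The same fit() convention applies to a supervised model.
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Predictor interface: predict() returns labels, score() returns accuracy.
labels = clf.predict(X_scaled)
accuracy = clf.score(X_scaled, y)
print(labels[:5], round(accuracy, 3))
```

Because both objects follow the same method names, swapping `LogisticRegression` for another classifier should leave the rest of the script unchanged.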
Pymc-learn: Practical Probabilistic Machine Learning in Python
Pymc-learn is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning that uses a general-purpose high-level language that mimics scikit-learn.
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Imbalanced-learn is an open-source Python toolbox aiming to provide a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern…
metric-learn: Metric Learning Algorithms in Python
Metric-learn is an open-source Python package implementing supervised and weakly-supervised distance metric learning algorithms. It makes it easy to perform cross-validation, model selection, and pipelining with other machine learning estimators.
dislib: Large Scale High Performance Machine Learning in Python
This paper presents and evaluates dislib, a distributed machine learning library built on top of the PyCOMPSs programming model that addresses the issues of other existing libraries. It shows that dislib can be up to 9 times faster, and can process data sets up to 16 times larger, than other popular distributed machine learning libraries such as MLlib.
MLlib: Machine Learning in Apache Spark
This paper presents MLlib, Spark's open-source distributed machine learning library, which provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.


The SHOGUN Machine Learning Toolbox
A machine learning toolbox designed for unified large-scale learning for a broad range of feature types and learning settings, which offers a considerable number of machine learning models such as support vector machines, hidden Markov models, multiple kernel learning, linear discriminant analysis, and more.
LIBLINEAR: A Library for Large Linear Classification
LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library…
Modular Toolkit for Data Processing (MDP): A Python Data Processing Framework
The Modular toolkit for Data Processing is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.
LIBSVM: A library for support vector machines
Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Guest Editor's Introduction: Python: Batteries Included
  • P. Dubois
  • Computer Science
  • Computing in Science & Engineering
  • 2007
The Python motto "batteries included" is meant to convey the idea that Python comes with everything you need.
Result Analysis of the NIPS 2003 Feature Selection Challenge
The NIPS 2003 workshops included a feature selection competition organized by the authors. It took place over a period of 13 weeks and attracted 78 research groups, who used a variety of methods for feature selection.
PyMVPA: a Python Toolbox for Multivariate Pattern Analysis of fMRI Data
A Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets, which makes use of Python’s ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages.
The NumPy Array: A Structure for Efficient Numerical Computation
This effort shows that NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
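Two of the techniques named in this entry, vectorizing calculations and avoiding copies, can be sketched as follows. The function names and the sample array are hypothetical and chosen only for illustration; they do not come from the cited paper.

```python
import numpy as np


def squared_norms_loop(points):
    """Row-wise squared norms using explicit Python loops (slow baseline)."""
    out = []
    for row in points:
        total = 0.0
        for value in row:
            total += value * value
        out.append(total)
    return np.array(out)


def squared_norms_vectorized(points):
    """Same quantity in one vectorized call, with no Python-level loop."""
    return np.einsum("ij,ij->i", points, points)


points = np.arange(12, dtype=float).reshape(4, 3)

loop_result = squared_norms_loop(points)
vec_result = squared_norms_vectorized(points)

# Basic slicing returns a view onto the same memory (no copy is made).
first_two_columns = points[:, :2]

# Integer-array ("fancy") indexing allocates a new array (a copy).
selected_rows = points[[0, 2]]

print(vec_result)
print(np.shares_memory(first_two_columns, points))
print(np.shares_memory(selected_rows, points))
```

On large arrays the vectorized version is usually much faster, since the inner loop runs in compiled code rather than in the interpreter.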
Regularization Paths for Generalized Linear Models via Coordinate Descent.
In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and deal efficiently with sparse features.
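As a rough illustration of the idea in this entry's title, the following is a minimal sketch of cyclic coordinate descent for the lasso (squared-error loss only), with warm starts along a decreasing sequence of penalties. It is not the authors' implementation; all function names, the objective scaling `(1/(2n))·||y - Xw||² + alpha·||w||₁`, and the synthetic data are assumptions made for the example.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, w_init=None, n_sweeps=200):
    """Minimize (1/(2n))||y - Xw||^2 + alpha*||w||_1 one coordinate at a time."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features) if w_init is None else w_init.copy()
    residual = y - X @ w
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_sweeps):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature cannot affect the fit
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ residual / n_samples + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (w[j] - new_wj)
            w[j] = new_wj
    return w


def lasso_path(X, y, alphas):
    """Solve for each alpha from largest to smallest, warm-starting each fit."""
    w = None
    path = []
    for alpha in sorted(alphas, reverse=True):
        w = lasso_coordinate_descent(X, y, alpha, w_init=w)
        path.append((alpha, w.copy()))
    return path


# Synthetic, noiseless data with one irrelevant feature (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, 0.0, -1.0])
y = X @ true_w

# Above this penalty every coefficient is exactly zero.
alpha_max = np.abs(X.T @ y).max() / len(y)
path = lasso_path(X, y, [1.01 * alpha_max, 1e-4])
for alpha, w in path:
    print(alpha, w)
```

Starting from the largest penalty and reusing each solution as the next starting point is what makes computing a whole path cheap in this kind of scheme.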
A supervised clustering approach for fMRI-based inference of brain states
A method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging to predict the subject's behavior during a scanning session yields higher prediction accuracy than standard voxel-based approaches and infers an explicit weighting of the regions involved in the regression or classification task.