Corpus ID: 174803510

Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter

Frances Ann Hubis, Wentao Wu, Ce Zhang
Simplifying machine learning (ML) application development, including distributed computation, programming interfaces, resource management, model selection, etc., has attracted intense interest recently. These research efforts have significantly improved the efficiency and degree of automation of developing ML models. In this paper, we take a first step in an orthogonal direction, towards automated quality management for human-in-the-loop ML application development. We build ease.ml/meter…
1 Citation

Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise

This paper designs a practical Bayes error estimator that is compared against baseline feasibility-study candidates on 6 datasets (with additional real and synthetic noise at different levels) in computer vision and natural language processing, and demonstrates in end-to-end experiments how users can save substantial labeling time and monetary effort.

References

Preserving Statistical Validity in Adaptive Data Analysis

It is shown that, surprisingly, there is a way to accurately estimate an exponential (in n) number of expectations even when the functions are chosen adaptively, an exponential improvement over standard empirical estimators, which are limited to a linear number of estimates.
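
The reusable-holdout idea behind this result can be illustrated with a minimal sketch; the threshold T and noise scale sigma below are illustrative parameters, not values from the paper:

```python
import random

def thresholdout(train, holdout, query, T=0.04, sigma=0.01):
    """Answer an adaptively chosen query while protecting the holdout set.

    Returns the training-set estimate unless it disagrees with the
    holdout estimate by more than a noisy threshold, in which case a
    noise-perturbed holdout estimate is returned instead.
    """
    mean = lambda xs: sum(query(x) for x in xs) / len(xs)
    t_est, h_est = mean(train), mean(holdout)
    if abs(t_est - h_est) > T + random.gauss(0, sigma):
        return h_est + random.gauss(0, sigma)
    return t_est
```

Because the holdout is only consulted (with noise) when the analyst's training estimate drifts away from it, many more adaptive queries can be answered than by naively re-evaluating each one on the holdout.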

Northstar: An Interactive Data Science System

This paper presents Northstar, the Interactive Data Science System, which has been developed over the last 4 years to explore designs that make advanced analytics and model building more accessible.

The Ladder: A Reliable Leaderboard for Machine Learning Competitions

This work introduces a notion of leaderboard accuracy tailored to the format of a competition called the Ladder and demonstrates that it simultaneously supports strong theoretical guarantees in a fully adaptive model of estimation, withstands practical adversarial attacks, and achieves high utility on real submission files from an actual competition hosted by Kaggle.
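
The core mechanism can be sketched as follows; the step size eta is an illustrative parameter here, whereas the paper additionally specifies how to set it:

```python
class Ladder:
    """Sketch of a Ladder-style leaderboard: a submission's score is
    only revealed (and the leaderboard updated) when it improves on the
    best score so far by at least the step size eta, limiting how much
    holdout information each submission can leak."""

    def __init__(self, eta=0.01):
        self.eta = eta
        self.best = 0.0

    def submit(self, holdout_accuracy):
        if holdout_accuracy >= self.best + self.eta:
            # Round the released score to the grid defined by eta.
            self.best = round(holdout_accuracy / self.eta) * self.eta
        return self.best
```

Submissions that improve by less than eta leave the displayed score unchanged, which is what makes the leaderboard robust to adaptive overfitting.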

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

This paper presents ease.ml/ci, to the best of the authors' knowledge the first continuous integration system for machine learning. It designs a domain-specific language that allows users to specify integration conditions with reliability constraints, and develops simple novel optimizations that lower the number of labels required for test conditions popularly used in real production systems.
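
As a rough illustration of why such test conditions require a bounded number of labels, a generic Hoeffding-style bound (an assumption for illustration; ease.ml/ci develops tighter, condition-specific estimators) gives:

```python
import math

def labels_needed(epsilon, delta):
    """Hoeffding-bound estimate of the number of i.i.d. test labels
    needed so that empirical accuracy is within epsilon of the true
    accuracy with probability at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
```

For example, certifying accuracy to within 1 point at 95% confidence already requires tens of thousands of labels under this naive bound, which is why reducing label counts matters in practice.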

Microsoft Azure Machine Learning

Learning under Concept Drift: an Overview

This report is intended to give a bird's-eye view of the concept drift research field, provide context for the research, and position it within the broad spectrum of related research fields and applications.

Exploring galaxy evolution with generative models

A neural network is used to learn a latent-space representation of the data in which physical attributes of objects can be manipulated independently; this representation can then be used to forward-model and explore hypotheses in a data-driven way.

Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks

An attention-based convolutional neural network architecture is trained on 600,000 doctor notes in German; the learned attention scores, together with a method of automatic validation on the same data, render the classification task transparent from a medical perspective.

Ease.ml in Action: Towards Multi-tenant Declarative Learning Services

In this demonstration, the design principles of ease.ml are presented, the implementation of its key components is highlighted, and it is shown how ease.ml can help ease machine learning tasks that often perplex even experienced users.

Using transfer learning to detect galaxy mergers

This work investigates the use of deep convolutional neural networks (deep CNNs) for automatic visual detection of galaxy mergers, and finds that transfer learning can act as a regulariser in some cases, leading to better overall classification accuracy.