Corpus ID: 67855343

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

@article{Renggli2019ContinuousIO,
  title={Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment},
  author={C{\'e}dric Renggli and Bojan Karlas and Bolin Ding and Feng Liu and Kevin Schawinski and Wentao Wu and Ce Zhang},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.00278}
}
Continuous integration is an indispensable step of modern software engineering practice, used to systematically manage the life cycle of system development. Developing a machine learning model is no different: it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning models as first-class citizens. In this paper, we present ease.ml/ci, to our… 
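The core idea behind a "rigorous" CI gate for ML is that a pass/fail decision on a test condition such as "the new model beats the old one" must account for estimation error on a finite test set. The sketch below is not the actual ease.ml/ci implementation; it is a minimal illustration of that idea using a standard Hoeffding bound, with hypothetical function names.

```python
import math

def required_test_size(epsilon: float, delta: float) -> int:
    """Hoeffding bound: n >= ln(2/delta) / (2 * epsilon^2) i.i.d. test
    examples suffice to estimate a model's accuracy to within +/- epsilon
    with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def ci_gate(new_acc: float, old_acc: float, margin: float, epsilon: float) -> bool:
    """Pass the commit only if the new model's measured accuracy exceeds
    the old model's by at least `margin`, even in the worst case where
    both estimates are off by epsilon in opposite directions."""
    return (new_acc - old_acc) - 2.0 * epsilon > margin
```

For example, testing a condition to within ±1% accuracy at 95% confidence requires on the order of 18,000 labeled test examples, which is why the paper's treatment of label efficiency matters in practice.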
Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter
Simplifying machine learning (ML) application development, including distributed computation, programming interfaces, resource management, and model selection, has attracted intensive interest
Building Continuous Integration Services for Machine Learning
TLDR
This paper develops the first CI system for ML, to the best of the knowledge, that integrates seamlessly with existing ML development tools.
Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization
TLDR
This paper views the management of ML development lifecycles from a data management perspective and demonstrates two closely related systems, ease.ml/ci and ease.ML/meter, that provide some “principled guidelines” for ML application development.
Ease.ml/meter: Quantitative Overfitting Management for Human-in-the-loop ML Application Development (Towards Data Management for Statistical Generalization)
Simplifying machine learning (ML) application development, including distributed computation, programming interfaces, resource management, and model selection, has attracted intensive interest
Machine Learning Application Development: Practitioners' Insights
TLDR
The reported challenges and best practices of ML application development are synthesized into 17 findings to inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
Automated Trainability Evaluation for Smart Software Functions
TLDR
The different facets of trainability for the development of SSFs are described and the approach for automated trainability evaluation within an automotive CID framework which proposes to use automated quality gates for the continuous evaluation of machine learning models is presented.
Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions
TLDR
The development and maintenance of large-scale ML-based systems in industrial settings introduce new challenges specific to ML, and the known challenges characteristic of these types of systems require new methods to overcome them.
Software engineering for artificial intelligence and machine learning software: A systematic literature review
TLDR
The results show that these systems are developed in a lab context or at a large company and follow a research-driven development process, and that the main challenges faced by professionals are in the areas of testing, AI software quality, and data management.
Ease.ml/snoopy in action
We demonstrate ease.ml/snoopy, a data analytics system that performs feasibility analysis for machine learning (ML) applications before they are developed. Given a performance target of an ML…
Studying Software Engineering Patterns for Designing Machine Learning Systems
TLDR
This research collects good/bad SE design patterns for ML techniques to provide developers with a comprehensive classification of such patterns, and reports the preliminary results of a systematic literature review (SLR) of good/bad design patterns for ML.

References

SHOWING 1-10 OF 16 REFERENCES
Continuous Integration: Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series)
TLDR
Through more than forty CI-related practices using application examples in different languages, readers learn that CI leads to more rapid software development, produces deployable software at every step in the development lifecycle, and reduces the time between defect introduction and detection, saving time and lowering costs.
Theory of Disagreement-Based Active Learning
TLDR
Recent advances in the understanding of the theoretical benefits of active learning are described, and implications for the design of effective active learning algorithms are described.
The Ladder: A Reliable Leaderboard for Machine Learning Competitions
TLDR
This work introduces a notion of leaderboard accuracy tailored to the format of a competition called the Ladder and demonstrates that it simultaneously supports strong theoretical guarantees in a fully adaptive model of estimation, withstands practical adversarial attacks, and achieves high utility on real submission files from an actual competition hosted by Kaggle.
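The Ladder mechanism summarized above is simple enough to sketch: a submission's score is released only when it improves on the best released score by at least a fixed step size, and released scores are rounded to that step. This is a simplified sketch of the mechanism, not the authors' reference implementation.

```python
def ladder(scores, step=0.01):
    """Ladder mechanism (simplified): release a new leaderboard score
    only when a submission beats the best released score by at least
    `step`; otherwise re-release the previous best."""
    best = float("-inf")
    released = []
    for s in scores:
        if s > best + step:
            # Round the released score to the step grid.
            best = round(s / step) * step
        released.append(best)
    return released
```

Because sub-step fluctuations are never released, an adaptive competitor gains almost no information from repeated near-identical submissions, which is what underlies the leaderboard-accuracy guarantee.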
Software Engineering: Principles and Practice
  • H. van Vliet
  • Engineering, Computer Science
  • 1993
TLDR
This new edition has been brought fully up to date, with complete coverage of all aspects of the software lifecycle and a strong focus on all the skills needed to carry out software projects on time and within budget.
Caffe: Convolutional Architecture for Fast Feature Embedding
TLDR
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Tutorial on Practical Prediction Theory for Classification
TLDR
This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and quantitatively useful and it is shown that train set bounds can sometimes be used to directly motivate learning algorithms.
The reusable holdout: Preserving validity in adaptive data analysis
TLDR
A new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis is demonstrated, and how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses is shown.
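The reusable holdout described above is instantiated by the Thresholdout algorithm: a query is answered from the training set unless its training estimate deviates from the holdout estimate by more than a noisy threshold, in which case a noised holdout estimate is returned. The following is a simplified sketch of that mechanism, not the paper's exact parameterization.

```python
import random

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01):
    """Thresholdout (simplified): answer with the training-set estimate
    unless it disagrees with the holdout estimate by more than a noisy
    threshold; in that case, return a noised holdout estimate, limiting
    the information leaked about the holdout set."""
    train_avg = sum(train_vals) / len(train_vals)
    holdout_avg = sum(holdout_vals) / len(holdout_vals)
    if abs(train_avg - holdout_avg) > threshold + random.gauss(0.0, sigma):
        return holdout_avg + random.gauss(0.0, sigma)
    return train_avg
```

The added Gaussian noise is what connects this construction to differential privacy: it bounds how much any single adaptive query can reveal about the holdout data.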
The Algorithmic Foundations of Differential Privacy
TLDR
The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.
Concentration Inequalities - A Nonasymptotic Theory of Independence
TLDR
Deep connections with isoperimetric problems are revealed whilst special attention is paid to applications to the supremum of empirical processes.
A comparison of tight generalization error bounds
We investigate the empirical applicability of several bounds (a number of which are new) on the true error rate of learned classifiers which hold whenever the examples are chosen independently at…