• Corpus ID: 239886003

Towards better data discovery and collection with flow-based programming

@inproceedings{Paleyes2021TowardsBD,
  title={Towards better data discovery and collection with flow-based programming},
  author={Andrei Paleyes and Christian Cabrera and Neil D. Lawrence},
  year={2021}
}
Despite huge successes reported by the field of machine learning, such as voice assistants and self-driving cars, businesses still observe a very high failure rate when deploying ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data discovery and collection in software systems. We compare FBP with the currently prevalent service… 

An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context

This paper proposes Flow-Based Programming (FBP) as a paradigm for creating Data-Oriented Architecture (DOA) applications and empirically evaluates FBP in the context of ML deployment on four applications that represent typical data science projects, finding that FBP is a suitable paradigm for data collection and data science tasks.

Challenges in Deploying Machine Learning: A Survey of Case Studies

By mapping found challenges to the steps of the machine learning deployment workflow, it is shown that practitioners face issues at each stage of the deployment process.

References

Data Readiness Levels

The use of data readiness levels is proposed: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.

Data Lifecycle Challenges in Production Machine Learning

Challenges in data understanding, data validation and cleaning, and data preparation are explored, along with how the constraints imposed on solutions depend on where in a model's lifecycle the problems are encountered and who encounters them.

Flow-based programming

I believe that this concept of multiple asynchronous processes communicating via streams of data can revolutionize our industry, and I am hoping that in the next few years some of this potential will be realized.

Challenges in Deploying Machine Learning: A Survey of Case Studies

By mapping found challenges to the steps of the machine learning deployment workflow, it is shown that practitioners face issues at each stage of the deployment process.

Data Engineering for Data Analytics: A Classification of the Issues, and Case Studies

This paper describes such tasks and classifies them into high-level groups, namely data organization, data quality, and feature engineering, and makes available four datasets and example analyses that exhibit a wide variety of these problems.

Data-Driven Workflows for Microservices: Genericity in Jolie

This paper extends Jolie to support the possibility of expressing choices at the level of data types, a feature well represented in standards for Web Services, e.g., WSDL, and enables Jolie processes to act on data generically (without knowing which type it has in the choice).

Applying Flow-based Programming Methodology to Data-driven Applications Development for Smart Environments

Preliminary results show that the Flow-based Programming approach leads to a clear transformation of the design architecture into the software implementation, speeds up the development process, and increases code reuse and maintainability.

Google Cloud Dataflow

A few times in this book, we have introduced chapters by reflecting on how much technology has changed over the past few years and how that has shaped our understanding of concepts like security,

Control Flow Versus Data Flow in Distributed Systems Integration: Revival of Flow-Based Programming for the Industrial Internet of Things

Coupling in distributed systems integration is discussed, the history of business process modeling with respect to data and control flow is reviewed, and some results for flow-based programming are presented from the Industrial DevOps project Titan, where flow-based programming is proposed for the Industrial Internet of Things.

Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

An approach inspired by elements of the flow-based programming paradigm is implemented as an extension of the Luigi system, named SciLuigi, and experiences from using the approach to model a large set of biochemical interactions on a shared computer cluster are reported.