Green Data Science - Using Big Data in an "Environmentally Friendly" Manner

Wil M.P. van der Aalst
The widespread use of "Big Data" is heavily impacting organizations and the individuals about whom these data are collected. Sophisticated data science techniques aim to extract as much value from data as possible. Powerful mixtures of Big Data and analytics are rapidly changing the way we do business, socialize, conduct research, and govern society. Big Data is considered the "new oil", and data science aims to transform this into new forms of "energy": insights, diagnostics, predictions, and…

Responsible Data Science: Using Event Data in a "People Friendly" Manner

This paper discusses Responsible Process Mining (RPM) as a new challenge in the broader field of Responsible Data Science (RDS), and argues that techniques, infrastructures, and approaches can be made responsible by design.

Principles of Green Data Mining

The paper describes how data scientists can contribute to designing environmentally friendly data mining processes, for instance by using green energy, choosing between make-or-buy, exploiting approaches to data reduction based on business understanding or pure statistics, or choosing energy-friendly models.

Green Data Mining using Approximate Computing: An experimental analysis with Rule Mining

This work applies approximation techniques that can be used to achieve energy-efficient data mining (Green Data Mining), with the best possible results for a given allowable deviation.

Configuration of Data Monetization: A Review of Literature with Thematic Analysis

The present study aims to clarify the configuration of data monetization by conducting a systematic literature review with thematic analysis based on an inductive approach; the proposed configuration is validated by a real application, i.e., Cardlytics.

Relational Data Mining in the Era of Big Data

A brief review of the literature on Relational Data Mining in the fields of Spatial Data Mining, Process Mining, Network Data Analysis and Stream Data Mining is reported, with an emphasis on the Italian research.

A Systematic Review of Recommendations of Long-Term Strategies for Researchers Using Data Science Techniques

The results show the need for studies that generate more specific recommendations based on data mining, and leave open research opportunities from two particular perspectives—applying methodologies involving process mining for the context of research analytics and the feasibility study on long-term strategies using data science techniques.

Tweeting about Sustainability: Can Emotional Nowcasting Discourage Greenwashing?

Fewer than 100 firms worldwide are recognised by Bloomberg as reporting accurate greenhouse gas emissions. Yet, tens of thousands of people are talking and tweeting about climate change every day. How

The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review

A systematic scoping review was conducted to identify the ethical issues of AI application in healthcare, to highlight gaps, and to propose steps to move towards an evidence-informed approach for addressing them.

Ciência responsável dos dados: imparcialidade, precisão, confidencialidade, e transparência dos dados [Responsible data science: fairness, accuracy, confidentiality, and transparency of data]

Introduction: in the context of Big Data, there is an urgent need to apply individual and corporate rights and regulatory standards that safeguard privacy, fairness, …

The Visual Side of the Data

This chapter reviews the main approaches to visual queries, provides a historical overview of information visualization, and discusses how these functionalities should be adapted to big data, including streaming data.



Process Mining: Data Science in Action

Process mining bridges the gap between traditional model-based process analysis and data-centric analysis techniques such as machine learning and data mining, and can be applied to any type of operational process.

Privacy-by-design in big data analytics and social mining

The privacy-by-design paradigm is proposed to develop technological frameworks for countering the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of social mining and big data analytical technologies.

Mining of Massive Datasets

Determining relevant data is key to delivering value from massive amounts of data. Big data is defined less by volume, which is a constantly moving target, than by its ever-increasing variety, velocity, variability, and complexity.

Data Scientist: The Engineer of the Future

The data science discipline is discussed and its importance motivated; it is argued that the data scientist will be the engineer of the future and that scientific research is becoming more data-driven.

Practical Implications of Sharing Data: A Primer on Data Privacy, Anonymization, and De-Identification

The fundamental differences between encrypted data, "de-identified", "anonymous", and "coded" data, and the methods to implement each are covered, as well as the landscape of maturity models that can be used to benchmark your organization’s data privacy and protection of sensitive data.

Know What You Stream: Generating Event Streams from CPN Models in ProM 6

New developments that build on previous work on integrating data streams into the process mining framework ProM are presented, including means to use Coloured Petri Net models as a basis for event stream generation.

Business Process Management: A Comprehensive Survey

The practical relevance of BPM and rapid developments over the last decade justify a comprehensive survey and an overview of the state-of-the-art in BPM.

Decomposing Petri nets for process mining: A generic approach

The decomposition approach is generic and can be combined with different existing process discovery and conformance checking techniques to split computationally challenging process mining problems into many smaller problems that can be analyzed easily and whose results can be combined into solutions for the original problems.

Discrimination-aware data mining

This approach leads to a precise formulation of the redlining problem along with a formal result relating discriminatory rules with apparently safe ones by means of background knowledge, and an empirical assessment of the results on the German credit dataset.

Replaying history on process models for conformance checking and performance analysis

The importance of maintaining a proper alignment between event log and process model is elaborated on, and the application of such alignments to conformance checking and performance analysis is discussed.