Data Mining and Data-driven Modeling Approaches to Support Wastewater Treatment Plant Operation


In wastewater treatment plants (WWTPs), considerable effort and money are invested in operating and maintaining dense plant-wide measuring networks. These networks primarily serve as input for the advanced control strategies implemented in the supervisory control and data acquisition (SCADA) system to satisfy stringent effluent quality constraints. Due to new developments in information technology, long-term archiving has become practicable, and specialized process information systems are now available. The steadily growing amount of available plant data, however, is not systematically exploited for plant optimization, because there is a lack of specialized tools that allow operators and engineers alike to extract meaningful and valuable information efficiently from massive amounts of high-dimensional data. As a result, most of the information contained in the data is eventually lost. In the past few years, many data mining techniques capable of analyzing massive amounts of data have emerged, and the available processing power has enabled efficient data-driven modeling techniques that are especially suited to situations in which data are acquired faster than they can be analyzed. However, although these methods are promising ways to provide valuable information to operators and engineers, interest in applying them to support WWTP operation is not yet fully developed. In this thesis, the applicability of data mining and data-driven modeling techniques in the context of WWTP operation is investigated. This context imposes specific requirements that the adapted and developed techniques must satisfy to be practicable. On the one hand, deploying a given technique on a plant must be fast, simple and cost-effective; as a consequence, it must rely on data that are already available or can be gathered easily.
On the other hand, the application must be safe, i.e., the extracted information must be reliable and communicated clearly. This thesis presents the results of four knowledge discovery projects that adapted data mining and data-driven modeling techniques to tackle problems relevant to either the operator or the process engineer. First, it investigates the extent to which data-driven modeling techniques are suitable for automatically generating software sensors based exclusively on measured data available in the plant's SCADA system. These software sensors are intended to substitute for failure-prone and maintenance-intensive hardware sensors and to diagnose hardware sensors. In two full-scale experiments, four modeling techniques for software-sensor development are compared and the role of expert knowledge is investigated. The results show that the non-linear modeling techniques outperform the linear technique, and that a higher degree of expert knowledge is beneficial for long-term accuracy but can reduce performance in the short term. Consequently, if frequent model recalibration is possible, as is the case for sensor diagnosis applications, automatic development with limited expert knowledge is feasible. In contrast, making optimal use of expert knowledge requires model transparency, which only two of the investigated techniques provide: generalized least squares regression and self-organizing maps.
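The abstract does not specify how the software sensors are built, so the following is only a minimal sketch of the idea under stated assumptions: a linear software sensor fitted by generalized least squares (GLS) to other measured signals, with an assumed AR(1) error covariance whose autocorrelation `rho` is treated as known, followed by a simple sensor-diagnosis rule that flags samples where the hardware sensor deviates from the software-sensor prediction by more than a threshold. The signals are synthetic stand-ins for SCADA data, and the function names, coefficients and threshold are hypothetical. Neither the model structure nor the diagnosis rule is the thesis's method.

```python
import numpy as np


def ar1_covariance(n: int, rho: float) -> np.ndarray:
    """Assumed AR(1) error correlation: Sigma[i, j] = rho**|i - j| (the scale cancels in GLS)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def add_intercept(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(X)), X])


def fit_gls(X: np.ndarray, y: np.ndarray, rho: float) -> np.ndarray:
    """GLS estimate beta = (X' S^-1 X)^-1 X' S^-1 y for the assumed AR(1) error structure S."""
    Xd = add_intercept(X)
    S = ar1_covariance(len(y), rho)
    # Solve S Z = Xd and S z = y instead of forming S^-1 explicitly.
    S_inv_X = np.linalg.solve(S, Xd)
    S_inv_y = np.linalg.solve(S, y)
    return np.linalg.solve(Xd.T @ S_inv_X, Xd.T @ S_inv_y)


def predict(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Software-sensor output for new input signals."""
    return add_intercept(X) @ beta


def flag_sensor_faults(measured: np.ndarray, predicted: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical diagnosis rule: flag samples where hardware and software sensor disagree."""
    return np.abs(measured - predicted) > threshold


# Synthetic stand-in for SCADA data (hypothetical): two input signals and one target sensor.
rng = np.random.default_rng(0)
n, rho = 400, 0.8
X = rng.normal(size=(n, 2))
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal(scale=0.1)  # autocorrelated measurement noise
y = 2.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + e

# Calibrate the software sensor on data where the hardware sensor is fault-free.
beta = fit_gls(X, y, rho)

# Simulate an offset fault in the hardware sensor from sample 300 onward.
hardware = y.copy()
hardware[300:] += 1.0
flags = flag_sensor_faults(hardware, predict(beta, X), threshold=0.6)
```

In practice `rho` would be estimated from the residuals (feasible GLS), and the threshold would be chosen from the residual spread on fault-free data. Because recalibration only requires refitting `beta`, a sensor-diagnosis setting like this one tolerates a model developed with little expert knowledge, in line with the conclusion above. The non-linear techniques the abstract reports as more accurate are not shown here.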

38 Figures and Tables


@inproceedings{Erome2011DataMA, title={Data Mining and Data-driven Modeling Approaches to Support Wastewater Treatment Plant Operation}, author={David J{\'e}r{\^o}me D{\"u}rrenmatt}, year={2011} }