pyWATTS: Python Workflow Automation Tool for Time Series

Contact:
Prof. Dr.-Ing. Ralf Mikut
Project Group:
Benedikt Heidrich
Marian Turowski
Oliver Neumann
Kaleb Phipps
Stefan Meisenbacher
Prof. Dr.-Ing. Ralf Mikut
Prof. Dr. Veit Hagenmeyer
Funding: HGF
Start date: 2019
End date: 2023

Links

https://github.com/KIT-IAI/pyWATTS

pyWATTS is an open-source, Python-based workflow automation tool for time series analysis that simplifies the evaluation process and the design of repetitive experiments. Its standardized API also allows existing models to be integrated.

Please note that pyWATTS is no longer actively maintained. However, the core functionality has been integrated into the open-source Python package sktime.

Time series data are fundamental to a variety of applications, ranging from financial markets to energy systems. Because of their importance, the number and complexity of tools and methods for time series analysis are constantly increasing. However, unclear APIs and a lack of documentation make it hard for researchers to integrate these tools into their research projects and to replicate results. In addition, time series analysis involves many repetitive tasks that are often re-implemented for each project, which costs time unnecessarily. pyWATTS, an open-source Python-based package, is a non-sequential workflow automation tool for the analysis of time series data that aims to solve these problems. pyWATTS includes modules with clearly defined interfaces to enable seamless integration of new or existing methods, subpipelining to easily reproduce repetitive tasks, load and save functionality to replicate results, and native support for key Python machine learning libraries such as scikit-learn, PyTorch, and Keras.
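To illustrate what "non-sequential" means here, the following is a minimal sketch in plain Python. It is not the pyWATTS API: the function `run_graph`, the step names, and the example data are all hypothetical. It only shows the general idea that steps declare which earlier results they consume, so a workflow can branch and merge instead of running as one linear chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(steps, inputs):
    """Run steps in dependency order (hypothetical helper, not pyWATTS).

    steps maps a step name to (function, list of names it depends on).
    inputs maps the names of raw input series to their values.
    """
    order = TopologicalSorter({name: deps for name, (_, deps) in steps.items()})
    results = dict(inputs)
    for name in order.static_order():
        if name in results:  # raw input, nothing to compute
            continue
        func, deps = steps[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Two branches read the same input, a third step merges them.
steps = {
    "scaled": (lambda xs: [x / max(xs) for x in xs], ["load"]),
    "diff": (lambda xs: [b - a for a, b in zip(xs, xs[1:])], ["load"]),
    "features": (lambda s, d: list(zip(s[1:], d)), ["scaled", "diff"]),
}
results = run_graph(steps, {"load": [2.0, 4.0, 8.0]})
print(results["features"])
```

Here `scaled` and `diff` both depend on `load`, and `features` depends on both of them, which a purely sequential pipeline could not express directly.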

Some key features of pyWATTS are:

A platform-independent solution to implement workflows from start to finish using pipelines. Thereby, time series experiments can be performed in an organized manner and in any environment that supports pyWATTS.
Reusability through subpipelining. Any useful part of a time series experiment, e.g. preprocessing, can be defined as a subpipeline and integrated into other pipelines without further adaptation and independently of the original experiment.
Saving and loading of any given pipeline configuration to reproduce results at a later date.
Simple integration of new research approaches through a plug-and-play style environment. Thanks to the modular architecture, with data handling through xarray, modules implemented in pyWATTS can be exchanged seamlessly between pipelines.
A clear API of the modules, i.e. transform and fit methods, ensuring that pipelines within pyWATTS are adaptable and that modules can easily run on multiple data sets, at different points in a pipeline, and in various pipelines.
Support for alternative modules at the same position in a pipeline, where a condition mechanism decides which module is executed depending on the characteristics of the data.
Uses pandas DataFrame or xarray Dataset as input, allowing users to flexibly read the data from any source (file, database, website) with their method of choice.
Callbacks, e.g. for visualizing, analyzing, and writing the intermediate results of modules.
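Several of the features above (a common fit/transform interface, subpipelines, conditions, and callbacks) can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not use or reproduce the pyWATTS API: the classes `MeanScaler`, `Clip`, `Chain`, and `Either` are hypothetical, and plain lists stand in for pandas or xarray data.

```python
class MeanScaler:
    """Hypothetical module: learns the mean in fit, divides by it in transform."""

    def __init__(self):
        self.mean = None

    def fit(self, series):
        self.mean = sum(series) / len(series)
        return self

    def transform(self, series):
        return [x / self.mean for x in series]


class Clip:
    """Hypothetical stateless module: caps values at an upper bound."""

    def __init__(self, upper):
        self.upper = upper

    def fit(self, series):
        return self

    def transform(self, series):
        return [min(x, self.upper) for x in series]


class Chain:
    """Runs modules in order. It exposes fit/transform itself, so a chain
    can be placed inside another chain as a subpipeline."""

    def __init__(self, modules, callbacks=()):
        self.modules = list(modules)
        self.callbacks = list(callbacks)

    def fit(self, series):
        for module in self.modules:
            series = module.fit(series).transform(series)
        return self

    def transform(self, series):
        for module in self.modules:
            series = module.transform(series)
            for callback in self.callbacks:  # inspect intermediate results
                callback(type(module).__name__, series)
        return series


class Either:
    """Condition: chooses one of two modules based on the incoming data."""

    def __init__(self, condition, if_true, if_false):
        self.condition = condition
        self.if_true = if_true
        self.if_false = if_false

    def fit(self, series):
        self.if_true.fit(series)
        self.if_false.fit(series)
        return self

    def transform(self, series):
        chosen = self.if_true if self.condition(series) else self.if_false
        return chosen.transform(series)


log = []
preprocessing = Chain([Clip(upper=8.0), MeanScaler()])  # reusable subpipeline
pipeline = Chain(
    [preprocessing, Either(lambda s: max(s) > 1.5, Clip(upper=1.5), Clip(upper=2.0))],
    callbacks=[lambda name, out: log.append(name)],
)
pipeline.fit([1.0, 3.0, 50.0])
result = pipeline.transform([1.0, 3.0, 50.0])
reused = preprocessing.transform([4.0])  # same fitted subpipeline, used on its own
```

Because every building block shares the same two methods, the fitted `preprocessing` chain can be used on its own or dropped into another chain without changes, which is the design idea behind subpipelining.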

Existing Cooperation

pyWATTS was developed by the Institute for Automation and Applied Informatics (IAI) at the Karlsruhe Institute of Technology (KIT). Prof. Dr. Jorge Ángel González Ordiano from the Universidad Iberoamericana Ciudad de México was also involved in its development.

Funding

pyWATTS is funded by the Helmholtz Association’s Initiative and Networking Fund through Helmholtz AI, the Helmholtz Association under the Program “Energy System Design”, the Joint Initiative “Energy System Design - A Contribution of the Research Field Energy”, the Helmholtz Metadata Collaboration, and the German Research Foundation (DFG) as part of the Research Training Group 2153 “Energy Status Data: Informatics Methods for its Collection, Analysis and Exploitation” and under Germany’s Excellence Strategy – EXC number 2064/1 – Project number 390727645.

Publications

sktime - python toolbox for time series: new features 2023 – advanced pipelines, probabilistic forecasting, parallelism support, composable classifiers and distances, reproducibility features
Kiraly, F.; Ray, A.; Heidrich, B.
2023

sktime – python toolbox for time series: pipelines and benchmarking
Kiraly, F.; Heidrich, B.
2023, September 17. PyCon CZ (2023), Prague, Czechia, September 15–17, 2023

Non-Sequential Machine Learning Pipelines with pyWATTS
Heidrich, B.; Phipps, K.; Meisenbacher, S.; Turowski, M.; Neumann, O.; Mikut, R.; Hagenmeyer, V.
2023. Zenodo. doi:10.5281/zenodo.7740850

sktime - python toolbox for time series: pipelines and transformers
Kiraly, F.; Heidrich, B.; Parker, M.; Walter, M.
2022. pyDATA Global (2022), Online, December 1–3, 2022

Automating Time Series Analysis Workflows with pyWATTS
Heidrich, B.; Phipps, K.; Neumann, O.; Meisenbacher, S.; Turowski, M.; Mikut, R.; Hagenmeyer, V.
2022, June. Helmholtz Artificial Intelligence Conference (Helmholtz AI 2022), Dresden, Germany, June 2–3, 2022

Smart Data Representations: Impact on the Accuracy of Deep Neural Networks
Neumann, O.; Turowski, M.; Ludwig, N.; Heidrich, B.; Hagenmeyer, V.; Mikut, R.
2021. Proceedings - 31. Workshop Computational Intelligence: Berlin, 25. - 26. November 2021. Eds.: H. Schulte; F. Hoffmann; R. Mikut, 113–130, KIT Scientific Publishing

Concepts for Automated Machine Learning in Smart Grid Applications
Meisenbacher, S.; Pinter, J.; Martin, T.; Hagenmeyer, V.; Mikut, R.
2021. Proceedings - 31. Workshop Computational Intelligence: Berlin, 25. - 26. November 2021. Eds.: H. Schulte; F. Hoffmann; R. Mikut, 11–35, KIT Scientific Publishing

pyWATTS: Python Workflow Automation Tool for Time Series
Heidrich, B.; Bartschat, A.; Turowski, M.; Neumann, O.; Phipps, K.; Meisenbacher, S.; Schmieder, K.; Ludwig, N.; Mikut, R.; Hagenmeyer, V.
2021. Cornell University

Completed Project