Data Management is one of the central challenges in developing modern software systems. The need for more sophisticated Data Management is especially pronounced in the current era of Artificial Intelligence and Big Data systems, whose data requirements are more demanding than those traditional Data Management had to consider.
In Data Engineering, we focus on preparing data for deployment or use in complex AI and data-driven systems. This covers, for example, discovering, cleaning, and transforming data, as well as integrating data from heterogeneous sources. There is also a focus on creating and managing (domain-specific) metadata. Furthermore, data biases and the societal issues that can arise from them, such as misrepresentation and unfairness, are becoming a focus area. Data Engineering topics are often studied in the context of their application domains, such as Digital Humanities and medicine, but also business applications like banking.
In Scalable Data Management, the focus is on coping with the ever-increasing demand for storage and processing power by scaling data operations. This covers, for example, methods for stream processing, flexible distribution schemes, and the deployment of scalable AI models.
With the Amalur project, we believe that this is the right moment to revisit all the components of classic data integration (DI) systems and to see how they fit into modern data lakes that are meant to support linear algebra as a first-class citizen.
Valentine is an extensible open-source project for executing and organizing large-scale automated matching processes on tabular data, either for experimentation or for deployment on real-world data. Valentine was published at ICDE 2021 and demoed at VLDB 2021.
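To give a feel for what automated matching on tabular data involves, here is a minimal sketch of one simple instance-based technique: ranking column pairs by the Jaccard overlap of their values. It does not use Valentine's API or reproduce any of its matchers; the function names `jaccard` and `match_columns`, the `threshold` parameter, and the toy tables are all assumptions made for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_columns(source, target, threshold=0.5):
    """Rank (source_column, target_column) pairs by value overlap.

    `source` and `target` map column names to lists of values.
    Pairs scoring below `threshold` (an illustrative cut-off) are dropped.
    """
    scores = {
        (s, t): jaccard(source[s], target[t])
        for s, t in product(source, target)
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(pair, score) for pair, score in ranked if score >= threshold]


# Two toy tables (hypothetical data) with differently named columns.
customers = {"name": ["Ann", "Bob", "Cy"], "city": ["Delft", "Leiden", "Delft"]}
clients = {"full_name": ["Bob", "Ann", "Dee"], "town": ["Delft", "Leiden"]}

matches = match_columns(customers, clients)
for (src, tgt), score in matches:
    print(f"{src} -> {tgt}: {score:.2f}")
```

Value overlap alone is a weak signal on real data, where columns may share few literal values; practical matchers typically combine it with schema-level evidence such as column names and data types.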
Clonos is a fault tolerance approach that achieves fast operator recovery with exactly-once guarantees and high availability by instantly switching to passive standby operators. Clonos enforces causally consistent recovery, including output deduplication, by tracking nondeterminism within the system through causal logging. Clonos was presented in a SIGMOD 2021 paper.
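The idea of logging nondeterminism so that a standby can reproduce the failed operator's outputs can be illustrated with a deliberately simplified toy. This is not Clonos's implementation: the `Operator` and `Sink` classes, the use of a random number as the only nondeterministic event, and the sequence-number deduplication are assumptions chosen to keep the sketch small.

```python
import random


class Operator:
    """Toy operator whose output depends on a nondeterministic value."""

    def __init__(self, causal_log=None):
        # Values to replay after a failure (empty for a fresh operator).
        self.replay = list(causal_log) if causal_log else []
        # Every nondeterministic value used so far, in order.
        self.causal_log = []
        self.next_seq = 0

    def _nondeterministic(self):
        # Replay logged values first; only then draw fresh ones.
        value = self.replay.pop(0) if self.replay else random.random()
        self.causal_log.append(value)
        return value

    def process(self, record):
        output = (self.next_seq, record, self._nondeterministic())
        self.next_seq += 1
        return output


class Sink:
    """Downstream consumer that drops outputs it has already seen."""

    def __init__(self):
        self.last_seq = -1
        self.received = []

    def accept(self, output):
        if output[0] <= self.last_seq:
            return False  # duplicate produced during recovery
        self.last_seq = output[0]
        self.received.append(output)
        return True


records = ["a", "b", "c"]
primary, sink = Operator(), Sink()
for record in records[:2]:
    sink.accept(primary.process(record))

# The primary "fails" here. A standby takes over with the causal log
# and reprocesses the input from the beginning.
standby = Operator(causal_log=primary.causal_log)
for record in records:
    sink.accept(standby.process(record))
```

Because the standby replays the logged values, its regenerated outputs for `a` and `b` are identical to the ones already delivered, so discarding them by sequence number is safe and the sink sees each record exactly once.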
This project deals with executing transactions (two-phase commit and SAGAs) on Stateful Functions-as-a-Service systems such as Apache Flink's StateFun. This work received the Best Paper Award at ACM DEBS 2021.
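For readers unfamiliar with the saga pattern, the sketch below shows its core idea in plain Python: each step is paired with a compensating action, and when a step fails, the compensations of the steps that already completed run in reverse order. It is a generic, single-process illustration, not the project's implementation and not the StateFun API; `run_saga`, `SagaFailed`, and the account functions are hypothetical names.

```python
class SagaFailed(Exception):
    """Raised after a failed saga has been compensated."""


def run_saga(steps, state):
    """Run (action, compensation) pairs in order against `state`.

    If an action raises, the compensations of all previously completed
    steps are executed in reverse order, then SagaFailed is raised.
    """
    completed = []
    for action, compensation in steps:
        try:
            action(state)
        except Exception as exc:
            for undo in reversed(completed):
                undo(state)
            raise SagaFailed(str(exc)) from exc
        completed.append(compensation)
    return state


def debit_alice(state):
    if state["alice"] < 30:
        raise ValueError("insufficient funds")
    state["alice"] -= 30


def refund_alice(state):
    state["alice"] += 30


def credit_bob(state):
    # Simulated failure of the second step.
    raise RuntimeError("recipient account unavailable")


def no_compensation(state):
    pass


accounts = {"alice": 100, "bob": 0}
try:
    run_saga([(debit_alice, refund_alice), (credit_bob, no_compensation)], accounts)
    outcome = "committed"
except SagaFailed:
    outcome = "rolled back"

print(outcome, accounts)
```

Unlike two-phase commit, a saga does not hold resources until a global decision is reached: intermediate states (here, the debit before the failed credit) are briefly visible and are undone by compensation rather than by an atomic abort.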