4th International Workshop on Databases and Machine Learning

in conjunction with ICDE 2025 | May 19th-23rd 2025

ABOUT

With the increased adoption of machine learning (ML) in various applications and disciplines, a synergy between the database (DB) systems and ML communities has emerged. Steps involved in ML pipelines, such as data preparation and cleaning, feature engineering, and management of the ML lifecycle, can benefit from research conducted by the data management community. For example, the management of the ML lifecycle requires mechanisms for modeling, storing, and querying ML artifacts. Moreover, in many use cases, pipelines require a mixture of relational and linear algebra operators, raising the question of whether a seamless integration between the two algebras is possible.

In the opposite direction, ML techniques are being explored in core components of database systems, e.g., query optimization, indexing, and monitoring. Traditionally hard problems in databases, such as cardinality estimation, or problems that require heavy human supervision, such as DB administration, might benefit more from learning algorithms than from rule-based or cost-based approaches.

The workshop aims to bring together researchers and practitioners at the intersection of DB and ML research, providing a forum for DB-inspired or ML-inspired approaches that address challenges encountered in each of the two areas. In particular, we welcome new research topics combining the strengths of both fields.

Information on the previous workshops is available at DBML 2024, DBML 2023, and DBML 2022.

For any questions regarding the workshop, please contact: dbml25@googlegroups.com

Topics of particular interest for the workshop include, but are not limited to, topics in the following two categories:

  • ML for Data Management and DBMS
      • Learned data discovery, cleaning, and transformation
      • ML-enabled data exploration and discovery in data lakes
      • Learned database design, configuration, and tuning
      • ML for query optimization, indexing, partitioning
      • Natural language enablement (e.g., queries, result summarization, chatbot interfaces)
      • Pretrained models for databases and data management (e.g., Large Language Models)
      • Representation learning for data cleaning, preprocessing, and management
      • Benchmarking ML-oriented data management (data augmentation, data cleaning, etc.) or DBMSs
  • Data Management for ML
      • Data collection and preparation for ML applications
      • Data quality and provenance for ML
      • Novel data management systems for accelerating training and inference of ML models
      • Data and metadata management for the ML lifecycle
      • DB-inspired techniques for modeling, storage, and provenance of ML artifacts

IMPORTANT DATES

All deadlines are 11:59 PM AoE (Anywhere on Earth).

Submission deadline: January 5th 2025
Author notification: March 6th 2025
Camera-ready version: March 20th 2025
Workshop day: May 19th 2025

SUBMISSION AND AUTHOR GUIDELINES

Papers should be submitted using the Conference Management Tool. Papers must be prepared in accordance with the available IEEE format. Papers must not exceed 6 pages including the references. No appendix is allowed. Only electronic submissions in PDF format will be considered. Submissions will be reviewed in a single-blind manner.

ORGANISATION

PROGRAM COMMITTEE

  • Steven Whang - KAIST
  • Daphne Miedema - University of Amsterdam
  • Sebastian Schelter - TU Berlin
  • Jan-Christoph Kalo - University of Amsterdam
  • Julien Romero - Telecom SudParis
  • Antonios Georgakopoulos - University of Amsterdam
  • Zeyu Zhang - University of Amsterdam
  • Madelon Hulsebos - CWI Amsterdam