[SoSe 2026] Seminar on Data Engineering for AI & ML

In this seminar, students will learn how to: (a) critically read and interpret scientific papers drawn from the literature on responsible data management and responsible data science; (b) give a scientific presentation that is technically precise, focused on the relevant topics, and enjoyable to follow; (c) write a scientific analysis of an existing paper based on various sources, such as contemporary computer science journals and conference proceedings. In addition, students will learn about the state of the art and current research topics in responsible data science and data management.

Note that this seminar is limited to 8 participants. Check out the module description and timetable on MOSES for more details on the logistics.

Preliminary paper list for the seminar:

Pradhan et al.: Interpretable Data-Based Explanations for Fairness Debugging, SIGMOD'22
Shankar et al.: Automatic and Precise Data Validation for Machine Learning, CIKM'23
Shahbazi et al.: Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching, VLDB'24
Li et al.: Query Refinement for Diversity Constraint Satisfaction, VLDB'23
Grafberger et al.: Data Distribution Debugging in Machine Learning Pipelines, VLDBJ'22
Sambasivan et al.: “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI, CHI'21
Schelter et al.: HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning, SIGMOD'21
Northcutt et al.: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks, NeurIPS'21
Schelter et al.: Automating Large-Scale Data Quality Verification, VLDB'18
Erfanian et al.: Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities, VLDB'24
Birhane et al.: Multimodal datasets: misogyny, pornography, and malignant stereotypes, arXiv
Lee et al.: Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health, EMNLP'21
Stoyanovich et al.: Nutritional labels for data and models, IEEE Data Engineering Bulletin
Ding et al.: Retiring Adult: New Datasets for Fair Machine Learning, NeurIPS'21

This course is run by the DEEM Lab at BIFOLD/TU Berlin. Check out our website at https://deem.berlin for details on our research.

Instructor: Leonardo Dominici
Instructor: Gereon Dusella
Instructor: Pierre Sicco Lubitzsch
Instructor: Olga Ovcharenko
Instructor: Sebastian Schelter