In this seminar, students will learn how to (a) critically read and interpret scientific papers drawn from the literature on responsible data management and responsible data science; (b) give a scientific presentation that is technically precise, focused on the relevant topics, and enjoyable; and (c) write a scientific analysis of an existing paper based on various sources, such as contemporary computer science journals and conference proceedings. In addition, students will learn about state-of-the-art and current research topics in responsible data science and data management.
Note that this seminar is limited to 8 participants. Check out the module description and timetable on MOSES for more details on the logistics.
Preliminary paper list for the seminar:
- Pradhan et al.: Interpretable Data-Based Explanations for Fairness Debugging, SIGMOD'22
- Shankar et al.: Automatic and Precise Data Validation for Machine Learning, CIKM'23
- Shahbazi et al.: Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching, VLDB'24
- Li et al.: Query Refinement for Diversity Constraint Satisfaction, VLDB'23
- Grafberger et al.: Data Distribution Debugging in Machine Learning Pipelines, VLDBJ'22
- Sambasivan et al.: “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI, CHI'21
- Schelter et al.: HedgeCut - Maintaining Randomised Trees for Low-Latency Machine Unlearning, SIGMOD'22
- Northcutt et al.: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks, NeurIPS'21
- Schelter et al.: Automating Large-Scale Data Quality Verification, VLDB'18
- Erfanian et al.: Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities, VLDB'24
- Birhane et al.: Multimodal datasets: misogyny, pornography, and malignant stereotypes, arXiv
- Lee et al.: Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health, EMNLP'21
- Stoyanovich et al.: Nutritional labels for data and models, IEEE Data Engineering Bulletin
- Ding et al.: Retiring Adult: New Datasets for Fair Machine Learning, NeurIPS'21
This course is run by the DEEM Lab at BIFOLD/TU Berlin. Check out our website at https://deem.berlin for details on our research.
- Instructor: Pierre Sicco Lubitzsch
- Instructor: Olga Ovcharenko
- Instructor: Sebastian Schelter