Courses
“Data integration is the 800-pound gorilla in the corner, and everyone’s got it in spades,” according to Mike Stonebraker, MIT professor and Turing Award laureate. For data scientists in the era of Big Data, the most challenging and time-consuming task is consolidating data from different sources while overcoming dirty data, heterogeneous data representations, and incomplete data. In this course, we will cover the entire pipeline of an information integration workflow, learning about existing integration architectures and algorithms for data cleansing, schema matching, and data fusion. Furthermore, we will discuss state-of-the-art systems and prominent use cases of information integration techniques.
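To give a flavor of one pipeline step named above, here is a minimal, hypothetical sketch of name-based schema matching: two source schemas expose differently named columns, and a simple token-based Jaccard similarity pairs each column with its best counterpart. The function names, column names, and threshold are illustrative assumptions, not course material.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the underscore-separated tokens of two column names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def match_schemas(cols_a, cols_b, threshold=0.5):
    """Pair each column of schema A with its most similar column in schema B,
    keeping only pairs above the similarity threshold."""
    matches = {}
    for ca in cols_a:
        best = max(cols_b, key=lambda cb: jaccard(ca, cb))
        if jaccard(ca, best) >= threshold:
            matches[ca] = best
    return matches

print(match_schemas(["customer_name", "birth_date"],
                    ["name_of_customer", "date_of_birth", "zip_code"]))
# → {'customer_name': 'name_of_customer', 'birth_date': 'date_of_birth'}
```

Real schema matchers combine many such signals (instance values, data types, constraints); this sketch only shows the name-similarity idea in isolation.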
- Trainer: Ziawasch Abedjan
- Trainer: Fatemeh Ahmadi
- Trainer: Mohamed Ahmed Abdelmaksoud Mohamed
In this course, students will develop solutions for large-scale data integration. Working in groups of up to four, they will reproduce an existing research prototype starting from the related paper and enhance it with their own ideas. Each group is accompanied by a mentor from the D2IP group to report and track progress. The students will learn to implement scalable algorithms, evaluate them systematically, read and interpret technical papers, and critically judge experimental results. At the same time, they will learn to deal with data heterogeneity problems at scale.
- Trainer: Ziawasch Abedjan
- Trainer: Luca Zecchini