Mining Complex Dynamic Data
In recent years, many applications have come to require mining from richer data types than conventional data(base) records: the analysis of social networks requires combining activity recordings with content (e.g. resource descriptions and user records); recommendation engines must consider user ratings, customer transactions, item descriptions and user profiles; medical applications require combining different kinds of patient recordings, including historical data on ailments and medication. At the same time, the mining tasks have become more elaborate: the data are multi-faceted and adhere to many orthogonal or overlapping concepts; the data accumulate or form streams; they are dynamic and call for adaptation of the mining models. In this tutorial, we discuss mining on complex data, with an emphasis on learning and adaptation over streaming, dynamic data.
We consider three categories of complex data: data that adhere to multiple overlapping labels, high-dimensional data that contain interesting subspaces, and data that span multiple tables. For each category, we first provide a comprehensive overview of static mining methods, and then focus on methods and example applications for dynamic data. For multi-label stream data, we focus on the example application of document (news) categorization; the core methods are stream classification with decision trees, prediction and ranking. For high-dimensional stream data, we focus on the example applications of bioinformatics and network intrusion detection; the core methods are stream subspace clustering and outlier detection. For multi-relational stream data, we consider two example applications: analysis of dynamic social networks, and analysis of evolving customer data; the core methods are tensor-based clustering, and multi-relational clustering and classification.
The target groups are: postgraduate students with a solid background in data mining; research scholars who work on conventional stream mining and are confronted with applications on complex data; practitioners who own applications on complex and dynamic data.
Hans-Peter Kriegel is a full professor of database systems and data mining in the Institute for Informatics at the Ludwig Maximilians University Munich, Germany, and has served as the department chair or vice chair in recent years. His research interests are in spatial and multimedia database systems, particularly in query processing, performance issues, similarity search and high-dimensional indexing, as well as in knowledge discovery and data mining.
Irene Ntoutsi is a postdoctoral researcher in the Database Systems and Data Mining group of Prof. Hans-Peter Kriegel at the Ludwig Maximilians University Munich, Germany, and a scholar of the Alexander von Humboldt Foundation. Her main research interests include data mining over dynamic data, with a current focus on high-dimensional data.
Myra Spiliopoulou is Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research interest is knowledge discovery and adaptation.
Grigorios Tsoumakas is a Lecturer at the Department of Informatics of the Aristotle University of Thessaloniki (AUTH). His research interests include machine learning, knowledge discovery and data mining.
Arthur Zimek is a scientific assistant in the database systems and data mining group of Hans-Peter Kriegel at the Ludwig Maximilians University Munich, Germany. His research interests include subspace clustering and correlation clustering.