Specifying users' interests with a formal query language is typically a challenging task, and it becomes even harder in multi-model data management because of data variety. Such data usually lack a unified schema to help users issue their queries, or have an incomplete schema because the data come from disparate sources. Multi-Model DataBases (MMDBs) have emerged as a promising approach to this problem, as they can store and query multi-model data in a single system. This tutorial aims to present a wide range of query languages for MMDBs and to compare their properties from multiple perspectives. We will discuss the essence of cross-model query processing and provide insights into research challenges and directions for future work. The tutorial will also offer participants hands-on experience in issuing multi-model queries with MMDBs.
Webpage for the hands-on instructions:
The tutorial is planned for 3 hours and is divided into 6 parts as follows:
We start the tutorial by introducing data variety and motivating the need for multi-model data management.
1.1 Basics on data variety
1.2 The need for and essence of multi-model data management
Related References:
We will briefly discuss the major data models adopted by database systems.
2.1 The relational model
2.2 Extensions of the relational model
2.3 The semi-structured data models such as XML and JSON
2.4 The graph data models
Related References:
We will discuss several well-known multi-model query languages, which fall into three categories. We will also provide an E-commerce dataset (e.g., Unibench) and detailed instructions so that participants can gain hands-on experience by writing and running multi-model queries with ArangoDB.
3.1 The SQL-extensions
3.2 The XML/JSON-extensions
3.3 The graph-extensions
Related References:
We will compare the query languages from four perspectives: semantic differences, expressive power, internal representation, and manner of query evaluation.
4.1 The semantic difference
4.2 The expressive power
4.3 The internal representation
4.4 The manners of query evaluation
Related References:
We then conclude with a discussion of open problems and challenges in designing multi-model data query languages.
5.1 Design an algebra for a multi-model query language.
5.2 General approaches for cross-model query processing.
We will invite the participants to write and run some multi-model queries by using ArangoDB.
6.1 Generate an E-commerce dataset with Unibench
6.2 Hands-on experience for multi-model queries with ArangoDB.
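To give a rough feel for what a multi-model query combines, the following is a minimal, self-contained Python sketch. It is not ArangoDB code and does not use the actual Unibench schema: the toy tables, field names, and the function `products_bought_by_friends` are hypothetical and serve only to illustrate how one query can touch relational, document, and graph data.

```python
from collections import defaultdict

# Hypothetical toy data; names and fields are illustrative assumptions,
# not the actual Unibench schema.

# Relational model: rows of a customer table.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]

# Document (JSON) model: orders with nested line items.
orders = [
    {"customer_id": 2, "items": [{"product": "laptop", "price": 900}]},
    {"customer_id": 3, "items": [{"product": "phone", "price": 500},
                                 {"product": "case", "price": 20}]},
    {"customer_id": 1, "items": [{"product": "book", "price": 15}]},
]

# Graph model: directed "knows" edges between customer ids.
knows = [(1, 2), (1, 3), (2, 3)]


def products_bought_by_friends(customer_name, min_price=0):
    """Return sorted product names ordered by direct friends of a customer."""
    # Relational step: select the customer's id by name.
    ids = [c["id"] for c in customers if c["name"] == customer_name]
    if not ids:
        return []
    start = ids[0]

    # Graph step: one-hop traversal along outgoing "knows" edges.
    adjacency = defaultdict(set)
    for src, dst in knows:
        adjacency[src].add(dst)
    friends = adjacency[start]

    # Document step: unnest the items of the friends' orders and filter.
    products = {
        item["product"]
        for order in orders
        if order["customer_id"] in friends
        for item in order["items"]
        if item["price"] >= min_price
    }
    return sorted(products)
```

Calling `products_bought_by_friends("Alice", min_price=100)` walks all three models in turn. In an MMDB the same steps would be expressed declaratively in a single query rather than as hand-written loops.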
Detailed instructions:
Related References:
Qingsong Guo is a postdoctoral researcher at the University of Helsinki, Finland. He received his Ph.D. degree from the University of Southern Denmark in 2016. His current research interests include multi-model data management and learning to manage big data with deep learning algorithms.
Jiaheng Lu is a professor at the University of Helsinki, Finland. His main research interests lie in big data management and database systems. He has published more than one hundred journal and conference papers, as well as several books on XML, Hadoop, and NoSQL databases. He has given several tutorials on multi-model data management and autonomous databases at the VLDB, CIKM, and EDBT conferences. He frequently serves as a PC member for conferences including SIGMOD, VLDB, ICDE, EDBT, and CIKM.
Chao Zhang is a Ph.D. candidate at the Department of Computer Science, University of Helsinki (UH). His research topic is multi-model database benchmarking and query optimization. Prior to joining UH, Chao spent one year at Renmin University of China (RUC) for Ph.D. studies.
Calvin Sun is Chief Database Architect at Huawei Cloud. He has 20+ years of experience in developing database systems, ranging from embedded databases and large-scale distributed databases to cloud-native databases. Calvin joined Huawei Toronto Distributed Scheduling and Data Engine Lab in October 2017. Prior to joining Huawei, he was a consulting member of technical staff at Oracle Cloud. He also served as manager of the storage engines team at MySQL Inc., manager of the InnoDB team at Oracle, and manager of MySQL development at Twitter.
Steven Yuan is the director of Huawei Toronto Distributed Scheduling and Data Engine Lab. He leads a research team of over 30 people in the big data and cloud domain. More specifically, his lab's research focuses on distributed scheduling, from IaaS to PaaS, and distributed Database as a Service. Before joining Huawei in August 2014, Steven was a senior manager and had 18 years of working experience with the IBM HPC products LSF and Symphony. Steven is an expert in distributed resource management and scheduling, and is an inventor of four U.S. patents in SLA scheduling and job placement. Steven received his Ph.D. from Peking University in 1995. The following year, he did postdoctoral research in large-scale heterogeneous distributed computing under Prof. Songnian Zhou at the University of Toronto.