Chao Zhang defends his PhD thesis on Performance Benchmarking and Query Optimization for Multi-Model Databases

On Monday the 19th of April 2021, M.Sc. Chao Zhang will defend his doctoral thesis on Performance Benchmarking and Query Optimization for Multi-Model Databases. The thesis is related to research done in the Department of Computer Science and in the Unified Database Management Systems group at the University of Helsinki.

M.Sc. Chao Zhang defends his doctoral thesis on Performance Benchmarking and Query Optimization for Multi-Model Databases on Monday the 19th of April 2021 at 13 o'clock in the University of Helsinki Exactum building, Auditorium B123 (Pietari Kalmin katu 5, 1st floor). His opponent is Associate Professor Konstantinos Stefanidis (Tampere University, Finland) and custos Professor Jiaheng Lu (University of Helsinki). The defence will be held in English. It is possible to follow the defence as a live stream at https://helsinki.zoom.us/j/66764680740.

The thesis of Chao Zhang is a part of research done in the Department of Computer Science and in the Unified Database Management Systems group at the University of Helsinki. His supervisor has been Professor Jiaheng Lu (University of Helsinki).

Performance Benchmarking and Query Optimization for Multi-Model Databases

Multi-Model DataBases (MMDBs) are database management systems that utilize a single platform to store, manipulate, and query data in various data models, e.g., relational, tree, and graph models. Numerous MMDBs have been developed to facilitate multi-model data management, but they differ fundamentally in data storage, query language, and query processing. Existing tools are not suitable for benchmarking MMDBs because they do not consider the multi-model data and workloads. Therefore, it is of paramount importance to provide a new benchmark to evaluate the performance of MMDBs. Furthermore, MMDBs bring new issues for query optimization as traditional techniques cannot properly optimize the queries due to the lack of consideration of multi-model operators and storage. Thus, it calls for new approaches to optimize multi-model queries.

This thesis is divided into two parts to accomplish the above two goals, respectively. In the first part, we developed a new benchmark system for MMDBs in a scenario of social commerce with five data models, i.e., relational, JSON, XML, graph, and key-value. This benchmark system is an end-to-end tool that consists of various components, including synthetic data generation, workload specification, parameter curation, and execution client. Moreover, we leveraged the developed system to conduct a holistic experimental evaluation of the state-of-the-art MMDBs to compare their performance and identify their performance bottlenecks in handling multi-model workloads. In the second part, we proposed two query optimization techniques for MMDBs. Firstly, we proposed a kernel density estimation (KDE)-based model to estimate the selectivity of multi-model joins that involve predicates for relational and tree data models. The estimation method can serve as a building block for selecting an optimal joining order in a cross-model query execution plan. We also proposed two approaches, the max-min approximation (MMA) and grid-based approximation (GBA) models, to approximate the KDE contribution while improving the estimation efficiency for large data samples. Secondly, we studied the problem of view selection in the relational-based graph databases to avoid the costly joins in the relational engine. Particularly, we proposed an end-to-end system that can automatically create, evaluate, and select views for accelerating the query processing. We proposed an extended graph view that can answer the subgraph and supergraph queries simultaneously. We devised a filtering-and-verification framework that enables the verification of the query containment by graph views. We formalized the view selection problem as a 0-1 Knapsack problem, then we developed a view selection algorithm, named graph gene algorithm (GGA), which explores the graph view transformations to reduce the view space and optimize the view benefit. Overall, the present thesis advances MMDBs in three aspects, i.e., performance benchmarking, join selectivity estimation, and automatic view selection.

Avail­ab­il­ity of the dis­ser­ta­tion

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-7198-6.

Printed copies will be available on request from Chao Zhang: chao.z.zhang@helsinki.fi.