Gongsheng Yuan defends his PhD thesis on Keyword Searches and Schema Transformation for Multi-Model Databases

On Monday the 9th of May 2022, M.Sc. Gongsheng Yuan defends his doctoral thesis on Keyword Searches and Schema Transformation for Multi-Model Databases. The thesis is related to research done in the Department of Computer Science and in the Unified Database Management Systems group.

M.Sc. Gongsheng Yuan defends his doctoral thesis "Keyword Searches and Schema Transformation for Multi-Model Databases" on Monday the 9th of May 2022 at 12 o'clock in the University of Helsinki Physicum building, Room E204 (Gustaf Hällströmin katu 2, 2nd floor). His opponent is Associate Professor Georgios J. Fakas (Uppsala University, Sweden), and the custos is Professor Jiaheng Lu (University of Helsinki). The defence will be held in English. It is possible to follow the defence as a live stream at https://helsinki.zoom.us/j/61970283776

The thesis of Gongsheng Yuan is a part of research done in the Department of Computer Science and in the Unified Database Management Systems group at the University of Helsinki. His supervisor has been Professor Jiaheng Lu (University of Helsinki).

Keyword Searches and Schema Transformation for Multi-Model Databases

The "Variety" of data is driving the evolution of databases. One result is the emergence of multi-model databases, whose core idea is to use a single, unified platform to manage both well-structured data and NoSQL data. So far, the database community has proposed quite a few multi-model databases that support different data models (e.g., relational, JSON, and graph models). However, these databases implement data storage and querying in diverse ways, which places a heavy burden on novices. The reason is that multi-model query languages have no unified standard comparable to SQL, so users have to master a different query language for each multi-model database. Users also need to know the complicated, and possibly evolving, schema of the multi-model data before they can write proper query statements.

Given this situation, we present our first research topic: how to employ keyword search as an alternative way to explore and query multi-model databases. Letting users access multi-model databases with simple keywords relieves them of the steep learning curve of mastering query languages and the schemas of multi-model data. In addition, multi-model databases cannot yet match the mature and robust relational databases that dominate the current market in transaction management, query optimization, security, and so on; they still need time to perfect their mathematical foundations and to improve performance. This motivates our second research topic: how to use relational databases as an alternative way to store and query well-structured data and NoSQL data uniformly.

For the first research problem, we use the probabilistic formalism of quantum physics to bring the problem into vector spaces and exploit non-classical probabilities to find the top-k most relevant results, where each result may consist of multiple components, drawn from different data models, that carry pertinent information. In this process, we apply the quantum language model to represent events (e.g., words) as subspaces, employ density matrices to encapsulate all the information over these subspaces, and use these density matrices to measure the divergence between a query and candidate results. Moreover, by analyzing the quantum language model, we propose the density vector to reduce computational complexity. To construct density vectors, we propose using spatial pattern mining to identify superposition events (i.e., compounds), which improves the accuracy of the method. We also use Principal Component Analysis (PCA) to further improve the efficiency of keyword searches over multi-model databases by reducing the cost of query calculation. With these techniques, we make keyword searches over multi-model databases work.
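
As a rough illustration of the density-matrix idea described above, the following Python sketch builds a density matrix for a query and for each candidate result, then ranks the candidates. It rests on several assumptions that do not come from the thesis: the three-dimensional word embeddings are invented, each density matrix is a plain uniform mixture of rank-one projectors rather than one estimated with the quantum language model, and the trace overlap tr(ρσ) is used as a simple similarity score in place of the divergence measure the thesis employs. All function and variable names are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length so that its projector has trace 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def projector(v):
    """Outer product |v><v| of the normalized vector: a rank-one projector."""
    u = unit(v)
    return [[a * b for b in u] for a in u]

def density_matrix(vectors):
    """Uniform mixture of the projectors of the given event vectors."""
    dim = len(vectors[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        p = projector(v)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p[i][j] / len(vectors)
    return rho

def trace_overlap(rho, sigma):
    """tr(rho * sigma): larger values mean the two mixtures overlap more."""
    dim = len(rho)
    return sum(rho[i][k] * sigma[k][i] for i in range(dim) for k in range(dim))

# Hypothetical embeddings, invented purely for this illustration.
embeddings = {
    "graph": [1.0, 0.0, 0.0],
    "json": [0.0, 1.0, 0.0],
    "query": [1.0, 1.0, 0.0],
    "movie": [0.0, 0.0, 1.0],
}

query = density_matrix([embeddings["graph"], embeddings["query"]])
candidates = {
    "result_a": density_matrix([embeddings["graph"], embeddings["json"]]),
    "result_b": density_matrix([embeddings["movie"]]),
}

# Rank candidate results by their overlap with the query, best first.
ranked = sorted(
    candidates,
    key=lambda name: trace_overlap(query, candidates[name]),
    reverse=True,
)
```

Here `result_a` shares the "graph" direction with the query and therefore ranks ahead of `result_b`, whose only component is orthogonal to everything in the query.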

The second research topic requires designing a good relational schema for storing these varied data in relational databases. The challenge is the structural difference between flat relational tables and complex multi-model data. To address this problem, we review the relevant work, analyze existing methods, and give a literature review. We find that existing work focuses on handling a single data model with relational databases; no research addresses multi-model data. To meet this challenge, we propose employing reinforcement learning, because it can automatically derive a good relational schema from the given multi-model data and queries by interacting with the external environment. To make this idea work in the database field, we define the input, goal, reward, policy, and observation according to our purpose. In addition, we present a Double Q-tables algorithm to help reduce the complexity of the learning process.
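
To give a feel for tabular reinforcement learning with two Q-tables, here is a minimal Python sketch of standard double Q-learning on a toy schema-design task. It should not be read as the thesis's Double Q-tables algorithm, whose details are not given here. The task, the reward table, the hyperparameters, and all names are assumptions made for illustration: for each of three attributes the agent chooses whether to keep it as a column (action 0) or move it to its own table (action 1), and the invented rewards stand in for a measured query cost.

```python
import random

# Hypothetical rewards: REWARD[state][action], where the state is the index
# of the attribute being placed. These numbers are invented for illustration.
REWARD = [
    [1.0, 0.0],
    [0.0, 2.0],
    [1.5, 0.5],
]
N_STATES, N_ACTIONS = len(REWARD), 2

def greedy(values):
    """Index of the largest value (the first one wins ties)."""
    return max(range(len(values)), key=lambda a: values[a])

def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # Two independent tables; the extra terminal row stays at zero.
    q_a = [[0.0] * N_ACTIONS for _ in range(N_STATES + 1)]
    q_b = [[0.0] * N_ACTIONS for _ in range(N_STATES + 1)]
    for _ in range(episodes):
        for state in range(N_STATES):
            # Epsilon-greedy behaviour policy on the sum of both tables.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy([x + y for x, y in zip(q_a[state], q_b[state])])
            reward, nxt = REWARD[state][action], state + 1
            # Update one randomly chosen table; the other table evaluates
            # the action that the updated table considers best.
            learner, judge = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
            best = greedy(learner[nxt])
            target = reward + gamma * judge[nxt][best]
            learner[state][action] += alpha * (target - learner[state][action])
    return q_a, q_b

q_a, q_b = train()
# Greedy policy read off the combined tables: one action per attribute.
policy = [greedy([x + y for x, y in zip(q_a[s], q_b[s])]) for s in range(N_STATES)]
```

In this toy setup the learned policy keeps the first and third attributes as columns and moves the second to its own table, which matches the highest-reward choice in each row of the invented table. Splitting the estimate across two tables is the usual way double Q-learning reduces the overestimation that a single table's max operator introduces.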

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-8126-8.

Printed copies will be available on request from Gongsheng Yuan: gongsheng.yuan@helsinki.fi