VLDB 2019 Tutorial

Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems







Time: 11:00-12:30, Wednesday, August 28, 2019

Place: Los Angeles, California, USA

Abstract:

Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. In this tutorial, we review existing approaches to automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of the different automatic parameter tuning algorithms and present the pros and cons of each approach. We also highlight real-world applications and systems and identify research challenges in handling cloud services, resource heterogeneity, and real-time analytics.

Outline:

The tutorial is planned for 1.5 hours and will have the following structure:

Part 1:

Motivation (10'). We motivate the need for automatic parameter tuning with several applications and scenarios in the era of Big Data and cloud environments.

History and classification (10'). We introduce the history and classification of parameter tuning approaches.

Pertinent references:

  • Juwei Shi, Jia Zou, Jiaheng Lu, Zhao Cao, Shiqiang Li, Chen Wang: MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs. PVLDB 7(13): 1319-1330 (2014) [PDF]

Part 2:

Parameter tuning on Databases (20'). We introduce six approaches to tune performance on database systems.

Pertinent references:

  • Dias, K., Ramacher, M., Shaft, U., Venkataramani, V., & Wood, G. Automatic Performance Diagnosis and Tuning in Oracle. In CIDR (pp. 84-94), 2005. [PDF]
  • Schnaitter, K., Abiteboul, S., Milo, T., & Polyzotis, N. Colt: continuous on-line tuning. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (pp. 793-795). ACM, 2006. [PDF]
  • Duan, Songyun, Vamsidhar Thummala, and Shivnath Babu. "Tuning database configuration parameters with iTuned." Proceedings of the VLDB Endowment 2.1: 1246-1257, 2009. [PDF]
  • Van Aken, D., Pavlo, A., Gordon, G. J., & Zhang, B. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1009-1024). ACM, 2017. [PDF]
  • Zhang, J., Liu, Y., Zhou, K., Li, G., Xiao, Z., Cheng, B., & Ran, M. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data (pp. 415-432). ACM, 2019. [PDF]
  • Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, Rui Zhang: iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases. Proceedings of the VLDB Endowment, 2019, 1221-1234. [PDF]

Part 3:

Parameter tuning on Hadoop and Spark (35'). We introduce key approaches to tune performance on Hadoop MapReduce and Spark. We compare the solutions in various tuning categories.

Pertinent references:

  • Shivnath Babu. Towards Automatic Optimization of MapReduce Programs. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC), pages 137–142. ACM, 2010. [PDF]
  • Dazhao Cheng, Jia Rao, Yanfei Guo, and Xiaobo Zhou. Improving MapReduce Performance in Heterogeneous Environments with Adaptive Task Tuning. In Proceedings of the 15th International Middleware Conference, pages 97–108. ACM, 2014. [PDF]
  • Anastasios Gounaris, Georgia Kougka, Ruben Tous, Carlos Tripiana Montes, and Jordi Torres. Dynamic Configuration of Partitioning in Spark Applications. IEEE Transactions on Parallel and Distributed Systems (TPDS), 28(7):1891–1904, 2017 [PDF]
  • Herodotos Herodotou and Shivnath Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. Proceedings of the VLDB Endowment, 4(11):1111–1122, 2011. [PDF]
  • Palden Lama and Xiaobo Zhou. AROMA: Automated Resource Allocation and Configuration of MapReduce Environment in the Cloud. In Proceedings of the 9th international conference on Autonomic computing (ICAC), pages 63–72. ACM, 2012. [PDF]

Part 4:

Open problems and challenges (15'). We discuss some real applications and systems for automatic tuning, such as Self-driving Oracle Database, Self-tuning DB2, and the Unravel platform. We conclude with a discussion of open problems and challenges.

Presenters:

Jiaheng Lu is an Associate Professor at the University of Helsinki, Finland. His main research interests lie in big data management and database systems, specifically in the challenge of efficiently processing data from real-life, massive data repositories and the Web. He has written four books on Hadoop and NoSQL databases and has published more than 70 journal and conference papers in venues such as SIGMOD, VLDB, TODS, and TKDE.

Yuxing Chen is a doctoral student at the University of Helsinki. His research topics are parameter tuning on Big Data systems and multi-model query optimization.

Herodotos Herodotou is an Assistant Professor at the Cyprus University of Technology (CUT). His research interests are in large-scale data processing systems, database systems, and cloud computing. In particular, his work focuses on automated performance tuning of both centralized and distributed data-intensive computing systems. His Ph.D. dissertation work on the Starfish platform received the SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention as well as the Outstanding Ph.D. Dissertation Award in Computer Science at Duke.

Shivnath Babu is the CTO at Unravel Data Systems and an Adjunct Professor at Duke University. His research focuses on ease-of-use and manageability of data-intensive systems, automated problem diagnosis, and cluster sizing for applications running on cloud platforms. Shivnath cofounded Unravel to solve the application management challenges that companies face when they adopt systems like Hadoop and Spark. Unravel originated from the Starfish platform built at Duke, which has been downloaded by over 100 companies. Shivnath has won a US National Science Foundation CAREER Award, three IBM Faculty Awards, and an HP Labs Innovation Research Award.

Selected publications of presenters:

Duan, Songyun, Vamsidhar Thummala, and Shivnath Babu. "Tuning database configuration parameters with iTuned." Proceedings of the VLDB Endowment 2.1 (2009): 1246-1257. [PDF]

Babu, S., Borisov, N., Duan, S., Herodotou, H., & Thummala, V. (2009, May). Automated Experiment-Driven Management of (Database) Systems. In HotOS. [PDF]

Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F. B., & Babu, S. (2011, January). Starfish: A Self-tuning System for Big Data Analytics. In CIDR (Vol. 11, No. 2011, pp. 261-272). [PDF]

Herodotou, Herodotos, and Shivnath Babu. "Profiling, what-if analysis, and cost-based optimization of mapreduce programs." Proceedings of the VLDB Endowment 4.11 (2011): 1111-1122. [PDF]

Shi, J., Zou, J., Lu, J., Cao, Z., Li, S., & Wang, C. (2014). MRTuner: a toolkit to enable holistic optimization for mapreduce jobs. Proceedings of the VLDB Endowment, 7(13), 1319-1330. [PDF]