CSUR survey paper published

The survey paper "A Survey on Automatic Parameter Tuning for Big Data Processing Systems" has been published in ACM Computing Surveys (CSUR).

Herodotou, Herodotos, Yuxing Chen, and Jiaheng Lu. "A Survey on Automatic Parameter Tuning for Big Data Processing Systems." ACM Computing Surveys (CSUR) 53.2 (2020): 1-37.

Abstract: Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches on parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.

Links: [open access]  [related VLDB tutorial]