CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

Date and time: Thursday, May 4, 2017 - 15:30
220 Deschutes
Hongqiang "Harry" Liu
Microsoft Research
  • Lei Jiao


Picking the right cloud configuration for big data analytics jobs is hard, because there can be tens of possible VM instance types and even more cluster sizes to choose from. Choosing poorly can significantly degrade performance and increase the cost of running a job by 2-3x on average, and by as much as 12x in the worst case. However, it is challenging to automatically identify the best configuration for a broad spectrum of applications and cloud configurations with low search cost. CherryPick is a system that leverages Bayesian Optimization to build performance models for various applications; the models are just accurate enough to distinguish the best or close-to-the-best configuration from the rest with only a few test runs. Our experiments on five analytic applications in AWS EC2 show that CherryPick has a 45-90% chance of finding the optimal configuration and otherwise finds a near-optimal one, saving up to 75% of the search cost compared to existing solutions.
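
To make the idea in the abstract concrete, here is a minimal sketch of a Bayesian Optimization search over cloud configurations. It is not CherryPick's implementation. Everything in it is an assumption made for illustration: the candidate grid of (vCPUs per VM, number of VMs), the synthetic `toy_cost` function standing in for a real test run, the RBF kernel, the expected-improvement rule for choosing the next run, the three starting points, and the budget of nine runs.

```python
import math

def features(cfg):
    # Hypothetical encoding: log2 of (vCPUs per VM, number of VMs).
    return (math.log2(cfg[0]), math.log2(cfg[1]))

def rbf(a, b, length=1.0):
    # Squared-exponential kernel (an assumed choice for this sketch).
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    # Lower-triangular L with L * L^T == A.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    # Solve (L * L^T) x = b by forward then backward substitution.
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, z, x_new, noise=1e-4):
    # Zero-mean Gaussian process posterior mean and std at x_new.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    k = [rbf(a, x_new) for a in X]
    mu = sum(ki * ai for ki, ai in zip(k, solve_chol(L, z)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve_chol(L, k)))
    return mu, math.sqrt(max(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Expected improvement for minimization.
    u = (best - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(u / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def search(candidates, run_job, initial, budget):
    # Run the initial configurations, then pick each next test run by EI.
    observed = {c: run_job(c) for c in initial}
    budget = min(budget, len(candidates))
    while len(observed) < budget:
        costs = list(observed.values())
        mean = sum(costs) / len(costs)
        std = math.sqrt(sum((v - mean) ** 2 for v in costs) / len(costs)) or 1.0
        z = [(v - mean) / std for v in costs]  # standardized costs
        X = [features(c) for c in observed]
        remaining = [c for c in candidates if c not in observed]
        nxt = max(remaining, key=lambda c: expected_improvement(
            *gp_posterior(X, z, features(c)), min(z)))
        observed[nxt] = run_job(nxt)
    return min(observed, key=observed.get), observed

def toy_cost(cfg):
    # Synthetic stand-in for the measured cost of one test run (not real data).
    v, n = features(cfg)
    return 1.0 + (v - 3) ** 2 + 0.5 * (n - 2) ** 2

candidates = [(v, n) for v in (2, 4, 8, 16) for n in (2, 4, 8, 16, 32)]
best_cfg, observed = search(candidates, toy_cost,
                            initial=[(2, 2), (16, 32), (4, 8)], budget=9)
print(best_cfg, observed[best_cfg])
```

The sketch evaluates only 9 of the 20 candidate configurations; each new run goes to the configuration the model considers most likely to beat the best cost seen so far. In a real setting `toy_cost` would be replaced by an actual test run of the job on the chosen cluster.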


Hongqiang "Harry" Liu is a Researcher in the Mobility and Networking Research Group at Microsoft Research, Redmond. He defended his Ph.D. thesis at the Department of Computer Science, Yale University, on May 7, 2014. His advisor was Prof. David Gelernter.

Previously, he received B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 2007 and 2010, respectively. His advisor was Prof. Xing Li.

His research interests include traffic engineering (TE), software-defined networking (SDN), data center networks (DCN), content delivery networks (CDN), peer-to-peer (P2P) applications, and cloud infrastructures for big data.