Strength and Limitations of Reinforcement Learning and Monte Carlo Methods for Generating Pathological Performance Test Cases

Ziyad Alsaeed
Date and time: 
Wed, Oct 27 2021 - 10:00am
Ziyad Alsaeed
University of Oregon
  • Michal Young (Chair)
  • Stephen Fickas
  • Christopher Wilson
  • Julie Hessler (History)

Detecting and repairing software performance issues requires test cases that demonstrate those problems. The quality and availability of test cases play an instrumental role in applications performance testing. Worst-case complexity edge cases often escape developers' understanding as the size and complexity of the application grow. Research shows that feedback-directed search (mutational fuzzing) can effectively discover pathological inputs that expose performance issues, but blindly mutating byte strings slows search by producing mostly invalid inputs. The search can be accelerated for applications that accept richly structured textual input by adapting search techniques with grammar-based generation. Monte Carlo tree search (MCTS, a random sampling search method) and reinforcement learning (RL, a machine learning-based technique to learn through environment interactions) are two unexplored paths in the domain of pathological input generation.

MCTS and RL have been applied to different domains, with notable success in the game domain. Adapting these techniques to search for inputs that trigger slow processing in a diverse set of applications poses different challenges. Primarily, adapting feedback-directed techniques from game search to a domain with widely varying rewards jeopardizes the search effort by skewing toward initial and mostly trivial observations. We devise different adaptive reward functions that perform well despite the diversity in application cost ranges and grammars. Other challenges vary depending on the applied search technique (e.g., the quality of the state representation). We overcome each challenge by applying a dynamic solution that requires no user involvement or exploring different paths to understand their trade-offs.

We construct and evaluate the application of MCTS and RL on two different techniques (TreeLine and PerfRL). The core tool for instrumentation and testing as used in the state-of-the-art fuzzing techniques allows us to evaluate our contributions effectively. Results on a mix of real-world applications show that our implementation of TreeLine is up to several times faster than a mutational fuzzer at finding expensive inputs for applications with richly structured input (e.g., graph layout) and only modestly slower on applications with unstructured textual input (e.g., word frequency). In general, we can discover bounded-length inputs that trigger exceptionally slow processing in the target applications within few minutes.