Online Monitoring, Analysis, and Feedback for High-Performance Computing Systems

Chad Wood
Date and time: Tue, Feb 2 2021 - 3:00pm
University of Oregon
  • Allen Malony (Chair)
  • Boyana Norris
  • Hank Childs

In this work we explore online monitoring, analysis, and feedback systems in high-performance computing, an area of research that is increasingly important as software and machines grow in scale and architectural complexity. We begin by outlining the terms of art and the scope of the area under consideration, then provide a high-level overview of online monitoring, analysis, and feedback within the context of high-performance computing. We discuss the significant features of each subtopic, as well as the reasoning behind integrating these topics into a holistic area of research. This leads into a deeper discussion of the special constraints imposed by high-performance computing, and of how various solutions have evolved along with this unique computational landscape. We then survey current and prior tools and techniques for online monitoring, analysis, and feedback. Finally, we conclude with a discussion of our ongoing research and open areas for future efforts in this domain.