Empirical Performance Analysis of HPC Applications with Portable Hardware Counter Metrics

Brian Gravelle
Date and time: 
Tue, May 3 2022 - 3:00pm
Location: 
Remote
Speaker(s):
Brian Gravelle
University of Oregon
Host/Committee: 
  • Boyana Norris (Chair)
  • Hank Childs
  • Allen Malony
  • Diego Melgar Moctezuma (Earth Sciences)

In this dissertation, we demonstrate that it is possible to develop empirical, hardware counter-based performance analysis methods for scientific applications running on diverse types of CPUs. Although hardware counters have been used in performance analysis for at least 30 years, existing methods remain limited to particular CPU vendors, or even to particular generations of CPUs from the same vendor. Our motivating hypothesis was that hardware counter-based measurements could be developed to provide consistent metrics across diverse CPU types. This dissertation confirms that hypothesis by demonstrating one such set of metrics.

We begin with an introduction motivating empirical performance analysis on CPUs, followed by background on empirical performance analysis. This background includes the Roofline performance model, which is widely used to visualize the performance of scientific applications relative to the potential performance of the system in use. The Roofline model uses metrics that are easily portable to different CPU architectures, so it is a useful starting point for our efforts to develop portable hardware counter metrics. We contribute to the existing Roofline literature by presenting hardware counter metrics that can measure the required application data on two different CPUs and by presenting two benchmarks that produce the Roofline model of the CPU. These contributions are complementary, since the benchmarks can also be used to validate the hardware counters used to measure the application data.
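For readers unfamiliar with the model, the Roofline bound can be sketched in a few lines: attainable performance is the minimum of the machine's peak floating-point rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte of memory traffic). The peak and bandwidth figures below are illustrative placeholders, not measurements from the systems studied in the dissertation.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Attainable GFLOP/s under the Roofline model.

    arithmetic_intensity is FLOPs per byte of memory traffic; below
    the ridge point (peak / bandwidth) a kernel is memory bound,
    above it the compute ceiling applies.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)

# Illustrative machine: 1000 GFLOP/s peak, 100 GB/s bandwidth.
# A kernel at 0.25 FLOP/byte is memory bound; one at 16 FLOP/byte
# hits the compute ceiling.
print(roofline_bound(1000, 100, 0.25))  # → 25.0
print(roofline_bound(1000, 100, 16))    # → 1000
```

Plotting this bound over a range of intensities yields the familiar sloped-roof-plus-ceiling shape against which measured application points are placed.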

Building on this work, we present a set of additional performance metrics derived from hardware performance monitors that we have been able to replicate on CPUs from two separate vendors. We developed these metrics to focus on information that can inform developers about the performance of the algorithms and data structures in their applications. This approach contrasts with other hardware counter methods, which target particular microarchitectural features. These metrics allow users to understand the performance of an application from the same perspective on multiple CPUs.
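The dissertation's actual metric definitions are not reproduced here, but the general idea can be sketched: a portable metric is a ratio of raw counter values, and only the mapping from vendor-specific event names to the ratio's inputs changes per CPU. The event names, the two-vendor table, and the 64-byte cache-line assumption below are all illustrative.

```python
# Sketch: derive a vendor-neutral arithmetic-intensity metric from raw
# hardware counter readings. Only the per-vendor mapping from event
# names to the metric's inputs changes; the derived metric itself is
# the same on every CPU. Event names here are hypothetical.
CACHE_LINE_BYTES = 64  # assumed cache-line size

VENDOR_EVENTS = {
    "vendor_a": {"flops": "FP_OPS_RETIRED", "misses": "LLC_MISSES"},
    "vendor_b": {"flops": "SIMD_FLOPS", "misses": "L3_MISS_CNT"},
}

def arithmetic_intensity(vendor, raw_counts):
    """FLOPs per byte of memory traffic, estimated from counters.

    Memory traffic is approximated as last-level cache misses
    multiplied by the cache-line size.
    """
    events = VENDOR_EVENTS[vendor]
    flops = raw_counts[events["flops"]]
    bytes_moved = raw_counts[events["misses"]] * CACHE_LINE_BYTES
    return flops / bytes_moved

# Two CPUs report different raw events, but the derived metric is
# directly comparable across them.
ai_a = arithmetic_intensity(
    "vendor_a", {"FP_OPS_RETIRED": 1_000_000, "LLC_MISSES": 4_000})
ai_b = arithmetic_intensity(
    "vendor_b", {"SIMD_FLOPS": 1_000_000, "L3_MISS_CNT": 8_000})
```

In this sketch, the per-vendor table plays the role that event selection plays in practice; the application-facing metric stays constant while the plumbing beneath it varies.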

We use a series of case studies to explore the usefulness of our new metrics, and to validate that the measured values provide the expected information about the application on both of our test systems. The first set of case studies examines a series of benchmarks and mini-applications. These computational kernels have a variety of performance features which we explore using the new hardware counter metrics. Finally, we study the performance of several versions of a scientific application using a combination of the Roofline model and the new metrics.