John Mellor-Crummey
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *John Mellor-Crummey* of *Rice University* presenting on
*Measurement and Analysis of Application Performance on GPU-accelerated
Systems at Exascale*. Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 13th March, 6PM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* As part of the US DOE's Exascale Computing Project, Rice
University began extending its HPCToolkit performance tools to support
instruction-level measurement and analysis of applications executing on
GPU-accelerated exascale supercomputers. Hardware support for
instruction-level performance measurement in AMD, Intel, and NVIDIA GPUs
was developed at the urging of the HPCToolkit project team. HPCToolkit
employs PC sampling or binary instrumentation to perform
instruction-level measurements of GPU computations. When measuring a
GPU-accelerated application, HPCToolkit employs a novel wait-free data
structure to communicate performance measurements between tool threads
and application threads. To help attribute performance information in
detail, HPCToolkit performs parallel analysis of large CPU and GPU
binaries involved in the execution of an exascale application to rapidly
recover mappings between machine instructions and source code. To
analyze terabytes of performance measurements gathered during executions
at exascale, HPCToolkit employs distributed-memory parallelism,
multithreading, sparse data structures, and out-of-core streaming
analysis algorithms. To support interactive exploration of profiles up
to terabytes in size, HPCToolkit's hpcviewer GUI uses out-of-core
methods to visualize performance data. These strategies have enabled
HPCToolkit to efficiently measure, analyze, and explore terabytes of
performance data for executions using as many as 64K MPI ranks and 64K
GPU tiles on ORNL's Frontier supercomputer. This talk will describe key
aspects of HPCToolkit, successes analyzing applications, and some
challenges ahead.
*Biography:* John Mellor-Crummey is a Professor of Computer Science at
Rice University in Houston, TX, USA. His principal research focus at
present is tools for measurement and analysis of application
performance. His past work includes scalable synchronization algorithms
for shared-memory multiprocessors, compilers and runtime systems for
parallel computing, techniques for execution replay of parallel
programs, tools for dynamic data race detection, and techniques for
network performance analysis and optimization. Mellor-Crummey co-led
development of the OMPT tools interface for OpenMP 5. He is a
co-recipient of the 2006 Dijkstra Prize in Distributed Computing,
co-recipient of a 2024 Honor Award from the US Secretary of Energy, and
a Fellow of the ACM.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch>
X (Twitter) <https://twitter.com/spcl_eth>
YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>