The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Publications of SPCL
High-performance distributed memory systems – from supercomputers to data centers
(Presentation - presented virtually, Oct. 2020)
Keynote talk at the 2020 International Symposium on DIStributed Computing (DISC)
Abstract: We will cover distributed memory programming of high-performance supercomputers and datacenter computers. Starting from the Message Passing Interface, we observe abstractions for distributed computations that we carry through optimizations such as topology mapping and collective communication optimization. We then discuss efficient correction protocols to enable fault tolerance in such high-performance distributed systems. Armed with these insights, we observe that supercomputers are likely to migrate into megadatacenter installations leading to a general convergence of such architectures. The first step, converging the network interfaces, is well underway towards a general acceptance of Remote Direct Memory Access (RDMA) networking. RDMA moves the distributed system closer to shared memory, with a weakly consistent memory model. We discuss several algorithmic and systems approaches to accelerate distributed replicated state machines, databases, and locking systems by orders of magnitude using RDMA. Finally, if time allows, we will outline parametric program graphs – a sound abstraction for analyzing and optimizing applications. Each topic will identify open problems and provide ideas for further work to deepen our understanding of high-performance distributed memory systems.
Recorded talk (best effort)