SPCL - Publications

Conference Rankings

Publications of SPCL

Principles for Coordinated Optimization of Computation and Communication in Large-Scale Parallel Systems

(presented in Bloomington, IN, USA, Sep. 2008)

Abstract

Solving complex scientific problems with massively parallel distributed computer systems remains a complex task. Data-dependencies in parallel applications enforce communication in addition to the computation of the solution. We aim to optimize communication and computation to ideally play together and minimize all overheads. Our work is based on the hypothesis that global knowledge about communication patterns and techniques to overlap communication and computation can be combined to decrease communication overhead and thus increase the scalability of parallel algorithms. We therefore analyze the communication patterns of several real-world applications and assess the detailed parameters of a particular interconnection network. Based on our findings and application analyses, we propose two new classes of collective operations: non-blocking collectives and topological (sparse) collectives. Theoretical analyses supported by an optimized implementation support our hypothesis. We propose different techniques to apply the newly defined operations to different application kernels, a parallel programming scheme and finally two real-world applications. Practical experiments show that the new operations can be used to reduce the communication overhead of the application kernels by up to 92%. The proposed pipelining techniques can be used as an abstract mechanism to achieve higher performance for many algorithms. We show reasonable performance improvements for the two analyzed real-world applications. We have also been able to simplify the interface in case of sparse communication patterns. However, we conclude that more fundamental changes to the algorithms and programming models are necessary to take full advantage of the new semantics. We expect that our results will influence programming models and parallel algorithm design. We conjecture that programmers and algorithm designers will need to combine intelligent layout of communication patterns and overlapping of computation and communication in order to achieve high performance and scalability.

Documents

download article:

BibTeX

@phdthesis{hoefler-dissertation,
  author={Torsten Hoefler},
  title={{Principles for Coordinated Optimization of Computation and Communication in Large-Scale Parallel Systems}},
  institution={Indiana University},
  year={2008},
  month={9},
  location={Bloomington, IN, USA},
}