Publications of SPCL
Principles for Coordinated Optimization of Computation and Communication in Large-Scale Parallel Systems
(presented in Bloomington, IN, USA, Sep. 2008)
Abstract: Solving complex scientific problems on massively parallel distributed computer systems remains a challenging task. Data dependencies in parallel applications require communication in addition to the computation of the solution. We aim to optimize communication and computation so that they work together efficiently and all overheads are minimized. Our work is based on the hypothesis that global knowledge about communication patterns, combined with techniques to overlap communication and computation, can decrease communication overhead and thus increase the scalability of parallel algorithms. We therefore analyze the communication patterns of several real-world applications and assess the detailed parameters of a particular interconnection network. Based on these findings and application analyses, we propose two new classes of collective operations: non-blocking collectives and topological (sparse) collectives. Theoretical analyses, backed by an optimized implementation, support our hypothesis. We propose several techniques to apply the newly defined operations to different application kernels, a parallel programming scheme, and finally two real-world applications. Practical experiments show that the new operations reduce the communication overhead of the application kernels by up to 92%. The proposed pipelining techniques can serve as an abstract mechanism to achieve higher performance for many algorithms. We show reasonable performance improvements for the two analyzed real-world applications, and we were also able to simplify the interface for sparse communication patterns. However, we conclude that more fundamental changes to algorithms and programming models are necessary to take full advantage of the new semantics. We expect our results to influence programming models and parallel algorithm design.
We conjecture that programmers and algorithm designers will need to combine intelligent layout of communication patterns with overlapping of computation and communication in order to achieve high performance and scalability.