Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Publications of SPCL

J. Domke, T. Hoefler:

 Scheduling-Aware Routing for Supercomputers

(Nov. 2016, Accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16) )

Abstract

The interconnection network has a large influence on total cost, application performance, energy consumption, and overall system efficiency of a supercomputer. Unfortunately, today’s routing algorithms do not utilize this important resource most efficiently. We first demonstrate this by defining the dark fiber metric as a measure of unused resource in networks. To improve the utilization, we propose scheduling-aware routing, a new technique that uses the current state of the batch system to determine a new set of network routes and so increases overall system utilization by up to 17.74%. We also show that our proposed routing increases the throughput of communication benchmarks by up to 17.6% on a practical InfiniBand installation. Our routing method is implemented in the standard InfiniBand tool set and can immediately be used to optimize systems. In fact, we are using it to improve the utilization of our production petascale supercomputer for more than one year.

Documents

download article:
download slides:
 

BibTeX

@inproceedings{,
  author={Jens Domke and Torsten Hoefler},
  title={{Scheduling-Aware Routing for Supercomputers}},
  year={2016},
  month={11},
  note={Accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16)},
}