The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Publications of SPCL
T. Schneider, O. Bibartiu, T. Hoefler:
Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks
(In Proceedings of the 24th Annual Symposium on High-Performance Interconnects (HOTI'16), Aug. 2016)
Best Student Paper at HOTI'16
Abstract
Lossless networks, such as InfiniBand, use flow control to avoid packet loss due to congestion. This introduces dependencies between input and output channels; if these dependencies are cyclic, the network can deadlock. Deadlocks can be resolved by splitting a physical channel into multiple virtual channels with independent buffers and credit systems. Currently available routing engines for InfiniBand assign entire paths from source to destination nodes to different virtual channels. However, InfiniBand allows changing the virtual channel at every switch. We developed fast routing engines that make use of this fact and map individual hops to virtual channels. Our algorithm imposes a total order on virtual channels and increments the virtual channel at every hop; thus the diameter of the network is an upper bound on the required number of virtual channels. We integrated this algorithm into the InfiniBand software stack. Our algorithms provide deadlock-free routing on state-of-the-art low-diameter topologies, using fewer virtual channels than currently available practical approaches, while being faster by a factor of four on large networks. Since low-diameter topologies are common among the largest supercomputers in the world, providing deadlock-free routing for such systems is very important.
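The hop-indexed idea described in the abstract can be illustrated with a small sketch. This is not the authors' routing engine and does not model how InfiniBand switches are actually configured. It is a toy model, with hypothetical names (`assign_vcs`, `dependency_graph`, `has_cycle`) and an assumed four-switch ring, showing why incrementing the virtual channel at every hop leaves the channel dependency graph acyclic.

```python
from collections import defaultdict

def assign_vcs(path):
    """Map hop i of a switch path to virtual channel i (hypothetical helper)."""
    return [(path[i], path[i + 1], i) for i in range(len(path) - 1)]

def dependency_graph(routes):
    """Add an edge from each (src, dst, vc) channel to the next one on its route."""
    deps = defaultdict(set)
    for hops in routes:
        for held, wanted in zip(hops, hops[1:]):
            deps[held].add(wanted)
    return deps

def has_cycle(deps):
    """Depth-first search for a cycle in the channel dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in deps.get(node, ()):
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(deps))

# Assumed toy topology: a unidirectional ring of four switches (diameter 2),
# with every switch sending to the switch two hops away.
paths = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]

# Baseline: every hop uses virtual channel 0.
single_vc = [[(a, b, 0) for a, b, _ in assign_vcs(p)] for p in paths]
# Hop-indexed: the virtual channel increases by one at each hop.
per_hop = [assign_vcs(p) for p in paths]

single_vc_deadlocks = has_cycle(dependency_graph(single_vc))
per_hop_deadlocks = has_cycle(dependency_graph(per_hop))
vcs_used = 1 + max(vc for route in per_hop for _, _, vc in route)
```

In this toy ring, the single-channel assignment produces a dependency cycle, while the hop-indexed assignment does not: every dependency edge leads to a strictly higher virtual channel, so no cycle can close. The number of virtual channels used equals the longest path length in hops, which is bounded by the diameter when routes are shortest paths.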