DF-DN - Ensuring Deadlock-Freedom in Low-Diameter InfiniBand networks

The DF-DN [1] provides deadlock-freedom for low-diameter InfiniBand networks. It uses the idea of incrementing the VL on every hop.

Changing the VL within a switch is a feature of the InfiniBand network that is not exploited by deadlock-free routing algorithms currently available in InfiniBand. Instead currently available algorithms rely on "layering". If the channel-dependency graph induced by the routing function contains a cycle, entire paths from source to destination HCA are moved into a new VL (initially all paths are in e.g., VL 0). This process is repeated for each VL until there is no cycle left in any VL. This approach has some drawbacks: Finding the optimal assignment of paths to VLs (such that the number of used VLs is minimized) is NP-complete [2]. There is no known upper bound on the number of necessary VLs. If the network uses more than one VL, injecting hosts have to utilize PathRecord queries in order to find out which SL value they should use for each destination.

Our heuristic instead is based on the idea of incrementing the VL at every hop, thus there is a trivial lower bound on the number of required VLs, which is the diameter of the network (assuming minimal path routing). Also our heuristic has lower complexity than DFSSSP and LASH, and is much faster in practice. Low-diameter networks are popular since they offer high bandwidth and low latency. For diameter two networks, e.g., Slim Fly, the DF-D2 heuristic does not require path queries.

We evaluated the DF-DN and DF-D2 heuristic for several low-diameter topologies. Our experiments show that our heuristic uses a smaller number of VLs for many networks than deadlock-free routing algorithms currently available in OpenSM.

Our algorithm is up to three times faster than DF-SSSP and orders of magnitudes faster than LASH.

The DF-DN toolchain, which includes a patched version of OpenSM as well as a patched version of ib_flit_sim can be downloaded here.

References

HOTI'16
[1] T. Schneider, O. Bibartiu, T. Hoefler:
 Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks In Proceedings of the 24th Annual Symposium on High-Performance Interconnects (HOTI'16), Aug. 2016, Best Student Paper at HOTI'16
IPDPS'11
[2] J. Domke, T. Hoefler, W. Nagel:
 Deadlock-Free Oblivious Routing for Arbitrary Topologies In Proceedings of the 25th IEEE International Parallel \& Distributed Processing Symposium (IPDPS), presented in Anchorage, AL, USA, pages 613--624, IEEE Computer Society, ISBN: 0-7695-4385-7, May 2011, (acceptance rate: 19.6%, 112/571)