Nue Routing - Deadlock-Free Routing within VL Constraints for InfiniBand/OpenSM

The Nue routing algorithm [1] is a novel topology-agnostic routing approach that avoids deadlocks implicitly during path calculation instead of treating path calculation and deadlock avoidance as two separate problems. Nue routing heuristically optimizes load balancing, i.e., the number of routes per link, while enforcing deadlock-freedom without exceeding a given number of virtual lanes (VLs). Our Nue implementation for the InfiniBand subnet manager (OpenSM) supports any number of virtual lanes and is available for download and testing.
This implementation should be usable on any InfiniBand-based network. The algorithm in osm_ucast_nue.c should also be easy to port to similar network technologies, such as Intel Omni-Path, or to use as a template for other technologies, e.g., Converged Enhanced Ethernet or on-chip networks.
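
To make the load-balancing metric concrete, the following sketch counts the number of routes per directed link for a handful of hand-written routes. It only illustrates the metric and is not part of Nue or OpenSM; the function name routes_per_link, the node names, and the input format (one route per line, nodes separated by spaces) are assumptions made for this example.

```shell
# Illustration only: count how many routes traverse each directed link.
# Input (stdin): one route per line, written as a space-separated node list.
# Output: "<count> <from>-><to>", busiest links first.
routes_per_link() {
    awk '{ for (i = 1; i < NF; i++) load[$i "->" $(i + 1)]++ }
         END { for (l in load) print load[l], l }' |
        LC_ALL=C sort -k1,1nr -k2,2
}

# Three hypothetical routes through switches S1, S2, and S3.
printf '%s\n' 'A S1 S2 B' 'C S1 S2 D' 'A S1 S3 E' | routes_per_link
```

In this toy input, the links A->S1 and S1->S2 each carry two routes and every other link carries one. A routing that lowers the maximum count is better balanced in the sense used above.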

Installing the patched OpenSM:
tar xzf ./opensm-3.3.19-nue.tar.gz
# install OpenSM into the home directory so that a default (system-wide) OpenSM is not overwritten
cd ./opensm-3.3.19-nue
export CONFIG_OPTS="--prefix=${HOME}/ofed"

# it is highly recommended to use Nue together with the METIS library (we used v5.1.0 during development)
export CONFIG_OPTS+=" --enable-metis"
# if METIS is not installed in the default directory, one might need to add the following
# (METIS_DIR is a placeholder; set it to the METIS installation directory first)
export CONFIG_OPTS+=" --with-metis-includes=${METIS_DIR}/includes"
export CONFIG_OPTS+=" --with-metis-libs=${METIS_DIR}/lib"

./autogen.sh && ./configure ${CONFIG_OPTS}
make
make install
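
The configure options used above can also be assembled by a small helper, which makes it harder to forget one of the METIS flags. This is a minimal sketch: the function name nue_config_opts and its argument order are assumptions for illustration, while the flags themselves are the ones shown in the steps above.

```shell
# Hypothetical helper: print the configure options for a given install prefix.
# $1 = install prefix, $2 = METIS header directory (optional),
# $3 = METIS library directory (optional).
nue_config_opts() {
    local prefix="$1" metis_inc="${2:-}" metis_lib="${3:-}"
    local opts="--prefix=${prefix} --enable-metis"
    if [ -n "$metis_inc" ]; then
        opts="${opts} --with-metis-includes=${metis_inc}"
    fi
    if [ -n "$metis_lib" ]; then
        opts="${opts} --with-metis-libs=${metis_lib}"
    fi
    printf '%s\n' "$opts"
}

# Possible use (not executed here; paths must not contain spaces):
#   ./autogen.sh && ./configure $(nue_config_opts "${HOME}/ofed")
```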

Running the patched OpenSM with Nue:
cd $HOME/ofed/sbin
./opensm -R nue

For additional parameters, see the output of ./opensm --help

To limit the number of VLs that Nue may use (e.g., #VL=4):
cd $HOME/ofed/sbin
./opensm -R nue --nue_max_num_vls=4

nue_max_num_vls must be greater than or equal to 1 (default: the maximum number of available VLs)
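
The constraint on nue_max_num_vls can be checked before OpenSM is started. The sketch below is a hypothetical wrapper helper (the name nue_args is made up for this example); it only validates the value and prints the arguments shown above, and it does not know how many VLs the fabric actually supports.

```shell
# Hypothetical helper: print the opensm arguments for Nue with a VL limit.
# Rejects anything that is not an integer >= 1.
nue_args() {
    local vls="$1"
    case "$vls" in
        '' | *[!0-9]*)
            echo "nue_max_num_vls must be an integer >= 1" >&2
            return 1
            ;;
    esac
    if [ "$vls" -lt 1 ]; then
        echo "nue_max_num_vls must be an integer >= 1" >&2
        return 1
    fi
    printf '%s\n' "-R nue --nue_max_num_vls=${vls}"
}

# Possible use (not executed here):
#   cd $HOME/ofed/sbin && ./opensm $(nue_args 4)
```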

Run Nue with a modified QoS file (to configure the SL/VL load):
cd $HOME/ofed/sbin
./opensm -R nue --nue_max_num_vls=8 --qos_policy_file ${HOME}/ofed/qos.conf

Possible content of the qos.conf file:
# Enable QoS setup
qos TRUE
# QoS default options
qos_max_vls 8
qos_high_limit 4
qos_vlarb_high 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64
qos_vlarb_low 0:4,1:4,2:4,3:4,4:4,5:4,6:4,7:4
qos_sl2vl 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7

For an explanation of these options, see opensm-3.3.19-nue/doc/QoS_management_in_OpenSM.txt
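
The sample file above follows a regular pattern: the 16 SLs are mapped round-robin onto the VLs, and every VL gets the same arbitration weight. The following sketch generates such a file for another VL count. The function name gen_nue_qos is hypothetical, and the weights (64 and 4) and qos_high_limit 4 are simply copied from the sample; whether they suit other VL counts is an assumption, not something stated by the authors.

```shell
# Hypothetical generator: print QoS options that spread 16 SLs over N VLs,
# following the pattern of the sample qos.conf (SL i is mapped to VL i mod N).
gen_nue_qos() {
    local nvls="$1" sl=0 vl=0 sl2vl="" high="" low=""
    if ! [ "$nvls" -ge 1 ] 2>/dev/null; then
        echo "need a VL count >= 1" >&2
        return 1
    fi
    while [ "$sl" -lt 16 ]; do
        sl2vl="${sl2vl}${sl2vl:+,}$((sl % nvls))"
        sl=$((sl + 1))
    done
    while [ "$vl" -lt "$nvls" ]; do
        high="${high}${high:+,}${vl}:64"
        low="${low}${low:+,}${vl}:4"
        vl=$((vl + 1))
    done
    printf 'qos TRUE\nqos_max_vls %s\nqos_high_limit 4\n' "$nvls"
    printf 'qos_vlarb_high %s\nqos_vlarb_low %s\nqos_sl2vl %s\n' \
        "$high" "$low" "$sl2vl"
}

# Possible use (not executed here):
#   gen_nue_qos 4 > "${HOME}/ofed/qos.conf"
```

Called with 8, the generator reproduces the option lines of the sample file (without the comment lines).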

Nue routing was developed by Jens Domke at the Matsuoka Lab (TiTech) and implemented at ZIH (TU Dresden). The scientific work was advised by Torsten Hoefler and Satoshi Matsuoka.

References

HPDC'16
[1] J. Domke, T. Hoefler, S. Matsuoka:
 Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing. In Proceedings of the 25th Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Jun. 2016 (acceptance rate: 16%, 20/129).
IPDPS'11
[2] J. Domke, T. Hoefler, W. Nagel:
 Deadlock-Free Oblivious Routing for Arbitrary Topologies. In Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), presented in Anchorage, AK, USA, pages 613-624, IEEE Computer Society, ISBN: 0-7695-4385-7, May 2011 (acceptance rate: 19.6%, 112/571).
HotI'09
[3] T. Hoefler, T. Schneider, A. Lumsdaine:
 Optimized Routing for Large-Scale InfiniBand Networks. In 17th Annual IEEE Symposium on High Performance Interconnects (HOTI 2009), presented in New York, NY, Aug. 2009.