Nue Routing - Deadlock-Free Routing within VL Constraints for InfiniBand/OpenSM
The Nue routing algorithm [1] is a novel topology-agnostic routing approach that avoids deadlocks implicitly during path calculation instead of treating routing and deadlock avoidance as two separate problems. Nue heuristically optimizes load balancing, i.e., the number of routes per link, while enforcing deadlock-freedom without exceeding a given number of virtual lanes (VLs). Our Nue implementation for the InfiniBand subnet manager supports any number of virtual lanes and can be downloaded and tested:
- opensm-3.3.19-nue.tar.gz (1.3 MB), based on OpenSM v3.3.19 (for reproducing results in [1])
Installing the patched OpenSM
tar xzf ./opensm-3.3.19-nue.tar.gz
# installing OpenSM in home directory without overwriting a default osm
cd ./opensm-3.3.19-nue
export CONFIG_OPTS="--prefix=${HOME}/ofed"
# it is highly recommended to use Nue together with the METIS library (we used v5.1.0 during development)
export CONFIG_OPTS+=" --enable-metis"
# if METIS is not installed in the default directory, then one might need to add the following
# (METIS_DIR is a placeholder; set it to the METIS installation prefix first)
export CONFIG_OPTS+=" --with-metis-includes=${METIS_DIR}/includes"
export CONFIG_OPTS+=" --with-metis-libs=${METIS_DIR}/lib"
./autogen.sh && ./configure ${CONFIG_OPTS}
make
make install
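The configure options above can also be assembled conditionally, so that the METIS path flags are only added when they are needed. The following is a minimal sketch for a POSIX shell; the function name nue_config_opts and its two arguments are hypothetical and not part of the Nue package, while the flags are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with opensm-3.3.19-nue):
# prints the ./configure options for a given install prefix.
#   $1 = install prefix, e.g. ${HOME}/ofed
#   $2 = optional METIS installation prefix (assumed layout: includes/ and lib/)
# The body runs in a subshell so its variables do not leak into the caller.
nue_config_opts() (
    prefix="$1"
    metis_dir="${2:-}"
    opts="--prefix=${prefix} --enable-metis"
    if [ -n "${metis_dir}" ]; then
        # Only needed when METIS is not installed in the default directory.
        opts="${opts} --with-metis-includes=${metis_dir}/includes"
        opts="${opts} --with-metis-libs=${metis_dir}/lib"
    fi
    printf '%s\n' "${opts}"
)
```

With this helper, the configure step could be written as ./autogen.sh && ./configure $(nue_config_opts "${HOME}/ofed"), with a second argument added when METIS lives in a non-default location.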
Running the patched OpenSM with Nue:
cd $HOME/ofed/sbin
./opensm -R nue
↪ for additional parameters look at ./opensm --help
The number of VLs used can be limited (e.g., #VL=4):
cd $HOME/ofed/sbin
./opensm -R nue --nue_max_num_vls=4
↪ nue_max_num_vls must be greater than or equal to 1 (default: max. available #VL)
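The constraint on nue_max_num_vls can be checked before the subnet manager is started. The sketch below is only an illustration for a POSIX shell: nue_cmdline is a hypothetical wrapper name, and it merely prints the command line shown above instead of executing it. It checks the lower bound only; the upper bound depends on the VLs available in the fabric and is not checked here.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of OpenSM): validates the VL limit and
# prints the opensm command line that would be run.
# The body runs in a subshell, so "exit" only ends the function.
nue_cmdline() (
    max_vls="$1"
    case "${max_vls}" in
        ''|*[!0-9]*)
            # empty or non-numeric input
            echo "nue_max_num_vls must be a positive integer" >&2
            exit 1 ;;
    esac
    if [ "${max_vls}" -lt 1 ]; then
        echo "nue_max_num_vls must be greater than or equal to 1" >&2
        exit 1
    fi
    printf './opensm -R nue --nue_max_num_vls=%s\n' "${max_vls}"
)
```

Calling nue_cmdline 4 prints the command used in the example above, while nue_cmdline 0 returns a non-zero status and prints an error to stderr.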
Running Nue with a modified QoS file (to configure SL/VL load):
cd $HOME/ofed/sbin
./opensm -R nue --nue_max_num_vls=8 --qos_policy_file ${HOME}/ofed/qos.conf
Possible content of qos.conf file:
# Enable QoS setup
qos TRUE
# QoS default options
qos_max_vls 8
qos_high_limit 4
qos_vlarb_high 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64
qos_vlarb_low 0:4,1:4,2:4,3:4,4:4,5:4,6:4,7:4
qos_sl2vl 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
↪ for explanation see opensm-3.3.19-nue/doc/QoS_management_in_OpenSM.txt
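The sample qos.conf follows a regular pattern: every VL receives the same arbitration weight, and the 16 SLs are spread round-robin over the VLs. The sketch below generates such a file for a different VL count. It is an assumption that this pattern is what one wants for other values; gen_qos_conf is a hypothetical helper, and the weights 64 and 4 and the high limit 4 are simply copied from the sample above.

```shell
#!/bin/sh
# Hypothetical generator (not part of OpenSM): prints a qos.conf in the
# style of the sample above for n VLs, mapping SL i to VL (i mod n).
# The body runs in a subshell so its variables stay local.
gen_qos_conf() (
    n="$1"
    case "${n}" in
        ''|*[!0-9]*) echo "number of VLs must be an integer" >&2; exit 1 ;;
    esac
    [ "${n}" -ge 1 ] || { echo "number of VLs must be >= 1" >&2; exit 1; }

    high=""; low=""; sl2vl=""
    vl=0
    while [ "${vl}" -lt "${n}" ]; do
        # ${var:+,} inserts a comma only once the list is non-empty
        high="${high}${high:+,}${vl}:64"
        low="${low}${low:+,}${vl}:4"
        vl=$((vl + 1))
    done
    sl=0
    while [ "${sl}" -lt 16 ]; do
        sl2vl="${sl2vl}${sl2vl:+,}$((sl % n))"
        sl=$((sl + 1))
    done

    printf '# Enable QoS setup\n'
    printf 'qos TRUE\n'
    printf '# QoS default options\n'
    printf 'qos_max_vls %s\n' "${n}"
    printf 'qos_high_limit 4\n'
    printf 'qos_vlarb_high %s\n' "${high}"
    printf 'qos_vlarb_low %s\n' "${low}"
    printf 'qos_sl2vl %s\n' "${sl2vl}"
)
```

For example, gen_qos_conf 8 > ${HOME}/ofed/qos.conf reproduces the sample file, and the value passed here would normally match the one given to --nue_max_num_vls.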
Nue routing was developed by Jens Domke at the Matsuoka Lab (TiTech) and implemented at ZIH (TU Dresden). The scientific work was advised by Torsten Hoefler and Satoshi Matsuoka.
References
[1] J. Domke, T. Hoefler, S. Matsuoka:
Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing
In Proceedings of the 25th Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Jun. 2016 (acceptance rate: 16%, 20/129)
[2] J. Domke, T. Hoefler, W. Nagel:
Deadlock-Free Oblivious Routing for Arbitrary Topologies
In Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), presented in Anchorage, AK, USA, pages 613-624, IEEE Computer Society, ISBN: 0-7695-4385-7, May 2011 (acceptance rate: 19.6%, 112/571)
[3] T. Hoefler, T. Schneider, A. Lumsdaine:
Optimized Routing for Large-Scale InfiniBand Networks
In 17th Annual IEEE Symposium on High Performance Interconnects (HOTI 2009), presented in New York, NY, Aug. 2009