Publications of SPCL

K. Lakhotia, K. Isham, L. Monroe, M. Besta, T. Hoefler, F. Petrini:

 In-network Allreduce with Multiple Spanning Trees on PolarFly

(In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'23), presented in Orlando, FL, USA, pages 165–176, Association for Computing Machinery, ISBN: 9781450395458, Jun. 2023)


Allreduce is a fundamental collective used in parallel computing and distributed training of machine learning models, and can become a performance bottleneck on large systems. In-network computing improves Allreduce performance by reducing packets on the fly using network routers. However, the throughput of current in-network solutions is limited to the bandwidth of a single link. We develop, compare and contrast two different sets of Allreduce spanning trees embedded into PolarFly, a high-performance diameter-2 network topology. Both of our solutions offer theoretically guaranteed near-optimal performance, boosting Allreduce bandwidth by a factor equal to half the network radix of nodes. While our first set offers low latency with trees of depth 3, the second set offers a congestion-free implementation, which reduces the complexity and resource requirements of in-network computing units. In doing so, we also distinguish PolarFly as a highly suitable network for distributed deep learning and other applications that employ large, throughput-bound Allreductions.


  author={Kartik Lakhotia and Kelly Isham and Laura Monroe and Maciej Besta and Torsten Hoefler and Fabrizio Petrini},
  title={{In-network Allreduce with Multiple Spanning Trees on PolarFly}},
  booktitle={Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'23)},
  location={Orlando, FL, USA},
  publisher={Association for Computing Machinery},