Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Publications of SPCL

A. Nigay, T. Schneider, T. Hoefler:

 TinyMPI tasking prototype

(Feb. 2019)


This document describes TinyMPI, a prototype MPI implementation which follows the non-traditional approach of virtualizing MPI, and presents the results of a research effort which employed TinyMPI as a research vehicle. Traditional MPI implementations run at most one MPI rank per CPU core. TinyMPI runs more than one MPI rank per CPU core, i.e., it oversubscribes the CPU, with the goal of achieving automatic computation-communication overlap: when one MPI rank blocks, TinyMPI switches to another and continues using the CPU. We have used TinyMPI as a tool in the research effort of answering the question, “How many ranks exactly to start on each CPU core?”, the results of which, in the form of a model, are presented in this document along with a description of TinyMPI’s internals. TinyMPI supports the Arm architecture and is deployed on the Dibona cluster.
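The switching idea in the abstract (several ranks share one core, and a blocked rank hands the core to another) can be illustrated with a toy sketch. This is not TinyMPI's implementation and uses no MPI calls. The names `virtual_rank` and `run_on_one_core` are hypothetical. Each Python generator stands in for one rank, and each `yield` stands in for a blocking communication call. As a simplifying assumption, a blocked rank is treated as ready again by its next turn.

```python
from collections import deque


def virtual_rank(rank_id, steps, log):
    """Hypothetical stand-in for one MPI rank sharing a core with others."""
    for step in range(steps):
        log.append((rank_id, step))  # stands in for a compute phase
        yield                        # stands in for a blocking communication call


def run_on_one_core(num_ranks, steps):
    """Round-robin scheduler: whenever a rank blocks, switch to the next one."""
    log = []
    ready = deque(virtual_rank(r, steps, log) for r in range(num_ranks))
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the rank until it blocks
            ready.append(current)  # assumption: it is ready again by its next turn
        except StopIteration:
            pass                   # the rank has finished; drop it
    return log


if __name__ == "__main__":
    # Two ranks, two steps each: their compute phases interleave on the single core.
    print(run_on_one_core(2, 2))
```

With two ranks and two steps, the log interleaves the ranks' compute phases rather than running one rank to completion, which is the overlap effect that oversubscription aims for.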




  author={Alexandr Nigay and Timo Schneider and Torsten Hoefler},
  title={{TinyMPI tasking prototype}},