DDTBench: Micro-Applications for Communication Data Access Patterns and MPI Datatypes

DDTBench is a suite of micro-apps that captures how parallel scientific applications from many different fields access the data they send and receive between processes. MPI Derived Datatypes (DDTs) make it possible to specify those access patterns so that no explicit copy operation is needed, in contrast to the pack-unpack loops found in many codes. DDTBench compares the packing overhead incurred by such loops with that of MPI DDTs. It does so by running a ping-pong benchmark twice: once using MPI DDTs to specify how data should be packed, and once using the pack-unpack loops that we found in the applications. The measurement loop of the benchmark is shown below:


benchmark_expl.png
Measurement loop of DDTBench. Measurements are taken on process 0, so no global clock is required.
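
To make the contrast between pack-unpack loops and datatypes concrete, here is a minimal sketch of the explicit copy that a manual loop performs for a strided access pattern. It is plain Python rather than MPI code and is not taken from DDTBench. The helper names `pack_strided` and `unpack_strided` are hypothetical, and their `count`, `blocklength` and `stride` parameters are chosen to mirror the arguments of `MPI_Type_vector`, which describes the same layout without a user-level copy.

```python
def pack_strided(buf, count, blocklength, stride, offset=0):
    """Copy `count` blocks of `blocklength` elements, spaced `stride`
    elements apart, from `buf` into a new contiguous list."""
    packed = []
    for i in range(count):
        start = offset + i * stride
        packed.extend(buf[start:start + blocklength])
    return packed


def unpack_strided(packed, buf, count, blocklength, stride, offset=0):
    """Inverse of pack_strided: scatter the contiguous `packed` list
    back into the strided positions of `buf` (modified in place)."""
    for i in range(count):
        start = offset + i * stride
        buf[start:start + blocklength] = packed[i * blocklength:(i + 1) * blocklength]


# A 3x4 grid stored row-major in a flat list (values 0..11).
rows, cols = 3, 4
grid = list(range(rows * cols))

# "Send side": pack the last column, one element per row, `cols` apart.
last_col = pack_strided(grid, count=rows, blocklength=1, stride=cols, offset=cols - 1)

# "Receive side": unpack the column into the first column of another grid.
halo = [0] * (rows * cols)
unpack_strided(last_col, halo, count=rows, blocklength=1, stride=cols, offset=0)

print(last_col)
print(halo)
```

In the manual variant of the benchmark, copies like these sit on either side of the send and receive. In the DDT variant, the layout is handed to the MPI library instead.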


Using the time it takes to perform each operation (the colored blocks in the picture above), we can calculate the overhead of packing and unpacking data with both methods. We cannot measure this overhead directly when MPI DDTs are used, because the data repacking is implicit. We can, however, calculate the time spent transferring packed data, t_net, by subtracting the time required for manual packing and unpacking from the round-trip time of the ping-pong with manual packing. The data-repacking overhead for both cases is then obtained by subtracting t_net from the ping-pong round-trip time and dividing the result by that round-trip time. We did this for some of the micro-apps in the graph shown below:


packing_costs.png
Packing costs for different test cases
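
The overhead calculation described above can be sketched in a few lines of Python. The function and variable names are hypothetical, and the timings are invented for illustration only; they are not DDTBench measurements. The sketch also assumes that `pack_unpack_time` is the total time spent in manual packing and unpacking during one round trip.

```python
def network_time(rtt_manual, pack_unpack_time):
    """t_net: round-trip time of the ping-pong with manual packing,
    minus the time spent in manual packing and unpacking."""
    return rtt_manual - pack_unpack_time


def packing_overhead(rtt, t_net):
    """Fraction of a round-trip time that is not spent transferring
    packed data, i.e. (rtt - t_net) / rtt."""
    return (rtt - t_net) / rtt


# Invented timings in microseconds (assumption, not measured data).
rtt_manual = 100.0        # ping-pong with pack-unpack loops
pack_unpack_time = 40.0   # time spent in those loops
rtt_ddt = 75.0            # ping-pong with MPI DDTs

t_net = network_time(rtt_manual, pack_unpack_time)
overhead_manual = packing_overhead(rtt_manual, t_net)
overhead_ddt = packing_overhead(rtt_ddt, t_net)

print(f"t_net = {t_net:.1f} us")
print(f"manual packing overhead = {overhead_manual:.0%}")
print(f"MPI DDT overhead = {overhead_ddt:.0%}")
```

The same t_net is used for both cases. This is what allows the implicit repacking cost of the DDT variant to be estimated even though it cannot be timed directly.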


The graph shows that MPI DDTs can reduce the overhead associated with data packing (e.g., from 40% to 15% in the case of NAS_LU_x, where a contiguous array is needlessly copied by the original code). The large difference between the performance delivered by Open MPI's DDT engine and that of MVAPICH shows that there is still work to be done in improving MPI DDT implementations. We hope that DDTBench can serve implementers as a guide to which access patterns deserve special attention. A list of the micro-apps included in DDTBench can be found in the table below.


Application Class | Test names | Access Pattern
Atmospheric Science | WRF_x_vec, WRF_y_vec, WRF_x_sa, WRF_y_sa | struct of 2D/3D/4D face exchanges in different directions (x,y), using different (semantically equivalent) datatypes: nested vectors (_vec) and subarrays (_sa)
Quantum Chromodynamics | MILC_su3_zd | 4D face exchange, z direction, nested vectors
Fluid Dynamics | NAS_MG_x, NAS_MG_y, NAS_MG_z | 3D face exchange in each direction (x,y,z) with vectors (y,z) and nested vectors (x)
Fluid Dynamics | NAS_LU_x, NAS_LU_y | 2D face exchange in x direction (contiguous) and y direction (vector)
Matrix Transpose | FFT | 2D FFT, different vector types on send/recv side
Matrix Transpose | SPECFEM3D_mt | 3D matrix transpose
Molecular Dynamics | LAMMPS_full, LAMMPS_atomic | unstructured exchange of different particle types (full/atomic), indexed datatypes
Geophysical Science | SPECFEM3D_oc, SPECFEM3D_cm | unstructured exchange of acceleration data for different earth layers, indexed datatypes

DDTBench downloads can be found below. The tarballs contain the source files, Makefiles, and more detailed documentation in PDF format. We also provide a sample R script for analyzing the output data, since the benchmark itself performs no statistical aggregation.


Version | Date | Changes
DDTBench-1.2.1.tar.gz (367 kb) | September 24, 2015 | Licence change
DDTBench-1.2.tar.gz (367 kb) | January 27, 2014 | added one-sided scheme, PAPI and high-resolution timer support
ddtbench-1.1.tar.gz (44 kb) | June 16, 2012 | added C implementation
ddtbench-1.0.tar.gz | May 19, 2012 | initial release, Fortran implementation only

References

Computing
[1] T. Schneider, R. Gerstenberger, T. Hoefler:
 Application-oriented ping-pong benchmarking: how to assess the real communication overheads. Journal of Computing, Vol. 96, No. 4, pages 279-292, Springer Vienna, ISSN 0010-485X, Apr. 2014. Special issue on top picks from EuroMPI'12.
EuroMPI'12
[2] T. Schneider, R. Gerstenberger, T. Hoefler:
 Micro-Applications for Communication Data Access Patterns and MPI Datatypes. In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012, Proceedings, Vol. 7490, pages 121-131, Springer, ISBN 978-3-642-33517-4, Sep. 2012. Invited to a journal special issue on top picks from EuroMPI'12.