Netgauge LogGPS (LogP, LogGP) Measurement Description:
The loggp pattern in Netgauge allows precise measurement of the LogP [2],
LogGP [3], and LogGPS [4] parameters of MPI implementations. Only MPI is supported
at present, but supporting other modules (e.g., TCP, UDP) should require only
minimal modifications.
The loggp pattern employs the techniques described in [1].
General
The benchmarks use Netgauge's high-performance timers for different
architectures. Users should make sure that the configure script detected
the timer correctly and that it works reliably (e.g., it is not affected by frequency scaling).
The benchmark results can be hard to interpret because of the measurement
method used. We decided against returning single parameter values because
that could easily lead to wrong results or even negative parameters. Instead,
the benchmark reports results for each message size (as it proceeds) together
with the quality of the fit.
A possible invocation would be:
$ mpirun -n 2 ./netgauge -s 1-8192 -x loggp
# Info: (0): Netgauge v2.1 MPI enabled (P=2) (./netgauge -s 1-8192 -x loggp )
# initializing x86-64 timer (takes some seconds)
# Info: (0): Warming module mpi up ... this may take a while
Testing 1 bytes 100 times:
L=0.6128 s=1 o_s=0.266 o_r=0.547 g=nan G=nan (nan GiB/s) lsqu(g,G)=nan
Testing 1025 bytes 100 times:
L=0.6128 s=1025 o_s=0.458 o_r=1.163 g=0.448 G=0.000583 (13.405 GiB/s) lsqu(g,G)=inf
Testing 2049 bytes 100 times:
L=0.6128 s=2049 o_s=0.528 o_r=1.542 g=0.483 G=0.000478 (16.349 GiB/s) lsqu(g,G)=0.0877
Testing 3073 bytes 100 times:
L=0.6128 s=3073 o_s=0.622 o_r=1.806 g=0.534 G=0.000404 (19.338 GiB/s) lsqu(g,G)=0.1156
Testing 4097 bytes 100 times:
L=0.6128 s=4097 o_s=0.714 o_r=1.867 g=0.612 G=0.000328 (23.820 GiB/s) lsqu(g,G)=0.1706
Testing 5121 bytes 100 times:
L=0.6128 s=5121 o_s=0.781 o_r=2.050 g=0.689 G=0.000272 (28.745 GiB/s) lsqu(g,G)=0.2028
Testing 6145 bytes 100 times:
L=0.6128 s=6145 o_s=0.869 o_r=2.190 g=0.749 G=0.000236 (33.063 GiB/s) lsqu(g,G)=0.2127
Testing 7169 bytes 100 times:
L=0.6128 s=7169 o_s=0.967 o_r=2.326 g=0.801 G=0.000211 (37.018 GiB/s) lsqu(g,G)=0.2169
The parameters are reported for each message size. The latency L is half the
round-trip time of a 1-byte message (and therefore does not depend on the message size).
The send overhead o_s is computed as described in [1]; the receive overhead o_r
is simply the time it takes for MPI_Recv() to complete (and is therefore not very accurate).
The parameters g and G are obtained by curve fitting. The fit needs at least
two points, so g and G cannot be computed for the first measurement (nan).
In general, the more measurement points are considered, the more accurate
the results. The last value, "lsqu(g,G)", is the least squares deviation
of the fit for g and G. The lower this number, the better the fit and the results.
Parameter changes are detected by sudden changes in the least squares deviation.
Please refer to [1] for details.
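To illustrate the fitting step, the following Python sketch fits a straight line, time = g + G * s, to a set of (message size, time) points by ordinary least squares: the intercept at s=0 plays the role of g and the slope plays the role of G. It also returns the sum of squared residuals as a fit-quality measure. This is a minimal sketch under assumptions: the helper name fit_g_G is hypothetical, and its residual measure is not guaranteed to match how Netgauge computes lsqu(g,G); [1] describes the actual method.

```python
import math
from typing import Sequence, Tuple


def fit_g_G(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float, float]:
    """Fit times ~ g + G * size by ordinary least squares.

    Returns (g, G, ssr), where ssr is the sum of squared residuals.
    Hypothetical helper: Netgauge's lsqu(g,G) may be computed or normalized differently.
    """
    n = len(sizes)
    if n != len(times):
        raise ValueError("sizes and times must have the same length")
    if n < 2:
        # A line needs at least two points; this is why g and G are nan
        # for the first message size in the benchmark output.
        return math.nan, math.nan, math.nan
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    G = sxy / sxx            # slope: additional time per byte
    g = mean_t - G * mean_s  # intercept: fitted time at s = 0
    ssr = sum((t - (g + G * s)) ** 2 for s, t in zip(sizes, times))
    return g, G, ssr


if __name__ == "__main__":
    # Synthetic, perfectly linear data (not a measurement).
    sizes = [1, 1025, 2049, 3073]
    times = [0.5 + 0.0004 * s for s in sizes]
    g, G, ssr = fit_g_G(sizes, times)
    print(f"g={g:.4f} G={G:.6f} ssr={ssr:.3g}")
```

Refitting each time a new message size is added gives one (g, G, quality) triple per size, similar in spirit to the per-size lines the benchmark prints.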
The benchmark also creates a file "ng.out", which can be plotted for visual analysis
of the results. One possible plot in gnuplot is: plot "ng.out" using 1:(($4-$3)/($2-1)).
This plots the points that the g,G line is fitted to. Similarly, plot "ng.out" using 1:7
plots the send overhead for varying message sizes.
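For analysis outside gnuplot, the same points can be computed with a short Python sketch. It assumes that ng.out is a whitespace-separated numeric text file with one row per message size, and it skips blank lines and lines starting with "#" in case the file contains comments. Beyond what the gnuplot commands above imply (column 1 on the x axis, ($4-$3)/($2-1) as the fitted quantity, column 7 as the send overhead), the column layout is not documented here, so check it against your Netgauge version. The function name read_ng_out is hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def read_ng_out(lines: Iterable[str]) -> Tuple[List[Point], List[Point]]:
    """Compute the same points as the gnuplot commands for ng.out.

    Returns (gG_points, send_overhead):
      gG_points:     ($1, ($4-$3)/($2-1)), the points the g,G line is fitted to
      send_overhead: ($1, $7)
    Column numbers follow gnuplot's 1-based convention; the meaning of the
    other columns is an assumption to check against your Netgauge version.
    """
    gG_points: List[Point] = []
    send_overhead: List[Point] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and possible comment lines
        c = [float(x) for x in line.split()]
        # c[0] corresponds to gnuplot's $1, c[1] to $2, and so on.
        if c[1] != 1:
            # ($2-1) would be zero here, so the expression is undefined; skip it.
            gG_points.append((c[0], (c[3] - c[2]) / (c[1] - 1)))
        send_overhead.append((c[0], c[6]))
    return gG_points, send_overhead
```

Because iterating over a text file yields its lines, the file object returned by open("ng.out") can be passed to read_ng_out directly.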
Getting the LogGPS parameters
Extracting the actual parameters from the output is difficult and requires
some understanding of the technique used. Please refer to [1] for a description
of the measurement method. A rough guide for each of the parameters is
given below:
- L: simply use the displayed L (round-trip time/2). Sometimes it is advisable to subtract o_s and/or o_r; however, this can lead to negative latencies (as o_s can happen after the message has been sent).
- o_s: is defined to be constant (per packet) in the LogP model; in practice, however, it is often not constant (per message, which might consist of multiple packets). Use the o_s reported for the message size of interest.
- o_r: is relatively imprecise and should be used with care. Please contact the author if you know of a more precise method for measuring o_r.
- g: is approximately the point where the fitted curve crosses the y axis (s=0). However, some systems do not have ideal transmission curves. It is advisable to use a g obtained from a fit with sufficiently many points and a small lsqu(g,G).
- G: is the slope of the fitted g,G curve. As for g, it is advisable to use a G obtained from a fit with sufficiently many points and a small lsqu(g,G).
- S: is the message size at which the library switches from the eager to the rendezvous protocol. Libraries are not obliged to do this at all, but it is common practice in MPI libraries. The benchmark monitors the least squares deviation and tries to detect such protocol changes; however, it is safest to investigate the plot manually.
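Part of this guide can be automated. The following Python sketch parses the per-size result lines in the key=value format shown in the sample run and flags message sizes where lsqu(g,G) grows by more than a chosen factor relative to the previous finite value, as a hint for a possible protocol change (S). The function names and the default factor of 2 are hypothetical choices for illustration, not Netgauge's own detection logic; as stated above, inspecting the plot manually remains the safest approach.

```python
import math
import re
from typing import Dict, Iterable, List

# Matches key=value pairs such as "o_s=0.266" or "lsqu(g,G)=0.0877".
_PAIR = re.compile(r"(\S+?)=(\S+)")


def parse_results(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse result lines such as 'L=0.6128 s=1 o_s=0.266 ... lsqu(g,G)=nan'."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("L="):
            continue  # skip '#' info lines and 'Testing ...' lines
        # float() accepts "nan" and "inf", which the benchmark may print.
        rows.append({key: float(value) for key, value in _PAIR.findall(line)})
    return rows


def flag_lsqu_jumps(rows: List[Dict[str, float]], factor: float = 2.0) -> List[float]:
    """Return sizes s at which lsqu(g,G) exceeds factor times the previous finite value.

    The factor is an arbitrary, hypothetical threshold; flagged sizes are only
    candidates for a protocol change (S) and should be checked in the plot.
    """
    flagged = []
    prev = None
    for row in rows:
        q = row.get("lsqu(g,G)", math.nan)
        if not math.isfinite(q):
            continue  # nan (too few points) or inf: no usable fit quality
        if prev is not None and prev > 0 and q > factor * prev:
            flagged.append(row["s"])
        prev = q
    return flagged
```

For the sample run above, lsqu(g,G) grows gradually (0.0877 to 0.2169), so no size would be flagged with the default factor.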
References
[1] T. Hoefler, A. Lichei, W. Rehm (TU Chemnitz): Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, PMEO'07 Workshop, Long Beach, CA, USA, IEEE Computer Society, ISBN: 1-4244-0909-8, Mar. 2007.
[2] David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, Thorsten von Eicken: LogP: towards a realistic model of parallel computation. ACM SIGPLAN Notices, Volume 28, Issue 7 (July 1993), pages 1-12.
[3] Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser, Chris Scheiman: LogGP: Incorporating Long Messages into the LogP Model. One step closer towards a realistic model for parallel computation. Technical Report TRCS95-09, University of California at Santa Barbara, Santa Barbara, CA, USA.
[4] Fumihiko Ino, Noriyuki Fujimoto, Kenichi Hagihara: LogGPS: a parallel computational model for synchronization analysis. Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, pages 133-142, 2001, ISBN: 1-58113-346-4.