SPCL - LibLSB

LibSciBench

Motivation

LibSciBench is a framework developed to facilitate the adoption of statistically sound measurements of parallel computations. It automates many of the guidelines and processes described in [1].

The main component of the framework is LibLSB, a performance measurement library that can be linked to any application to measure execution times, or used as a building block for a new benchmark suite. The library integrates seamlessly with MPI and OpenMP applications and can easily be extended to new models of parallelism.

The datasets produced by LibLSB can be read directly with established statistical tools such as GNU R. LibSciBench is coupled with an R script for postprocessing the gathered data. The script can check for normality, compute confidence intervals (CIs), perform ANOVA tests, and run quantile regression. Furthermore, LibSciBench’s R script can generate Q-Q plots, box plots, and violin plots.

Timers

LibSciBench offers high-resolution timers for many architectures (currently x86, x86-64, PowerPC, and SPARC). The library automatically reports the timer resolution and overhead on the target architecture. LibSciBench also supports arbitrary PAPI counters and, as shown in the example below, allows measurements to be grouped and identified by an arbitrary number of parameters.

#include <mpi.h>
#include <liblsb.h>
#include <stdlib.h>
#include <time.h>

#define N 1024
#define RUNS 10

int main(int argc, char *argv[]){
    int i, j, rank;
    int buffer[N];

    /* Fill the buffer with arbitrary data */
    srand((unsigned int)time(NULL));
    for (i=0; i<N; i++) buffer[i]=rand();

    MPI_Init(&argc, &argv);
    LSB_Init("test_bcast", 0);
    
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Output the info (i.e., rank, runs) in the results file */
    LSB_Set_Rparam_int("rank", rank);
    LSB_Set_Rparam_int("runs", RUNS);

    for (i=1; i<=N; i*=2){

        for (j=0; j<RUNS; j++){
            /* Reset the counters */
            LSB_Res();

            /* Perform the operation */
            MPI_Bcast(buffer, i, MPI_INT, 0, MPI_COMM_WORLD);

            /* Register the j-th measurement with the size i */        
            LSB_Rec(i);
        }
    }

    LSB_Finalize();
    MPI_Finalize();
    return 0;
}
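
The timer resolution and overhead that the library reports can also be approximated by hand. The following is a minimal sketch, not LibLSB's implementation: it assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) in place of the library's architecture-specific timers. The helper names now_ns, timer_overhead_ns, and timer_resolution_ns are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not part of LibLSB): monotonic time in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost of one timer read, estimated over `calls` back-to-back reads. */
double timer_overhead_ns(int calls) {
    uint64_t start = now_ns();
    for (int k = 0; k < calls; k++) (void)now_ns();
    uint64_t end = now_ns();
    return (double)(end - start) / (double)calls;
}

/* Smallest nonzero step observed between two consecutive timer reads. */
uint64_t timer_resolution_ns(int samples) {
    uint64_t best = UINT64_MAX;
    for (int k = 0; k < samples; k++) {
        uint64_t t0 = now_ns(), t1;
        do { t1 = now_ns(); } while (t1 == t0);  /* spin until the clock advances */
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}
```

Calling timer_overhead_ns(1000000) and timer_resolution_ns(1000) once at startup gives a rough lower bound on the intervals that can be measured meaningfully; measured times close to these values should be treated with caution.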

Synchronization

Most of today’s parallel systems are asynchronous and do not have a common clock source. Furthermore, clock drift between processes can distort measurements, and variations in network latency make time synchronization tricky [2].

Many evaluations use an (MPI or OpenMP) barrier to synchronize processes for time measurement. This may be unreliable because neither MPI nor OpenMP provides timing guarantees for its barrier calls. Although a barrier often synchronizes processes well enough, we recommend checking the implementation. For accurate time synchronization, LibLSB offers a simple delay window scheme [2].

In this scheme, a master synchronizes the clocks of all processes and broadcasts a common start time for the operation. The start time is chosen far enough in the future that the broadcast arrives at every process before that time is reached. Each process then waits until this time, and the operation starts synchronously.
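
The per-process part of such a scheme can be sketched as follows. This is an illustrative sketch under stated assumptions, not LibLSB's code: all function names are hypothetical, the clock offset (local clock minus master clock) is assumed to be known from a prior synchronization step, and in a real MPI program the start time would be distributed with a call such as MPI_Bcast.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Hypothetical helper: local monotonic clock in seconds. */
double local_clock(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Master side: choose a start time `window` seconds in the future.
 * The window must exceed the time the broadcast needs to reach everyone. */
double pick_start_time(double master_now, double window) {
    return master_now + window;
}

/* Translate a time on the master's clock to this process's clock.
 * `offset` is (local clock - master clock), assumed already estimated. */
double to_local_time(double master_time, double offset) {
    return master_time + offset;
}

/* Busy-wait until the local clock reaches `local_start`.
 * Returns 0 if the process started on time, or -1 if the start time had
 * already passed on entry (window too small; discard the measurement). */
int wait_until(double local_start) {
    if (local_clock() > local_start) return -1;
    while (local_clock() < local_start) { /* spin */ }
    return 0;
}
```

A process that receives -1 from wait_until missed the window, so the corresponding measurement should be repeated with a larger window rather than recorded.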

Post analysis

Normality Check: To check whether a dataset is normally distributed, the following methods are offered: Shapiro-Wilk test, Q-Q plot, and histogram/density plot. Options for log- or k-normalizing the data are provided.

Comparing Statistical Data: ANOVA and Kruskal-Wallis one-way ANOVA tests are provided to check the statistical significance of differences in the mean or the median, respectively. Quantile regression plots can be produced to observe the effect of a varying factor on arbitrary quantiles.
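
To make the Kruskal-Wallis test more concrete, the sketch below computes its H statistic, H = 12 / (N (N + 1)) * sum_g (R_g^2 / n_g) - 3 (N + 1), where R_g is the rank sum and n_g the size of group g. It is a simplified illustration in C, not the R script shipped with LibSciBench: the function name and data layout are hypothetical, tied values receive average ranks, and no tie correction factor or p-value is computed.

```c
#include <stdlib.h>

typedef struct { double value; int group; } obs_t;

static int cmp_obs(const void *a, const void *b) {
    double x = ((const obs_t *)a)->value;
    double y = ((const obs_t *)b)->value;
    return (x > y) - (x < y);
}

/* Kruskal-Wallis H for n observations; group[i] in [0, k) labels values[i].
 * Returns -1.0 on invalid input or allocation failure. */
double kruskal_wallis_h(const double *values, const int *group, int n, int k) {
    if (n < 2 || k < 1) return -1.0;
    obs_t *obs = malloc((size_t)n * sizeof *obs);
    double *rank_sum = calloc((size_t)k, sizeof *rank_sum);
    int *size = calloc((size_t)k, sizeof *size);
    double h = -1.0;

    if (obs && rank_sum && size) {
        int valid = 1;
        for (int i = 0; i < n; i++) {
            if (group[i] < 0 || group[i] >= k) { valid = 0; break; }
            obs[i].value = values[i];
            obs[i].group = group[i];
            size[group[i]]++;
        }
        if (valid) {
            qsort(obs, (size_t)n, sizeof *obs, cmp_obs);
            /* Assign 1-based ranks; equal values share their average rank. */
            for (int i = 0; i < n; ) {
                int j = i;
                while (j + 1 < n && obs[j + 1].value == obs[i].value) j++;
                double avg = 0.5 * ((i + 1) + (j + 1));
                for (int t = i; t <= j; t++) rank_sum[obs[t].group] += avg;
                i = j + 1;
            }
            double sum = 0.0;
            for (int g = 0; g < k; g++)
                if (size[g] > 0) sum += rank_sum[g] * rank_sum[g] / size[g];
            h = 12.0 / ((double)n * (n + 1)) * sum - 3.0 * (n + 1);
        }
    }
    free(obs); free(rank_sum); free(size);
    return h;
}
```

Under the null hypothesis, H approximately follows a chi-squared distribution with k - 1 degrees of freedom, so a large H suggests that at least one group differs from the others.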

Graphing Results: Box and violin plots offer a rich set of statistical information for arbitrary distributions. They can be produced either from the provided data or from a summary of it.
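
As an illustration of what such a summary can contain, the sketch below computes the five-number summary (minimum, quartiles, maximum) that a box plot typically displays. It is an assumption-laden example, not the summary format used by the LibSciBench R script: the type and function names are hypothetical, and quantiles use linear interpolation between order statistics (the default rule of R's quantile function).

```c
#include <stdlib.h>

typedef struct { double min, q1, median, q3, max; } five_num_t;

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quantile p in [0, 1] of an ascending array with n >= 1 elements,
 * linearly interpolated between the two neighbouring order statistics. */
double quantile_sorted(const double *sorted, int n, double p) {
    double h = (n - 1) * p;
    int lo = (int)h;
    int hi = (lo + 1 < n) ? lo + 1 : lo;
    return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

/* Sorts x in place (n >= 1) and returns its five-number summary. */
five_num_t five_number_summary(double *x, int n) {
    five_num_t s;
    qsort(x, (size_t)n, sizeof *x, cmp_double);
    s.min    = x[0];
    s.q1     = quantile_sorted(x, n, 0.25);
    s.median = quantile_sorted(x, n, 0.50);
    s.q3     = quantile_sorted(x, n, 0.75);
    s.max    = x[n - 1];
    return s;
}
```

For the measurements {7, 1, 3, 5, 9}, this yields a minimum of 1, quartiles of 3, 5, and 7, and a maximum of 9.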

Filtering & Summarizing: The input dataset can be filtered using an R-style boolean expression on the parameters passed to LibLSB during the measurement phase. A summary of the data can be computed by specifying the grouping columns (i.e., parameters passed to LibLSB) and the measurement column (i.e., the measured time or a PAPI counter).

Download

Latest version: LibLSB GitHub repository

Version	Date	Changes
liblsb-0.2.2.tar.gz - (698 KB)	Aug 21, 2016	Fix window-based synchronization
liblsb-0.2.1.tar.gz - (389 KB)	May 20, 2016	Minor fixes
liblsb-0.2.tar.gz - (432.2 KB)	Nov 13, 2015	First release

References

SC15	[1] T. Hoefler, R. Belli:
		Scientific Benchmarking of Parallel Computing Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), presented in Austin, TX, USA, pages 73:1--73:12, ACM, ISBN: 978-1-4503-3723-6, Nov. 2015 (acceptance rate: 22%, 79/358).

IJPEDS	[2] T. Hoefler, T. Schneider, A. Lumsdaine:
		Accurately Measuring Overhead, Communication Time and Progression of Blocking and Nonblocking Collective Operations at Massive Scale. International Journal of Parallel, Emergent and Distributed Systems, Vol. 25, Nr. 4, pages 241-258, Taylor & Francis Group, ISSN: 1744-5779, Jul. 2010.