Polly-ACC: A heterogeneous compute compiler

Authors: Tobias Grosser, Torsten Hoefler


Abstract

Programming today's increasingly complex heterogeneous hardware is difficult, as it commonly requires the use of data-parallel languages, pragma annotations, specialized libraries, or DSL compilers. Adding explicit accelerator support into a larger code base is not only costly, but also introduces additional complexity that hinders long-term maintenance. We propose a new heterogeneous compiler that brings us closer to the dream of automatic accelerator mapping. Starting from a sequential compiler IR, we automatically generate a hybrid executable that - in combination with a new data management system - transparently offloads suitable code regions. Our approach is almost regression free for a wide range of applications while improving a range of compute kernels as well as two full SPEC CPU applications. We expect our work to reduce the initial cost of accelerator usage and to free developer time to investigate algorithmic changes.


Getting Polly-ACC

We are currently in the process of upstreaming Polly-ACC to polly.llvm.org. The majority of the Polly infrastructure changes (schedule tree, delinearization, ...) as well as the GPU code generation are already upstreamed. More optimizations for the runtime library are still to come.

How to use

Install LLVM/clang/Polly as normal and then compile your code with the following flags:
clang -O3 -mllvm -polly -mllvm -polly-target=gpu
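
As an illustration of the kind of input this targets, here is a minimal sketch of a plain sequential loop nest with constant bounds and affine array accesses, written without pragmas or accelerator-specific code. The file name matmul.c, the function names, and the problem size are hypothetical and chosen only for this example; whether the region is actually offloaded depends on your Polly build and on the compiler's own decisions.

```c
/* matmul.c: hypothetical example file, not part of Polly-ACC. */

#define N 64 /* illustrative problem size (assumption) */

static float A[N][N], B[N][N], C[N][N];

/* Fill the inputs with constants so the result is easy to check by hand. */
void init_matrices(void) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      A[i][j] = 1.0f;
      B[i][j] = 2.0f;
      C[i][j] = 0.0f;
    }
}

/* Plain sequential matrix multiplication: constant loop bounds and
   affine array subscripts, with no annotations of any kind. */
void matmul(void) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i][j] += A[i][k] * B[k][j];
}
```

Assuming a Polly build with GPU code generation available, the file could be compiled to an object file by appending -c matmul.c to the command above. After calling init_matrices() and then matmul(), every element of C equals 128 (64 products of 1 and 2), which gives a quick check that the compiled code still computes the same result as the sequential version.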


References

ICS'16
[1] T. Grosser, T. Hoefler:
 Polly-ACC: Transparent compilation to heterogeneous hardware. In Proceedings of the 30th International Conference on Supercomputing (ICS'16), Jun. 2016 (acceptance rate: 24% (43/178))