Polly-ACC: A heterogeneous compute compiler
Authors: Tobias Grosser, Torsten Hoefler
Abstract
Programming today's increasingly complex heterogeneous hardware is difficult, as it commonly requires the use of data-parallel languages, pragma annotations, specialized libraries, or DSL compilers. Adding explicit accelerator support into a larger code base is not only costly, but also introduces additional complexity that hinders long-term maintenance. We propose a new heterogeneous compiler that brings us closer to the dream of automatic accelerator mapping. Starting from a sequential compiler IR, we automatically generate a hybrid executable that, in combination with a new data management system, transparently offloads suitable code regions. Our approach is almost regression-free for a wide range of applications while improving a range of compute kernels as well as two full SPEC CPU applications. We expect our work to reduce the initial cost of accelerator usage and to free developer time to investigate algorithmic changes.
Getting Polly-ACC
We are currently in the process of upstreaming Polly-ACC to polly.llvm.org. Most of the Polly infrastructure changes (schedule tree, delinearization, ...) as well as the GPU code generation are already upstreamed. Further optimizations for the runtime library are still to come.
How to use
Install LLVM/clang/Polly as usual, then compile with:
clang -O3 -mllvm -polly -mllvm -polly-target=gpu
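To try the flags above, the compiler needs input that Polly can analyze: loop nests whose bounds and array subscripts are affine functions of the loop counters and parameters. The following is a hypothetical example kernel, not taken from the paper; the function name `gemm` and its signature are illustrative assumptions. It uses C99 variable-length array parameters, which the compiler lowers to linearized address arithmetic that Polly must delinearize. Whether Polly-ACC actually offloads this region to the GPU depends on its own modeling and decisions, so treat it as a sketch rather than a guaranteed offload.

```c
/* Hypothetical example kernel (not from the Polly-ACC paper).
 * Computes C = A * B + C for n x n single-precision matrices.
 * All loop bounds depend only on n, and every subscript is affine in
 * i, j, k, so the loop nest forms a static control region that Polly
 * can represent in its polyhedral model. The VLA parameters become
 * flat pointer arithmetic in the IR, which Polly delinearizes back
 * into two-dimensional accesses. */
void gemm(int n, float C[n][n], float A[n][n], float B[n][n]) {
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        C[i][j] += A[i][k] * B[k][j];
}
```

Compile the file containing this function together with the rest of your program using the command above. Keeping the loop nest free of data-dependent control flow and pointer chasing is the design choice that matters here; such constructs would generally prevent Polly from modeling the region at all.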
References
[1] T. Grosser, T. Hoefler: Polly-ACC: Transparent compilation to heterogeneous hardware. In Proceedings of the 30th International Conference on Supercomputing (ICS'16), Jun. 2016 (acceptance rate: 24% (43/178)).