Productive parallel programming for FPGA with HLS

Title: Productive parallel programming for FPGA with high-level synthesis
Venue: Principles and Practice of Parallel Programming 2018 [PPoPP'18]
Speakers: Johannes de Fine Licht and Torsten Hoefler
Time: 08:30 - 12:00, Sunday February 25, 2018

Slides: to be uploaded after the tutorial takes place.



Abstract: As the scale of high-performance computing systems increases, so does their power consumption, making energy efficiency an increasingly important consideration in their design. While GPUs and custom processors have improved this situation significantly, FPGAs promise another major step in energy efficiency, representing a middle ground between fixed hardware architectures and custom-built ASICs. FPGAs have traditionally been programmed in hardware description languages, which requires extensive hardware knowledge and significant engineering effort. This tutorial shows how high-level synthesis (HLS) can be harnessed to efficiently exploit spatial parallelism on FPGAs while preserving programmer productivity. Attendees will learn how to target available FPGA resources with high-level C++ constructs, and how to control and guide the mapping from imperative code to hardware, enabling them to develop massively parallel designs by identifying and implementing patterns suitable for spatial parallelism. We will establish the central concepts of HLS necessary to achieve an efficient hardware implementation, then show how performance modeling and more advanced programming techniques can be used to optimize it further. By enabling the design of efficient FPGA implementations in a high-level language, our tutorial seeks to bridge the gap between software and hardware development, allowing programmers from a larger set of backgrounds to begin tapping into the potential of FPGAs.




We must pipeline both local loop schedules and the global dataflow for maximum throughput.
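The caption above refers to two levels of pipelining. As a minimal sketch, assuming Xilinx Vivado HLS pragma syntax, the C++ below pipelines each local loop with `#pragma HLS PIPELINE II=1` and pipelines the global dataflow between three stages with `#pragma HLS DATAFLOW`. The `Stream` class and the functions `ReadMemory`, `Scale`, `WriteMemory` and `Top` are hypothetical: `Stream` is a software stand-in for a hardware FIFO such as `hls::stream` from `hls_stream.h`, so the code compiles and runs as an ordinary C++ program (a standard compiler ignores the HLS pragmas).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical stand-in for a hardware FIFO (e.g. hls::stream in Vivado HLS).
// std::queue is unbounded, which lets the stages below run one after another
// in a plain software simulation.
template <typename T>
class Stream {
 public:
  void write(const T &value) { fifo_.push(value); }
  T read() {
    // In hardware an empty FIFO would stall the reader; in simulation we assert.
    assert(!fifo_.empty() && "read from empty stream");
    T value = fifo_.front();
    fifo_.pop();
    return value;
  }

 private:
  std::queue<T> fifo_;
};

// Stage 1: stream values in from memory, one per cycle once pipelined.
void ReadMemory(const float *in, Stream<float> &out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
    out.write(in[i]);
  }
}

// Stage 2: compute on the stream (local loop pipelined at II=1).
void Scale(Stream<float> &in, Stream<float> &out, float factor, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
    out.write(factor * in.read());
  }
}

// Stage 3: stream results back to memory.
void WriteMemory(Stream<float> &in, float *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
    out[i] = in.read();
  }
}

// Top level: DATAFLOW asks the tool to run the three stages concurrently,
// pipelining the global schedule on top of the pipelined local loops.
void Top(const float *in, float *out, float factor, std::size_t n) {
#pragma HLS DATAFLOW
  Stream<float> s0, s1;
  ReadMemory(in, s0, n);
  Scale(s0, s1, factor, n);
  WriteMemory(s1, out, n);
}
```

In software simulation the stages execute sequentially, which only works because the stand-in FIFO is unbounded; in hardware, DATAFLOW lets them overlap, so `Scale` starts consuming elements while `ReadMemory` is still producing them.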



Content


The tutorial will cover modeling, designing and implementing FPGA hardware using a modern HLS tool. The modeling sections are partly theoretical, but the majority of the tutorial will focus on practice, enabling attendees to start writing hardware of their own. We will take a brief look at the traditional hardware description languages (HDLs) in which most existing hardware is implemented, then raise the level of abstraction to high-level synthesis, starting with an overview of the HLS landscape: OpenCL, as supported by Intel and Xilinx; and C and C++, as supported by Intel, Xilinx and the third-party tool LegUp. We will show code examples to illustrate the design principles behind these languages. Turning to the practical side, we will look at the hardware that HLS generates from imperative code, discussing the basic principles behind this transformation. Using a baseline program, we will walk through a series of hardware-oriented transformations, showing how each can be achieved and parameterized from the source code. For each transformation, we cover the following three principles:
  • Modeling of the relationships between algorithm, hardware and performance to guide our design.
  • Design of the algorithm architecture to efficiently exploit the hardware.
  • Implementation of the proposed concepts in HLS, mapping C++ to hardware.
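To make the modeling principle concrete, here is a sketch of a common first-order performance model for a pipelined loop. It is an assumption for illustration, not necessarily the exact model used in the tutorial. With N loop iterations, initiation interval I (cycles between the starts of consecutive iterations), pipeline depth L (cycles for one iteration to traverse the pipeline) and clock frequency f, the cycle count C and runtime T are approximately:

```latex
\begin{align}
C &\approx L + I\,(N - 1) \\
T &\approx \frac{C}{f}
\end{align}
```

For example, with N = 10^6, I = 1, L = 100 and f = 300 MHz, C = 100 + 999,999 = 1,000,099 cycles and T is about 3.33 ms. Raising the initiation interval to I = 2 gives C = 2,000,098 cycles, or about 6.67 ms. When I(N - 1) is much larger than L, the initiation interval rather than the pipeline depth dominates runtime, which is why achieving I = 1 is usually the first optimization target.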
The implementation stage will be accompanied by live coding demos, exposing attendees to real examples of HLS source code. Finally, we will dive into advanced usage of the tools, including optimization of DRAM accesses, using C++ templates to generate specialized hardware, software simulation of spatial algorithms, and two different approaches to implementing loop-based code when integrating with HDL infrastructure.
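As a sketch of how C++ templates can generate specialized hardware, assuming Vivado HLS pragma syntax, the hypothetical kernel `Axpy<W>` below computes y = a*x + y over packs of `W` floats. Because `W` is a compile-time parameter, each instantiation can be fully unrolled into `W` parallel multiply-add units, and widening the hypothetical `Pack` type widens each memory access, one common way to make better use of DRAM bandwidth. `PackArray` is a hypothetical helper for software simulation only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packed vector type: W floats moved together in one access.
template <typename T, int W>
struct Pack {
  std::array<T, W> data;
};

// Hypothetical kernel: y = a*x + y over n_packs packs of width W.
// Each distinct W yields a distinct hardware instance when synthesized.
template <int W>
void Axpy(float a, const Pack<float, W> *x, Pack<float, W> *y,
          std::size_t n_packs) {
  static_assert(W > 0, "vector width must be positive");
  for (std::size_t i = 0; i < n_packs; ++i) {
#pragma HLS PIPELINE II=1
    const Pack<float, W> xi = x[i];  // one wide read
    Pack<float, W> yi = y[i];        // one wide read
    for (int w = 0; w < W; ++w) {
#pragma HLS UNROLL
      // Fully unrolled: W multiply-adds in parallel per pipeline stage.
      yi.data[w] = a * xi.data[w] + yi.data[w];
    }
    y[i] = yi;  // one wide write
  }
}

// Hypothetical simulation helper: pack a flat array into width-W packs.
template <int W>
std::vector<Pack<float, W>> PackArray(const std::vector<float> &flat) {
  assert(flat.size() % W == 0 && "size must be a multiple of W");
  std::vector<Pack<float, W>> packed(flat.size() / W);
  for (std::size_t i = 0; i < packed.size(); ++i) {
    for (int w = 0; w < W; ++w) {
      packed[i].data[w] = flat[i * W + w];
    }
  }
  return packed;
}
```

Instantiating `Axpy<4>` and `Axpy<8>` from the same source would yield two differently sized kernels; choosing `W` trades parallelism against the FPGA resources and memory bandwidth available.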



Nallatech 520N (Intel Stratix 10).


Xilinx VCU1525 (Ultrascale+ VU9P).



Prerequisite knowledge


This tutorial is aimed at attendees from an HPC background who are interested in programming FPGAs for massive spatial parallelism. A basic understanding of hardware architectures is expected, but no practical experience with FPGAs or hardware design is required for the majority of the tutorial. Attendees with experience in HLS will benefit most from the material covered in the second half of the session.