The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Publications of SPCL
|F. Schuiki, F. Zaruba, T. Hoefler, L. Benini:
|Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
(IEEE Transactions on Computers (TOC). Vol , Nr. , IEEE, Apr. 2020, )
AbstractSingle-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/storse necessary to feed their ALU/FPU. Each instruction spent on moving data is a cycle not spent on computation, limiting ALU/FPU utilization to 33% on reductions. We propose 'Stream Semantic Registers' to boost utilization and increase energy efficiency. SSR is a lightweight, non-invasive RISC-V ISA extension which implicitly encodes memory accesses as register reads/writes, eliminating a large number of loads/stores. We implement the proposed extension in the RTL of an existing multi-core cluster and synthesize the design for a modern 22nm technology. Our extension provides a significant, 2x to 5x, architectural speedup across different kernels at a small 11% increase in core area. Sequential code runs 3x faster on a single core, and 3x fewer cores are needed in a cluster to achieve the same performance. The utilization increase to almost 100% in leads to a 2x energy efficiency improvement in a multi-core cluster. The extension reduces instruction fetches by up to 3.5x and instruction cache power consumption by up to 5.6x. Compilers can automatically map loop nests to SSRs, making the changes transparent to the programmer.