Specifications

5102ch02.fm Draft Document for Review May 12, 2014 12:46 pm
32 IBM Power System S822 Technical Overview and Introduction
The Coherent Accelerator Processor Interface (CAPI) provides the ability to attach
accelerators that have coherent shared memory access with the processors in the server and
share full virtual address translation with these processors, using a standard PCIe Gen3 bus.
Applications can implement customized functions in Field Programmable Gate Arrays (FPGAs) and
enqueue work requests to the FPGA directly in shared memory queues, using
the same effective addresses (pointers) that they would use for any of their threads running on a host
processor. From a practical perspective, CAPI allows a specialized hardware accelerator to
be seen as an additional processor in the system, with access to the main system memory
and coherent communication with the other processors in the system.
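The idea of enqueuing work requests that carry ordinary pointers can be pictured with a small simulation. This is only a sketch under stated assumptions: it does not use any CAPI programming interface, the WorkElement layout and the function names are hypothetical, a host thread stands in for the accelerator, and a Python queue stands in for the shared memory queue. What it shows is that the consumer receives addresses rather than copies of the data, and reads and writes the application's buffers in place.

```python
import ctypes
import queue
import threading


class WorkElement(ctypes.Structure):
    """Hypothetical work request layout; not a real CAPI descriptor."""
    _fields_ = [
        ("src", ctypes.c_void_p),    # address of the input data in process memory
        ("dst", ctypes.c_void_p),    # address where the result should be written
        ("count", ctypes.c_size_t),  # number of 32-bit integers to process
    ]


def accelerator(work_queue):
    """Stand-in for the accelerator: follows the raw addresses it is handed."""
    while True:
        elem = work_queue.get()
        if elem is None:              # sentinel: no more work
            break
        array_type = ctypes.c_int32 * elem.count
        src = array_type.from_address(elem.src)  # view the caller's buffer, no copy
        dst = array_type.from_address(elem.dst)
        for i in range(elem.count):
            dst[i] = src[i] * 2       # placeholder for the "customized function"


def run_demo():
    data = (ctypes.c_int32 * 4)(1, 2, 3, 4)
    result = (ctypes.c_int32 * 4)()
    work_queue = queue.Queue()
    worker = threading.Thread(target=accelerator, args=(work_queue,))
    worker.start()
    # Enqueue the addresses of the buffers, not copies of their contents.
    work_queue.put(
        WorkElement(ctypes.addressof(data), ctypes.addressof(result), len(data))
    )
    work_queue.put(None)
    worker.join()                     # buffers stay alive until the worker is done
    return list(result)
```

Calling run_demo() returns the doubled values, which the stand-in accelerator wrote straight into the result buffer owned by the application.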
The benefits of using CAPI include the ability to access shared memory blocks directly from
the accelerator, the ability to perform memory transfers directly between the accelerator and processor
cache, and a reduction in the code path length between the adapter and the processors. The path is shorter because
the adapter does not operate as a traditional I/O device, so there is no device driver layer to
perform processing. CAPI also presents a simpler programming model.
Figure 2-8 shows a high-level view of how an accelerator communicates with the POWER8
processor through CAPI. The POWER8 processor provides a Coherent Accelerator Processor
Proxy (CAPP), which is responsible for extending the coherence of the processor
communications to an external device. The coherency protocol is tunneled over standard
PCIe Gen3, effectively making the accelerator part of the coherency domain.
The accelerator adapter implements the Power Service Layer (PSL), which provides address
translation and a system memory cache for the accelerator functions. The custom processors
on the board, consisting of an FPGA or an Application Specific Integrated Circuit (ASIC), use
this layer to access shared memory regions and cache areas as though they were a processor in the
system. This ability greatly enhances the performance of data access for the device and
simplifies the programming effort to use the device. Instead of being treated as an I/O device, the hardware
accelerator is treated as a processor. This eliminates the need for a
device driver to perform communication, as well as the need for Direct Memory Access (DMA) that
requires system calls to the operating system kernel. By removing these layers, the data
transfer operation requires far fewer processor clock cycles, greatly improving the I/O
performance.
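The address translation role described for the PSL can be illustrated with a toy model of page-based translation. This is a generic sketch and says nothing about how the PSL is actually built: the class name ToyTranslator, the 4096-byte page size, the dictionary page table, and the small least-recently-used translation cache are all assumptions made for illustration. It shows an effective address being split into a page number and an offset, the page number being looked up, and recent lookups being reused.

```python
PAGE_SIZE = 4096  # assumed page size, for this sketch only


class ToyTranslator:
    """Toy effective-to-real address translation with a small LRU cache."""

    def __init__(self, page_table, cache_entries=2):
        self.page_table = page_table      # effective page number -> real page number
        self.cache_entries = cache_entries
        self.cache = {}                   # recent translations, oldest entry first
        self.hits = 0
        self.misses = 0

    def translate(self, effective_address):
        page, offset = divmod(effective_address, PAGE_SIZE)
        if page in self.cache:
            self.hits += 1
            real_page = self.cache.pop(page)   # re-inserted below as most recent
        else:
            self.misses += 1
            real_page = self.page_table[page]  # KeyError models a translation fault
            if len(self.cache) >= self.cache_entries:
                oldest = next(iter(self.cache))
                del self.cache[oldest]         # evict the least recently used entry
        self.cache[page] = real_page
        return real_page * PAGE_SIZE + offset
```

For example, with the table {0: 7, 1: 3}, translating address 16 yields an address in real page 7 at offset 16, and a second access to the same page is served from the cache rather than the table.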
Figure 2-8 CAPI accelerator attached to the POWER8 processor
(Figure labels: Custom Hardware Application; CAPP; Coherence Bus; PSL; FPGA or ASIC; POWER8; PCIe Gen3, transport for encapsulated messages)
The implementation of CAPI on the POWER8 processor allows hardware companies to
develop solutions for specific application demands and leverage the performance of the
POWER8 processor for general applications as well as the custom acceleration of specific