Chapter 2. Architecture and technical overview 31

Draft Document for Review May 12, 2014 12:46 pm 5102ch02.fm

Table 2-3 POWER8 cache hierarchy

For more information on the POWER8 memory subsystem, see 2.2, “Memory subsystem” on page 34.

2.1.7 Hardware transactional memory

Transactional memory is an alternative to lock-based synchronization. It attempts to simplify parallel programming by grouping read and write operations and running them as a single operation. Transactional memory is analogous to database transactions: all shared memory accesses and their effects are either committed together or discarded as a group. All threads can enter the critical region simultaneously. If threads make conflicting accesses to the shared memory data, the affected threads either retry their accesses or are stopped without updating the shared memory data. For this reason, transactional memory is also called lock-free synchronization. Transactional memory can be a competitive alternative to lock-based synchronization.
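The commit-or-discard behavior described above can be illustrated with a small software model. This is a hypothetical sketch, not the POWER8 mechanism: the names `VersionedMemory`, `Transaction`, and `TxAbort` are invented here, the model is single-threaded with transactions interleaved by hand, and conflicts are detected with per-location version numbers checked at commit time, a technique borrowed from software transactional memory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Tuple


class TxAbort(Exception):
    """Raised when commit-time validation fails; buffered writes are dropped."""


@dataclass
class VersionedMemory:
    # Hypothetical shared memory: location -> (value, version); missing reads as (0, 0).
    cells: Dict[Hashable, Tuple[Any, int]] = field(default_factory=dict)

    def begin(self) -> "Transaction":
        return Transaction(self)


@dataclass
class Transaction:
    mem: VersionedMemory
    read_versions: Dict[Hashable, int] = field(default_factory=dict)
    write_buffer: Dict[Hashable, Any] = field(default_factory=dict)

    def read(self, loc: Hashable) -> Any:
        if loc in self.write_buffer:        # see our own uncommitted writes
            return self.write_buffer[loc]
        value, version = self.mem.cells.get(loc, (0, 0))
        self.read_versions.setdefault(loc, version)  # remember first version seen
        return value

    def write(self, loc: Hashable, value: Any) -> None:
        self.write_buffer[loc] = value      # buffered; shared memory is unchanged

    def commit(self) -> None:
        # Conflict check: every location read must be unchanged since it was read.
        for loc, seen in self.read_versions.items():
            if self.mem.cells.get(loc, (0, 0))[1] != seen:
                self.write_buffer.clear()   # discard all effects as a group
                raise TxAbort(f"conflict on {loc!r}")
        # No conflict: publish every buffered write together, bumping versions.
        for loc, value in self.write_buffer.items():
            _, version = self.mem.cells.get(loc, (0, 0))
            self.mem.cells[loc] = (value, version + 1)


# Two transactions increment the same counter "concurrently" (interleaved by hand).
mem = VersionedMemory({"x": (10, 0)})
t1, t2 = mem.begin(), mem.begin()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 5)
t2.commit()                  # x becomes 15 (version 1)

aborted = False
try:
    t1.commit()              # t1 read x at version 0, so it is aborted
except TxAbort:
    aborted = True
    retry = mem.begin()      # re-run the whole region from the start
    retry.write("x", retry.read("x") + 1)
    retry.commit()           # x becomes 16 (version 2)
```

Neither increment is lost: the transaction that read stale data is discarded as a whole and re-executed, rather than being blocked by a lock up front.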

Transactional memory provides a programming model that makes parallel programming easier. A programmer delimits regions of code that access shared data, and the hardware executes these regions atomically and in isolation, buffering the results of individual instructions and retrying execution if isolation is violated. Generally, transactional memory lets programs use a programming style close to coarse-grained locking while achieving performance close to fine-grained locking.
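Why coarse-grained style can yield fine-grained performance comes down to which transactions actually conflict. The following hypothetical Python sketch (the names `Footprint`, `conflicts`, and `transfer_footprint` and the account names are invented for illustration) assumes each transaction's footprint is known as a set of read and written locations; real hardware implementations usually track footprints at a coarser granularity, such as cache lines.

```python
from typing import NamedTuple


class Footprint(NamedTuple):
    """Hypothetical read and write sets of one transaction."""
    reads: frozenset
    writes: frozenset


def conflicts(a: Footprint, b: Footprint) -> bool:
    # Two concurrent transactions conflict if either one writes a location
    # that the other reads or writes; read/read sharing is harmless.
    return bool(
        a.writes & (b.reads | b.writes)
        or b.writes & (a.reads | a.writes)
    )


def transfer_footprint(src: str, dst: str) -> Footprint:
    # "Coarse-grained" style: the whole transfer is one atomic region and the
    # programmer never decides which per-account locks to take.
    touched = frozenset({src, dst})
    return Footprint(reads=touched, writes=touched)


t1 = transfer_footprint("alice", "bob")
t2 = transfer_footprint("carol", "dave")
t3 = transfer_footprint("bob", "erin")

print(conflicts(t1, t2))  # False: disjoint accounts, both can commit in parallel
print(conflicts(t1, t3))  # True: both update "bob", so one must abort and retry
```

With a single coarse lock, all three transfers would serialize; under this model only the pair that touches the same account has to.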

Most implementations of transactional memory are based on software. POWER8 processor-based systems provide a hardware-based implementation of transactional memory, which is more efficient than software implementations and requires no interaction with the processor core, allowing the system to operate at maximum performance.
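Software that uses hardware transactional memory commonly pairs each transactional region with a non-transactional fallback path, because a transaction can abort repeatedly (for example, on persistent conflicts). The sketch below assumes that convention; `run_atomically`, `TxAbort`, and `fallback_lock` are hypothetical names, and aborts are only simulated in Python. On POWER8, the transactional attempt would correspond to code bracketed by the `tbegin.` and `tend.` instructions.

```python
import threading
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class TxAbort(Exception):
    """Hypothetical signal that a transactional attempt was aborted."""


# Hypothetical global lock that serializes the non-transactional fallback path.
fallback_lock = threading.Lock()


def run_atomically(attempt: Callable[[], T], fallback: Callable[[], T],
                   max_retries: int = 3) -> Tuple[T, str]:
    """Try `attempt` as a transaction up to `max_retries` times, then use the lock."""
    for _ in range(max_retries):
        try:
            # A transaction must not run while another thread holds the
            # fallback lock. Real lock-elision code reads the lock *inside*
            # the transaction so that a later acquisition also aborts it;
            # this simulation can only check it up front.
            if fallback_lock.locked():
                raise TxAbort("fallback lock is held")
            return attempt(), "transaction"
        except TxAbort:
            continue  # simulated abort: re-run the whole region
    # Too many aborts: fall back to ordinary mutual exclusion.
    with fallback_lock:
        return fallback(), "lock"


# Usage: an attempt that aborts twice (simulated conflicts), then commits.
calls = {"n": 0}


def flaky_attempt() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TxAbort("simulated conflict")
    return "committed"


def always_abort() -> str:
    raise TxAbort("simulated persistent failure")


result = run_atomically(flaky_attempt, lambda: "done under lock")
locked_result = run_atomically(always_abort, lambda: "done under lock")
```

Here `result` is `("committed", "transaction")` after three attempts, while `always_abort` exhausts its retries and completes under the lock.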

2.1.8 Coherent Accelerator Processor Interface

The Coherent Accelerator Interface Architecture (CAIA) defines a coherent accelerator interface structure for attaching special processing devices to POWER systems.

| Cache | POWER7 | POWER7+ | POWER8 |
|---|---|---|---|
| L1 instruction cache: capacity/associativity | 32 KB, 4-way | 32 KB, 4-way | 32 KB, 8-way |
| L1 data cache: capacity/associativity | 32 KB, 8-way | 32 KB, 8-way | 64 KB, 8-way |
| L1 data cache: bandwidth | 2 × 16 B reads or 1 × 16 B write per cycle | 2 × 16 B reads or 1 × 16 B write per cycle | 4 × 16 B reads or 1 × 16 B write per cycle |
| L2 cache: capacity/associativity | 256 KB, 8-way, private | 256 KB, 8-way, private | 512 KB, 8-way, private |
| L2 cache: bandwidth | 32 B reads and 16 B writes per cycle | 32 B reads and 16 B writes per cycle | 64 B reads and 16 B writes per cycle |
| L3 cache: location | On-Chip | On-Chip | On-Chip |
| L3 cache: capacity/associativity | 4 MB/core, 8-way | 10 MB/core, 8-way | 8 MB/core, 8-way |
| L3 cache: bandwidth | 16 B reads and 16 B writes per cycle | 16 B reads and 16 B writes per cycle | 32 B reads and 32 B writes per cycle |
| L4 cache: location | N/A | N/A | On-Chip |
| L4 cache: capacity/associativity | N/A | N/A | 16 MB/buffer chip, 16-way; up to 8 buffer chips per socket |
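As a reading aid for Table 2-3, the following sketch works through the per-cycle arithmetic implied by the table. The figures are transcribed from the table; the variable names are illustrative. Converting bytes per cycle into GB/s would require the core clock frequency, which this section does not give, so only ratios are computed.

```python
# Per-cycle figures transcribed from Table 2-3 (bytes per cycle).
L1D_READ = {"POWER7": 2 * 16, "POWER7+": 2 * 16, "POWER8": 4 * 16}
L2_READ = {"POWER7": 32, "POWER7+": 32, "POWER8": 64}
L3_READ_PLUS_WRITE = {"POWER7": 16 + 16, "POWER7+": 16 + 16, "POWER8": 32 + 32}

# L4 exists only on POWER8: 16 MB per buffer chip, up to 8 buffer chips per socket.
L4_MB_PER_BUFFER_CHIP = 16
MAX_BUFFER_CHIPS_PER_SOCKET = 8
l4_mb_per_socket = L4_MB_PER_BUFFER_CHIP * MAX_BUFFER_CHIPS_PER_SOCKET


def ratio(table: dict, new: str = "POWER8", old: str = "POWER7") -> float:
    """Per-cycle bandwidth of `new` relative to `old`."""
    return table[new] / table[old]


for name, table in [("L1 data reads", L1D_READ),
                    ("L2 reads", L2_READ),
                    ("L3 reads + writes", L3_READ_PLUS_WRITE)]:
    print(f"{name}: {table['POWER7']} -> {table['POWER8']} B/cycle "
          f"({ratio(table):.0f}x)")
print(f"Maximum L4 per socket: {l4_mb_per_socket} MB")
```

Each listed path doubles its per-cycle bandwidth from POWER7 to POWER8, and a fully populated socket carries up to 128 MB of L4 cache.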