User Guide
© 2021 IBM Corporation
IBM® Power®
Power10 Quick Start Guide - P10 MMA Performance Guide
P10 Compute & MMA Architecture
- 2x bandwidth-matched SIMD*
  - 8 independent fixed-point and floating-point SIMD engines per core
- 4x to 32x matrix math acceleration*
  - Four 512-bit engines per core = 2048 bits of results per cycle
  - Matrix math outer products in single, double, and reduced precision
- MMA architecture support introduced in Power ISA v3.1
- Supports single precision (SP), double precision (DP), bfloat16 (BF16), half precision (HP), and 16-, 8-, and 4-bit integer (Int-16, Int-8, Int-4) operands
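
The outer-product operation listed above can be sketched in C as a single rank-1 update of a 4x4 single-precision tile (16 x 32 bits = one 512-bit accumulator). This is a minimal sketch, not a tuned kernel: the function name rank1_update_4x4 is hypothetical, the Power10 branch uses the GCC MMA built-ins and is compiled only when the compiler defines __MMA__, and the row/column ordering it assumes after disassembling the accumulator should be verified on real hardware. On any other target the portable loop runs instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: acc[i][j] += x[i] * y[j].
   One rank-1 (outer-product) update of a 4x4 fp32 tile. */
void rank1_update_4x4(float acc[4][4], const float x[4], const float y[4])
{
    assert(acc && x && y);
#if defined(__MMA__)
    /* Power10 path: requires an MMA-enabled build (assumption: a flag
       such as -mcpu=power10 on a compiler that accepts it). */
    typedef __vector unsigned char vec_t;
    __vector_quad q;                       /* 512-bit accumulator */
    vec_t vx, vy;
    float prod[4][4] __attribute__((aligned(16)));

    memcpy(&vx, x, sizeof vx);             /* 4 floats -> one 128-bit vector */
    memcpy(&vy, y, sizeof vy);
    __builtin_mma_xxsetaccz(&q);           /* zero the accumulator */
    __builtin_mma_xvf32gerpp(&q, vx, vy);  /* q += outer product of vx, vy */
    __builtin_mma_disassemble_acc(prod, &q); /* copy accumulator to memory */

    /* Assumed layout: prod[i][j] == x[i] * y[j]. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i][j] += prod[i][j];
#else
    /* Portable reference with the same arithmetic. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i][j] += x[i] * y[j];
#endif
}
```

A matrix multiply C = A x B can be built from this primitive by calling it once per step k, with x taken from column k of a 4-row panel of A and y from row k of a 4-column panel of B. A production kernel would keep the accumulator live across the whole k loop instead of disassembling it after every update.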
P10 MMA Applications & Workload Integration
- ML and HPC applications with dense linear algebra computations, matrix multiplications, convolutions, or FFTs can be accelerated with MMA.
- GCC version >= 10 and LLVM version >= 12 support MMA through built-ins.
- OpenBLAS, IBM ESSL, and Eigen are already optimized with MMA instructions for P10.
- Enterprise applications, ML frameworks, and open community packages can integrate MMA easily through the libraries above.
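
Code that calls the built-ins directly usually needs a guard so that it still builds on other targets. The following is a minimal sketch of such a guard, assuming the compiler defines __MMA__ when MMA code generation is enabled and using the compiler version thresholds stated above. All three function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if this file was compiled with MMA built-ins enabled
   (assumption: the compiler defines __MMA__ in that case), else 0. */
int mma_builtins_enabled(void)
{
#if defined(__MMA__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler meets the minimum versions listed above:
   LLVM/Clang >= 12 or GCC >= 10. Clang is tested first because it
   also defines __GNUC__. */
int compiler_meets_mma_minimum(void)
{
#if defined(__clang__)
    return __clang_major__ >= 12;
#elif defined(__GNUC__)
    return __GNUC__ >= 10;
#else
    return 0;
#endif
}

/* Sanity check: an MMA-enabled build implies a new enough compiler. */
void check_mma_build(void)
{
    assert(!mma_builtins_enabled() || compiler_meets_mma_minimum());
}
```

An application would typically call mma_builtins_enabled() once at startup, or use the same preprocessor test directly, to choose between a hand-written MMA kernel and a call into one of the optimized libraries.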
PowerPC Matrix-Multiply Assist Built-in Functions
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Matrix-Multiply-Assist-Built-in-Functions.html
Matrix-Multiply Assist Best Practices Guide
https://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/redp5612.html?Open