Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
and continuing program execution. The masked result may be a rounded normalized
value, signed infinity, a denormal finite number, zero, a QNaN floating-point indefinite,
or a QNaN depending on the exception condition detected. In most cases, the
corresponding exception flag bit in MXCSR is also set. The one situation where an
exception flag is not set is when an underflow condition is detected and it is not
accompanied by an inexact result.
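The underflow case described above can be observed directly by reading MXCSR after a scalar multiply. The following is a minimal sketch, not taken from the manual: it assumes an x86-64 target and a compiler that provides the SSE intrinsics header `xmmintrin.h`, and the helper name `flags_after_mulss` is hypothetical. It masks all six exceptions, clears the flags, performs one MULSS, and returns the flag bits (MXCSR bits 0 through 5).

```c
#include <float.h>
#include <xmmintrin.h>

/* Hypothetical helper: multiply a * b with every SIMD floating-point
 * exception masked and return the MXCSR exception flags (bits 0-5)
 * that the multiply left set. */
static unsigned int flags_after_mulss(float a, float b)
{
    /* volatile keeps the compiler from folding the multiply at build time */
    volatile float va = a, vb = b, sink;
    unsigned int saved = _mm_getcsr();
    unsigned int flags;

    /* 0x1F80: all exceptions masked, all flags clear, round to nearest,
     * flush-to-zero and denormals-are-zeros both off */
    _mm_setcsr(_MM_MASK_MASK);

    sink = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));

    flags = _mm_getcsr() & _MM_EXCEPT_MASK;
    _mm_setcsr(saved);              /* restore the caller's MXCSR */
    (void)sink;
    return flags;
}
```

Under these assumptions, `flags_after_mulss(FLT_MIN, 0.5f)` produces a denormal result that is exact, so no flag is set, whereas `flags_after_mulss(FLT_MIN, 0.3f)` produces a result that is both tiny and inexact, so the underflow and inexact-result flags (`_MM_EXCEPT_UNDERFLOW | _MM_EXCEPT_INEXACT`) are both set.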
When operating on packed floating-point operands, the processor returns a masked
result for each of the sub-operand computations and sets a separate set of internal
exception flags for each computation. It then performs a logical-OR on the internal
exception flag settings and sets the exception flags in the MXCSR register according
to the results of the OR operations.
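The flag merging described above can be pictured with a small software model. This is only an illustrative sketch: the processor's internal per-computation flags are not visible to software, and the names `merge_lane_flags` and `FLAG_*` are hypothetical. The bit positions match the MXCSR flag layout (bits 0 through 5), and the model also reflects that MXCSR flags stay set until software clears them.

```c
#include <stddef.h>
#include <stdint.h>

/* MXCSR exception flag bits (bits 0 through 5). */
enum {
    FLAG_IE = 1u << 0,  /* invalid operation */
    FLAG_DE = 1u << 1,  /* denormal operand  */
    FLAG_ZE = 1u << 2,  /* divide-by-zero    */
    FLAG_OE = 1u << 3,  /* numeric overflow  */
    FLAG_UE = 1u << 4,  /* numeric underflow */
    FLAG_PE = 1u << 5   /* inexact result    */
};

/* Model only: OR together the flag sets produced by each sub-operand
 * computation, then OR the merged set into the current MXCSR value.
 * Bits outside the flag field are ignored in the per-lane inputs. */
static uint32_t merge_lane_flags(uint32_t mxcsr,
                                 const uint32_t *lane_flags,
                                 size_t lanes)
{
    uint32_t merged = 0;
    for (size_t i = 0; i < lanes; ++i)
        merged |= lane_flags[i] & 0x3Fu;
    return mxcsr | merged;
}
```

For instance, four lanes reporting `FLAG_DE`, no flags, `FLAG_OE | FLAG_PE`, and `FLAG_DE | FLAG_UE | FLAG_PE` merge into a single MXCSR flag field with DE, OE, UE, and PE set; software reading MXCSR cannot tell which lane raised which flag.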
For example, Figure 11-9 shows the results of a MULPS instruction. In the example,
all SIMD floating-point exceptions are masked. Assume that a denormal exception
condition is detected prior to the multiplication of sub-operands X0 and Y0, no exception
condition is detected for the multiplication of X1 and Y1, a numeric overflow
exception condition is detected for the multiplication of X2 and Y2, and another
denormal exception is detected prior to the multiplication of sub-operands X3 and
Y3. Because denormal exceptions are masked, the processor uses the denormal
source values in the multiplications of (X0 and Y0) and of (X3 and Y3), passing the
results of the multiplications through to the destination operand. With the denormal
operand, the result of the X0 and Y0 computation is a normalized finite value, with no
exceptions detected. However, the X3 and Y3 computation produces a tiny and
inexact result. This causes the corresponding internal numeric underflow and
inexact-result exception flags to be set.
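The four-element scenario can be approximated on real hardware with SSE intrinsics. The sketch below is an assumption-laden illustration rather than the manual's own example: the operand values are chosen here to trigger the conditions named in the text, the names `mulps_outcome` and `masked_mulps_example` are hypothetical, and the code assumes an x86-64 target with `xmmintrin.h` and C99 hexadecimal floating-point literals.

```c
#include <float.h>
#include <xmmintrin.h>

typedef struct {
    float result[4];     /* result[0] is the X0 * Y0 element */
    unsigned int flags;  /* MXCSR exception flags, bits 0-5  */
} mulps_outcome;

/* Hypothetical reproduction of the masked MULPS scenario. */
static mulps_outcome masked_mulps_example(void)
{
    /* volatile prevents compile-time evaluation of the products.
     * Element 0: denormal X0, product is normal and exact.
     * Element 1: ordinary operands, no exception condition.
     * Element 2: product exceeds FLT_MAX (numeric overflow).
     * Element 3: denormal Y3, product is tiny and inexact. */
    volatile float x[4] = { 0x1p-127f, 2.0f, FLT_MAX, 0.5f };
    volatile float y[4] = { 1024.0f,   3.0f, 2.0f,    0x3p-149f };
    volatile float out[4];
    float tmp[4];
    mulps_outcome r;
    unsigned int saved = _mm_getcsr();
    int i;

    /* Mask all exceptions and clear all flags (FTZ and DAZ off). */
    _mm_setcsr(_MM_MASK_MASK);

    __m128 vx = _mm_set_ps(x[3], x[2], x[1], x[0]);
    __m128 vy = _mm_set_ps(y[3], y[2], y[1], y[0]);
    _mm_storeu_ps(tmp, _mm_mul_ps(vx, vy));
    for (i = 0; i < 4; ++i)
        out[i] = tmp[i];   /* volatile stores complete before MXCSR is read */

    r.flags = _mm_getcsr() & _MM_EXCEPT_MASK;
    _mm_setcsr(saved);     /* restore the caller's MXCSR */

    for (i = 0; i < 4; ++i)
        r.result[i] = out[i];
    return r;
}
```

With these inputs, `flags` is expected to hold the denormal-operand, numeric overflow, numeric underflow, and inexact-result bits together, even though no single element raised all four. Note that on hardware the masked overflow in element 2 also sets the inexact-result flag, and element 2 of `result` holds positive infinity.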
For the multiplication of X2 and Y2, the processor stores the floating-point ∞ in the
destination operand, and sets the corresponding internal sub-operand numeric overflow
flag. The result of the X1 and Y1 multiplication is passed through to the destination
operand, with no internal sub-operand exception flags being set. Following the
computations, the individual sub-operand exception flags for denormal operand,
Figure 11-9. Example Masked Response for Packed Operations

    X3                      X2      X1                  X0 (Denormal)
    Y3 (Denormal)           Y2      Y1                  Y0
    MULPS                   MULPS   MULPS               MULPS
    Tiny, Inexact, Finite   ∞       Normalized Finite   Normalized Finite