Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
• Use stack and data alignment techniques to keep data properly aligned for
efficient memory use.
• Use the non-temporal store instructions offered with the SSE and SSE2
extensions.
• Employ the optimization and scheduling techniques described in the Intel
Pentium 4 Optimization Reference Manual (see Section 1.4, “Related Literature,”
for the order number for this manual).
11.6.2 Checking for SSE/SSE2 Support
Before an application attempts to use the SSE and/or SSE2 extensions, it should
check that they are present on the processor and that the operating system supports
them. The application can make this check by following these steps:
1. Check that the processor supports the CPUID instruction by attempting to
execute the CPUID instruction. If the processor does not support the CPUID
instruction, it will generate an invalid-opcode exception (#UD).
2. Check that the processor supports the SSE and/or SSE2 extensions (true if
CPUID.01H:EDX.SSE[bit 25] = 1 and/or CPUID.01H:EDX.SSE2[bit 26] = 1).
3. Check that the processor supports the FXSAVE and FXRSTOR instructions (true if
CPUID.01H:EDX.FXSR[bit 24] = 1).
4. Check that the operating system supports the FXSAVE and FXRSTOR instructions
(execute a MOV instruction to read CR4; true if CR4.OSFXSR[bit 9] = 1).
5. Check that the operating system supports SIMD floating-point exception
handling (execute a MOV instruction to read CR4; true if CR4.OSXMMEXCPT[bit 10] = 1).
NOTE
CR4.OSFXSR[bit 9] and CR4.OSXMMEXCPT[bit 10] must be set by
the operating system. The processor has no other way of detecting
operating-system support for the FXSAVE and FXRSTOR instructions
or for handling SIMD floating-point exceptions.
6. Check that emulation of the x87 FPU is disabled (execute a MOV instruction to
read CR0; true if CR0.EM[bit 2] = 0).
If the processor attempts to execute an unsupported SSE or SSE2 instruction, the
processor will generate an invalid-opcode exception (#UD).
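
The bit tests in steps 2 through 6 can be sketched in C as shown below. This is a minimal illustration, not a definitive implementation: the type sse_probe_t and the functions os_ready, sse_usable, and sse2_usable are hypothetical names introduced here, and the sketch assumes the caller has already obtained the raw EDX value from CPUID leaf 01H and the contents of CR4 and CR0. It treats every check in the list as required, which is how the steps read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in steps 2 through 6 above. */
#define CPUID_EDX_FXSR  (1u << 24)  /* CPUID.01H:EDX.FXSR[bit 24] */
#define CPUID_EDX_SSE   (1u << 25)  /* CPUID.01H:EDX.SSE[bit 25]  */
#define CPUID_EDX_SSE2  (1u << 26)  /* CPUID.01H:EDX.SSE2[bit 26] */
#define CR4_OSFXSR      (1u << 9)   /* CR4.OSFXSR[bit 9]          */
#define CR4_OSXMMEXCPT  (1u << 10)  /* CR4.OSXMMEXCPT[bit 10]     */
#define CR0_EM          (1u << 2)   /* CR0.EM[bit 2]              */

/* Hypothetical container for the raw register values (an assumption of
 * this sketch; how the values are obtained is left to the caller). */
typedef struct {
    uint32_t cpuid1_edx;  /* EDX returned by CPUID with EAX = 01H */
    uint32_t cr4;         /* contents of control register CR4     */
    uint32_t cr0;         /* contents of control register CR0     */
} sse_probe_t;

/* Steps 3 through 6: FXSAVE/FXRSTOR present, OS support flags set,
 * and x87 FPU emulation disabled. */
static bool os_ready(const sse_probe_t *p)
{
    return (p->cpuid1_edx & CPUID_EDX_FXSR) != 0 &&
           (p->cr4 & CR4_OSFXSR) != 0 &&
           (p->cr4 & CR4_OSXMMEXCPT) != 0 &&
           (p->cr0 & CR0_EM) == 0;
}

/* Step 2 (SSE) combined with steps 3 through 6. */
static bool sse_usable(const sse_probe_t *p)
{
    return (p->cpuid1_edx & CPUID_EDX_SSE) != 0 && os_ready(p);
}

/* Step 2 (SSE2) combined with steps 3 through 6. */
static bool sse2_usable(const sse_probe_t *p)
{
    return (p->cpuid1_edx & CPUID_EDX_SSE2) != 0 && os_ready(p);
}
```

The MOV instructions that read CR0 and CR4 are privileged, so code running at application privilege level generally cannot fill in the cr4 and cr0 fields itself and must rely on the operating system for that information. With GCC or Clang, the EDX value can be obtained through __get_cpuid from <cpuid.h>; that choice of compiler helper is an assumption of this note, not something the text above prescribes.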
11.6.3 Checking for the DAZ Flag in the MXCSR Register
The denormals-are-zero flag in the MXCSR register is available in most of the
Pentium 4 processors and in the Intel Xeon processor, with the exception of some