Intel 64 and IA-32 Architectures Software Developers Manual Volume 2A, Instruction Set Reference, A-M
3-14 Vol. 2
INSTRUCTION SET REFERENCE, A-M
The suffixes ps and ss are used to denote “packed single” and “scalar single” preci-
sion operations. The packed floats are represented in right-to-left order, with the
lowest word (right-most) being used for scalar operations: [z, y, x, w]. To explain
how memory storage reflects this, consider the following example.
The operation:
float a[4] ← { 1.0, 2.0, 3.0, 4.0 };
__m128 t ← _mm_load_ps(a);
Produces the same result as follows:
__m128 t ← _mm_set_ps(4.0, 3.0, 2.0, 1.0);
In other words:
t ← [ 4.0, 3.0, 2.0, 1.0 ]
Where the “scalar” element is 1.0.
Some intrinsics are “composites” because they require more than one instruction to
implement them. You should be familiar with the hardware features provided by the
SSE, SSE2, SSE3, and MMX technology when writing programs with the intrinsics.
Keep the following important issues in mind:
• Certain intrinsics, such as _mm_loadr_ps and _mm_cmpgt_ss, are not directly
supported by the instruction set. While these intrinsics are convenient
programming aids, be mindful of their implementation cost.
• Data loaded or stored as __m128 objects must generally be 16-byte-aligned.
• Some intrinsics require that their argument be immediates, that is, constant
integers (literals), due to the nature of the instruction.
• The result of arithmetic operations acting on two NaN (Not a Number) arguments
is undefined. Therefore, floating-point operations using NaN arguments may not
match the expected behavior of the corresponding assembly instructions.
For a more detailed description of each intrinsic and additional information related to
its usage, refer to Intel C/C++ compiler documentation. See:
— http://www.intel.com/support/performancetools/
— Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,”
in the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 2B, for more information on using intrinsics.
3.1.1.9 Flags Affected Section
The “Flags Affected” section lists the flags in the EFLAGS register that are affected by
the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1.
The arithmetic and logical instructions usually assign values to the status flags in a
uniform manner (see Appendix A, “Eflags Cross-Reference,” in the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 1). Non-conventional