Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, Instruction Set Reference, A-M

The suffixes ps and ss denote “packed single” and “scalar single” precision
operations. The packed floats are represented in right-to-left order, with the
lowest word (right-most) being used for scalar operations: [z, y, x, w]. To explain
how memory storage reflects this, consider the following example.

The operation:

float a[4] = { 1.0, 2.0, 3.0, 4.0 };
__m128 t = _mm_load_ps(a);

produces the same result as:

__m128 t = _mm_set_ps(4.0, 3.0, 2.0, 1.0);

In other words:

t ← [ 4.0, 3.0, 2.0, 1.0 ]

where the “scalar” element is 1.0.
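The equivalence above can be checked with a short C sketch. The function names load_matches_set and scalar_element are hypothetical and chosen only for illustration. The sketch assumes an x86 compiler that provides <xmmintrin.h> and C11 _Alignas, and it aligns the array explicitly because _mm_load_ps expects a 16-byte-aligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_set_ps */

/* Hypothetical helper: returns 1 if loading a[] from memory yields the same
   register contents as _mm_set_ps with the arguments in reverse order. */
int load_matches_set(void)
{
    /* _mm_load_ps expects a 16-byte-aligned address, so align explicitly. */
    _Alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    assert(((uintptr_t)a % 16) == 0);

    __m128 t = _mm_load_ps(a);
    __m128 u = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

    /* Compare the raw 16 bytes of both values. */
    return memcmp(&t, &u, sizeof t) == 0;
}

/* Hypothetical helper: returns the "scalar" (lowest) element of the
   register loaded from a[], which is a[0]. */
float scalar_element(void)
{
    _Alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    return _mm_cvtss_f32(_mm_load_ps(a));
}
```

The lowest array element a[0] ends up in the right-most position of the register, which is why _mm_set_ps lists its arguments from highest to lowest.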

Some intrinsics are “composites” because they require more than one instruction to
implement them. You should be familiar with the hardware features provided by the
SSE, SSE2, SSE3, and MMX technology when writing programs with the intrinsics.

Keep the following important issues in mind:

• Certain intrinsics, such as _mm_loadr_ps and _mm_cmpgt_ss, are not directly
  supported by the instruction set. While these intrinsics are convenient
  programming aids, be mindful of their implementation cost.

• Data loaded or stored as __m128 objects must generally be 16-byte-aligned.

• Some intrinsics require that their arguments be immediates, that is, constant
  integers (literals), due to the nature of the instruction.

• The result of arithmetic operations acting on two NaN (Not a Number) arguments
  is undefined. Therefore, floating-point operations using NaN arguments may not
  match the expected behavior of the corresponding assembly instructions.
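The first three points can be illustrated with a small C sketch. It is not a statement of how any particular compiler implements _mm_loadr_ps; it only shows that the composite intrinsic gives the same result as an aligned load followed by a shuffle whose control argument is an immediate. The function names are hypothetical, and the sketch assumes <xmmintrin.h> and C11 _Alignas are available.

```c
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics and the _MM_SHUFFLE macro */

/* Hypothetical helper: returns 1 if _mm_loadr_ps matches an aligned load
   followed by a shuffle that reverses the four elements. */
int loadr_matches_load_shuffle(void)
{
    /* Both _mm_load_ps and _mm_loadr_ps expect 16-byte-aligned memory. */
    _Alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

    /* Composite intrinsic: one call in source code. */
    __m128 r = _mm_loadr_ps(a);

    /* Equivalent two-step form. The third argument of _mm_shuffle_ps must
       be a compile-time constant; _MM_SHUFFLE builds that immediate. */
    __m128 t = _mm_load_ps(a);
    __m128 s = _mm_shuffle_ps(t, t, _MM_SHUFFLE(0, 1, 2, 3));

    return memcmp(&r, &s, sizeof r) == 0;
}

/* Hypothetical helper: when 16-byte alignment cannot be guaranteed, the
   unaligned load _mm_loadu_ps can be used instead of _mm_load_ps. */
float unaligned_load_lowest(void)
{
    float buf[5] = { 0.0f, 1.0f, 2.0f, 3.0f, 4.0f };

    /* buf + 1 is not guaranteed to be 16-byte aligned. */
    return _mm_cvtss_f32(_mm_loadu_ps(buf + 1));
}
```

Passing a run-time variable as the shuffle control would typically be rejected at compile time, which is the practical meaning of the immediate requirement.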

For a more detailed description of each intrinsic and additional information related to
its usage, refer to Intel C/C++ compiler documentation. See:

— http://www.intel.com/support/performancetools/

— Appendix C, “Intel® C/C++ Compiler Intrinsics and Functional Equivalents,”
  in the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
  Volume 2B, for more information on using intrinsics.

3.1.1.9 Flags Affected Section

The “Flags Affected” section lists the flags in the EFLAGS register that are affected by
the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1.
The arithmetic and logical instructions usually assign values to the status flags in a
uniform manner (see Appendix A, “EFLAGS Cross-Reference,” in the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 1). Non-conventional