Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture

Vol. 1 10-11

PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)

The MOVAPS (move aligned packed single-precision floating-point values) instruction transfers a double quadword operand containing four packed single-precision floating-point values from memory to an XMM register and vice versa, or between XMM registers. The memory address must be aligned to a 16-byte boundary; otherwise, a general-protection exception (#GP) is generated.

The MOVUPS (move unaligned packed single-precision floating-point values) instruction performs the same operations as the MOVAPS instruction, except that 16-byte alignment of a memory address is not required.
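The difference between the aligned and unaligned moves can be sketched in C with the SSE intrinsics from `<xmmintrin.h>`. The helper names `copy4_aligned` and `copy4_unaligned` are hypothetical, chosen for illustration. Compilers typically map `_mm_load_ps`/`_mm_store_ps` to MOVAPS and `_mm_loadu_ps`/`_mm_storeu_ps` to MOVUPS, although the exact instruction selected is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy four floats using aligned accesses (typically MOVAPS).
   Both pointers must be 16-byte aligned; the assert catches a
   misaligned pointer before the access itself can fault. */
static inline void copy4_aligned(float *dst, const float *src)
{
    assert((uintptr_t)src % 16 == 0);
    assert((uintptr_t)dst % 16 == 0);
    __m128 v = _mm_load_ps(src);
    _mm_store_ps(dst, v);
}

/* Copy four floats with no alignment requirement (typically MOVUPS). */
static inline void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}
```

A caller would use the aligned form only for buffers it knows to be 16-byte aligned (for example, arrays declared with `_Alignas(16)`), and the unaligned form everywhere else.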

The MOVSS (move scalar single-precision floating-point) instruction transfers a 32-bit single-precision floating-point operand from memory to the low doubleword of an XMM register and vice versa, or between XMM registers.
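A minimal sketch of the three MOVSS forms using SSE intrinsics follows. The helper names are hypothetical. One detail not spelled out in the paragraph above, but part of the instruction's documented behavior, is that the memory-to-register form zeroes the upper three doublewords of the destination, while the register-to-register form leaves them unchanged.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Memory to register: load one float into the low doubleword.
   The upper three doublewords of the result are zero. */
static inline __m128 load_low(const float *p)
{
    return _mm_load_ss(p);
}

/* Register to register: replace only the low doubleword of dst
   with the low doubleword of src; elements 1..3 of dst are kept. */
static inline __m128 replace_low(__m128 dst, __m128 src)
{
    return _mm_move_ss(dst, src);
}

/* Register to memory: extract the low doubleword as a float. */
static inline float store_low(__m128 v)
{
    float f;
    _mm_store_ss(&f, v);
    return f;
}
```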

The MOVLPS (move low packed single-precision floating-point) instruction moves two packed single-precision floating-point values from memory to the low quadword of an XMM register and vice versa. The high quadword of the register is left unchanged.

The MOVHPS (move high packed single-precision floating-point) instruction moves two packed single-precision floating-point values from memory to the high quadword of an XMM register and vice versa. The low quadword of the register is left unchanged.
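As an illustration, MOVLPS and MOVHPS can be used together to assemble one XMM value from two separate pairs of floats, and to split it again. The sketch below assumes the intrinsics `_mm_loadl_pi`, `_mm_loadh_pi`, `_mm_storel_pi`, and `_mm_storeh_pi` from `<xmmintrin.h>`; the function names `gather_pairs` and `scatter_pairs` are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Build {lo_pair[0], lo_pair[1], hi_pair[0], hi_pair[1]} from two
   independent 8-byte sources. */
static inline __m128 gather_pairs(const float *lo_pair, const float *hi_pair)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)lo_pair);  /* MOVLPS: high quadword kept */
    v = _mm_loadh_pi(v, (const __m64 *)hi_pair);  /* MOVHPS: low quadword kept */
    return v;
}

/* Write the low and high quadwords of v to two separate destinations. */
static inline void scatter_pairs(__m128 v, float *lo_pair, float *hi_pair)
{
    _mm_storel_pi((__m64 *)lo_pair, v);  /* MOVLPS store form */
    _mm_storeh_pi((__m64 *)hi_pair, v);  /* MOVHPS store form */
}
```

Neither instruction requires the 8-byte memory operand to be 16-byte aligned, which makes this pattern convenient for data that is not laid out as contiguous groups of four.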

The MOVLHPS (move packed single-precision floating-point low to high) instruction moves two packed single-precision floating-point values from the low quadword of the source XMM register into the high quadword of the destination XMM register. The low quadword of the destination register is left unchanged.

The MOVHLPS (move packed single-precision floating-point high to low) instruction moves two packed single-precision floating-point values from the high quadword of the source XMM register into the low quadword of the destination XMM register. The high quadword of the destination register is left unchanged.
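The two register-to-register quadword moves can be sketched with the intrinsics `_mm_movelh_ps` and `_mm_movehl_ps`. In both, the first argument plays the role of the destination register and the second the source; the wrapper names below are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* MOVLHPS: result = {dst[0], dst[1], src[0], src[1]}.
   The low quadword of dst is preserved. */
static inline __m128 low_to_high(__m128 dst, __m128 src)
{
    return _mm_movelh_ps(dst, src);
}

/* MOVHLPS: result = {src[2], src[3], dst[2], dst[3]}.
   The high quadword of dst is preserved. */
static inline __m128 high_to_low(__m128 dst, __m128 src)
{
    return _mm_movehl_ps(dst, src);
}
```

With dst holding {1, 2, 3, 4} and src holding {5, 6, 7, 8} (element 0 first), `low_to_high` yields {1, 2, 5, 6} and `high_to_low` yields {7, 8, 3, 4}.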

The MOVMSKPS (move packed single-precision floating-point mask) instruction transfers the most significant bit (the sign bit) of each of the four packed single-precision floating-point numbers in an XMM register to a general-purpose register. This 4-bit value can then be used as a condition to perform branching.
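A brief sketch of branching on the MOVMSKPS result, using the `_mm_movemask_ps` intrinsic, is shown below. Bit i of the returned mask is the sign bit of element i. The helper names `sign_mask` and `any_negative` are hypothetical. Note that the test is on the sign bit only, so a value of -0.0 also counts as negative here.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Return the 4-bit sign mask of the four floats at p
   (bit 0 corresponds to p[0], bit 3 to p[3]). */
static inline int sign_mask(const float *p)
{
    return _mm_movemask_ps(_mm_loadu_ps(p));
}

/* Branch on the mask: nonzero if any element has its sign bit set. */
static inline int any_negative(const float *p)
{
    if (sign_mask(p) != 0) {
        return 1;
    }
    return 0;
}
```

For the input {1.0, -2.0, 3.0, -4.0}, elements 1 and 3 are negative, so the mask is binary 1010.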

10.4.1.2 SSE Arithmetic Instructions

SSE arithmetic instructions perform addition, subtraction, multiplication, division, reciprocal, square root, reciprocal of square root, and maximum/minimum operations on packed and scalar single-precision floating-point values.

The ADDPS (add packed single-precision floating-point values) and SUBPS (subtract packed single-precision floating-point values) instructions add and subtract, respectively, two packed single-precision floating-point operands.
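The packed add and subtract can be sketched as element-wise operations on four floats at a time, using the `_mm_add_ps` and `_mm_sub_ps` intrinsics. The wrapper names `add4` and `sub4` are hypothetical, and unaligned loads and stores are used so that the callers need no particular buffer alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* out[i] = a[i] + b[i] for i = 0..3 (ADDPS). */
static inline void add4(float *out, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* out[i] = a[i] - b[i] for i = 0..3 (SUBPS). */
static inline void sub4(float *out, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_sub_ps(va, vb));
}
```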

The ADDSS (add scalar single-precision floating-point values) and SUBSS (subtract

scalar single-precision floating-point values) instructions add and subtract, respec-