22007E/0—November 1999 AMD Athlon™ Processor x86 Code Optimization

Fast Conversion of Signed Words to Floating-Point

Many applications need to quickly convert packed 16-bit signed integers into floating-point numbers. The following two examples show how to do this efficiently on AMD processors.

The first example shows how to do the conversion on a processor that supports AMD’s 3DNow! extensions, such as the AMD Athlon processor. It demonstrates the increased efficiency from using the PI2FW instruction. Use this instruction only in code that is specific to the AMD Athlon processor. See the AMD Extensions to the 3DNow!™ and MMX™ Instruction Set Manual, order# 22466, for more information on this instruction.

The second example demonstrates how to accomplish the same task in blended code that achieves good performance on the AMD Athlon processor as well as on the AMD-K6 family processors that support 3DNow! technology.

Example 1 (AMD Athlon-specific code using the 3DNow! DSP extension):

MOVD      MM0, [packed_sword]   ;0  0  | b  a
PUNPCKLWD MM0, MM0              ;b  b  | a  a
PI2FW     MM0, MM0              ;xb=float(b) | xa=float(a)
MOVQ      [packed_float], MM0   ;store xb | xa
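
To make the data flow in Example 1 easier to follow, here is a minimal scalar sketch in portable C. It is not 3DNow! code and says nothing about performance. It only models what each instruction does to the register contents, on the assumption that PI2FW converts the low 16-bit word of each doubleword and ignores the upper word. The type mmx_reg and all function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit MMX register viewed as four signed
   16-bit words; w[0] is the least significant word. */
typedef struct { int16_t w[4]; } mmx_reg;

/* MOVD: load two packed words into the low half, zero the high half. */
mmx_reg movd_model(const int16_t src[2])
{
    mmx_reg r = { { src[0], src[1], 0, 0 } };
    return r;
}

/* PUNPCKLWD dst, src: interleave the two low words of dst and src. */
mmx_reg punpcklwd_model(mmx_reg dst, mmx_reg src)
{
    mmx_reg r = { { dst.w[0], src.w[0], dst.w[1], src.w[1] } };
    return r;
}

/* PI2FW (as assumed above): convert the low word of each doubleword
   to float; the upper word of each doubleword is not used. */
void pi2fw_model(mmx_reg r, float out[2])
{
    out[0] = (float)r.w[0];
    out[1] = (float)r.w[2];
}

/* The Example 1 sequence applied to one pair of words. */
void convert_example1(const int16_t packed_sword[2], float packed_float[2])
{
    assert(packed_sword != 0 && packed_float != 0);
    mmx_reg mm0 = movd_model(packed_sword);  /* 0 0 | b a */
    mm0 = punpcklwd_model(mm0, mm0);         /* b b | a a */
    pi2fw_model(mm0, packed_float);          /* float(b) | float(a) */
}
```

Calling convert_example1 on the pair { -3, 7 } stores -3.0f and 7.0f. In this model the duplicate made by PUNPCKLWD only moves each word into the low half of its own doubleword, so no separate sign-extension step is needed.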

Example 2 (AMD-K6 Family and AMD Athlon processor blended code):

MOVD      MM1, [packed_sword]   ;0  0  | b  a
PXOR      MM0, MM0              ;0  0  | 0  0
PUNPCKLWD MM0, MM1              ;b  0  | a  0
PSRAD     MM0, 16               ;sign extend: b | a
PI2FD     MM0, MM0              ;xb=float(b) | xa=float(a)
MOVQ      [packed_float], MM0   ;store xb | xa
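
The sign-extension trick in Example 2 (place each word in the upper half of a doubleword, then shift right arithmetically by 16) can be sketched per lane in portable C. This is a scalar illustration under stated assumptions, not the MMX/3DNow! sequence itself. The function names are hypothetical, and the arithmetic shift is built from unsigned operations because C does not guarantee how right shifts of negative values behave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models PUNPCKLWD against a zeroed register: the word ends up in the
   upper half of a 32-bit lane, the lower half is zero ("x 0"). */
uint32_t word_to_high_half(int16_t x)
{
    return (uint32_t)(uint16_t)x << 16;
}

/* Models PSRAD lane, 16: an arithmetic right shift by 16 bits.
   The logical shift leaves the original word in the low 16 bits;
   subtracting the weight of bit 15 restores its sign. */
int32_t psrad16_model(uint32_t lane)
{
    uint32_t low = lane >> 16;
    return (int32_t)(low & 0x7FFFu) - (int32_t)(low & 0x8000u);
}

/* Models the whole sequence for 'count' words; the final cast plays
   the role of PI2FD (signed 32-bit integer to float). */
void convert_example2(const int16_t *packed_sword, float *packed_float,
                      size_t count)
{
    assert(packed_sword != NULL && packed_float != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t lane = word_to_high_half(packed_sword[i]);
        int32_t extended = psrad16_model(lane);
        packed_float[i] = (float)extended;
    }
}
```

For the input word -3 (0xFFFD), the lane is 0xFFFD0000 after the unpack step and -3 after the shift, which converts to -3.0f. Every 16-bit value is exactly representable as a single-precision float, so the conversion loses no precision.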

Use MMX™ PXOR to Negate 3DNow!™ Data

For both the AMD Athlon and AMD-K6 processors, it is recommended that code use the MMX PXOR instruction, rather than the 3DNow! PFMUL instruction, to change the sign bit of 3DNow! operands. On the AMD Athlon processor, using PXOR allows for more parallelism, as it can execute in either the FADD or FMUL pipes. PXOR has an execution latency of two, but because it is an MMX instruction, there is an initial one