User's Manual

AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
cycle bypassing penalty, and another one cycle penalty if the
result goes to a 3DNow! operation. The PFMUL execution
latency is four cycles; therefore, in the worst case, the PXOR and
PFMUL instructions are the same in terms of latency. On the
AMD-K6 processor, PXOR has a latency of only one cycle,
versus a two-cycle latency for the 3DNow! PFMUL instruction.
Use the following code to negate 3DNow! data:
msgn DQ 8000000080000000h
PXOR MM0, [msgn] ;toggle sign bit
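The sign-bit toggle can be checked outside of assembly with a scalar model. The following C sketch is an illustration only, not part of the original listing: it models a single 32-bit lane of the PXOR (the msgn constant above covers two such lanes), and the helper names float_bits, bits_float, and negate_by_xor are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 single-precision bit pattern of f. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar model of one lane of PXOR MM0, [msgn]:
   XOR with 80000000h flips only the sign bit. */
float negate_by_xor(float f) {
    return bits_float(float_bits(f) ^ 0x80000000u);
}

/* Small self-check showing the intended use. */
void negate_demo(void) {
    assert(negate_by_xor(2.0f) == -2.0f);
    assert(negate_by_xor(-4.5f) == 4.5f);
}
```

Because only the sign bit changes, the exponent and mantissa are untouched, so the result is the exact negation for every finite input, including zero.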
Use MMX PCMP Instead of 3DNow! PFCMP
Use the MMX PCMP instruction instead of the 3DNow! PFCMP
instruction. On the AMD Athlon processor, PCMP has a
latency of two cycles, while PFCMP has a latency of four
cycles. In addition to the shorter latency, PCMP can be issued to
either the FADD or the FMUL pipe, while PFCMP is restricted
to the FADD pipe.
Note: The PFCMP instruction has a GE (greater or equal)
version (PFCMPGE) that is missing from PCMP.
Both Numbers Positive
If both arguments are positive, PCMP always works.
One Negative, One Positive
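If one number is negative and the other is positive, PCMP still
works, except when one number is a positive zero and the other
is a negative zero. The two zeros are equal as floating-point
values, but their bit patterns differ in the sign bit, so an
integer comparison treats them as different numbers.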
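The mixed-sign case and its zero exception can be illustrated with a scalar model of the integer comparison. This C sketch is an assumption-laden illustration: pcmpgt_lane and pcmpeq_lane are hypothetical helpers that model one 32-bit lane of a signed MMX compare, and they return 1 or 0 rather than the all-ones or all-zeros mask that the real instructions produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed 32-bit integer. */
int32_t float_as_int(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Model of one lane of a signed integer greater-than compare
   applied to floating-point bit patterns (1 = true, 0 = false). */
int pcmpgt_lane(float a, float b) {
    return float_as_int(a) > float_as_int(b);
}

/* Model of one lane of an integer equality compare. */
int pcmpeq_lane(float a, float b) {
    return float_as_int(a) == float_as_int(b);
}

/* Self-check: mixed signs agree with the floating-point result,
   but +0 and -0 do not. */
void mixed_sign_demo(void) {
    assert(pcmpgt_lane(1.0f, -1.0f) == (1.0f > -1.0f));
    assert(pcmpeq_lane(0.0f, -0.0f) != (0.0f == -0.0f));
}
```

If an input may be a negative zero, the integer compare cannot be substituted directly for the floating-point compare.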
Both Numbers Negative
Be careful when performing integer comparison using PCMPGT
on two negative 3DNow! numbers. The result is the inverse of
the PFCMPGT floating-point comparison. For example:
-2 = C0000000h
-4 = C0800000h
PCMPGT gives C0800000h > C0000000h, but -4 < -2. To address
this issue, simply reverse the comparison by swapping the
source operands.
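The operand swap for two negative inputs can be sketched with the same kind of scalar model. As before, this is an illustration rather than the manual's own code: pcmpgt_lane models one 32-bit lane of a signed integer greater-than compare and returns 1 or 0 instead of a mask, and gt_both_negative is a hypothetical wrapper that is valid only when both inputs are negative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed 32-bit integer. */
int32_t float_as_int(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Model of one lane of a signed integer greater-than compare
   applied to floating-point bit patterns (1 = true, 0 = false). */
int pcmpgt_lane(float a, float b) {
    return float_as_int(a) > float_as_int(b);
}

/* Floating-point a > b for two NEGATIVE inputs: a larger magnitude
   gives a larger bit pattern but a smaller value, so the integer
   compare is issued with the operands swapped. */
int gt_both_negative(float a, float b) {
    return pcmpgt_lane(b, a);
}

/* Self-check using the -2 and -4 example from the text. */
void negative_demo(void) {
    assert(pcmpgt_lane(-4.0f, -2.0f) == 1);      /* inverted result  */
    assert(gt_both_negative(-2.0f, -4.0f) == 1); /* corrected result */
}
```

The swap is correct only when both operands are known to be negative; with mixed or positive inputs the unswapped compare applies.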