22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
The following code fragment uses the 3DNow! PAVGUSB
instruction to perform averaging between the source
macroblock and destination macroblock:
Example 2 (Preferred):
MOV EAX, DWORD PTR Src_MB
MOV EDI, DWORD PTR Dst_MB
MOV EDX, DWORD PTR SrcStride
MOV EBX, DWORD PTR DstStride
MOV ECX, 16
L1:
MOVQ MM0, [EAX] ;MM0=QWORD1
MOVQ MM1, [EAX+8] ;MM1=QWORD2
PAVGUSB MM0, [EDI] ;(QWORD1 + QWORD3)/2 with adjustment
PAVGUSB MM1, [EDI+8] ;(QWORD2 + QWORD4)/2 with adjustment
ADD EAX, EDX
MOVQ [EDI], MM0
MOVQ [EDI+8], MM1
ADD EDI, EBX
LOOP L1
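For checking results against the loop above, a scalar C reference can be useful. The following is a minimal sketch, assuming that PAVGUSB computes the rounded average (a + b + 1) >> 1 for each unsigned byte (the "adjustment" noted in the comments) and that both strides are given in bytes. The names avg_u8 and average_macroblock are illustrative and do not come from this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1.
   Assumed here to match the per-byte result of PAVGUSB. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Scalar model of the loop: average each 16-byte source row into the
   corresponding destination row. Strides are in bytes (assumption). */
static void average_macroblock(const uint8_t *src, ptrdiff_t src_stride,
                               uint8_t *dst, ptrdiff_t dst_stride, int rows)
{
    assert(src != NULL && dst != NULL && rows >= 0);
    for (int r = 0; r < rows; ++r) {
        for (int i = 0; i < 16; ++i)
            dst[i] = avg_u8(src[i], dst[i]);
        src += src_stride;   /* corresponds to ADD EAX, EDX */
        dst += dst_stride;   /* corresponds to ADD EDI, EBX */
    }
}
```

Calling average_macroblock with rows set to 16 mirrors the MOV ECX, 16 loop count. The scalar version is intended only as a correctness reference, not as a performance alternative.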
Stream of Packed Unsigned Bytes
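The following code is an example of how to process a stream of
packed unsigned bytes (such as RGBA information) with the faster
3DNow! instructions. Each pass through the loop expands four
bytes to words, then to doublewords, and converts them to
floating-point values.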
Example:
outside loop:
PXOR MM0, MM0
inside loop:
MOVD MM1, [VAR] ;0 | v[3],v[2],v[1],v[0]
PUNPCKLBW MM1, MM0 ;0,v[3],0,v[2] | 0,v[1],0,v[0]
MOVQ MM2, MM1 ;0,v[3],0,v[2] | 0,v[1],0,v[0]
PUNPCKLWD MM1, MM0 ;0,0,0,v[1] | 0,0,0,v[0]
PUNPCKHWD MM2, MM0 ;0,0,0,v[3] | 0,0,0,v[2]
PI2FD MM1, MM1 ;float(v[1]) | float(v[0])
PI2FD MM2, MM2 ;float(v[3]) | float(v[2])
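The net effect of the sequence above is to zero-extend each unsigned byte and convert it to single-precision floating point. A scalar C sketch of that result is shown below for comparison. The function name bytes_to_floats is hypothetical and not part of this guide, and the sketch handles one byte per iteration rather than four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the MOVD / PUNPCKLBW / PUNPCKxWD / PI2FD sequence:
   each unsigned byte v[i] becomes the float with the same value. */
static void bytes_to_floats(const uint8_t *v, float *out, size_t count)
{
    assert(v != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i)
        out[i] = (float)v[i];   /* zero-extend, then convert */
}
```

Every value from 0 to 255 is exactly representable in single precision, so the scalar model and the packed sequence should agree exactly for all inputs.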