
22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
Use MMX Instructions for Block Copies and Block Fills
For moving or filling small blocks of data (e.g., fewer than 512
bytes) between cacheable memory areas, the REP MOVS and
REP STOS families of instructions deliver good performance
and are straightforward to use. For moving or filling larger
blocks of data, or when the destination is in non-cacheable
space, use MMX instructions and MMX extensions instead. The
following examples all use quadword-aligned blocks of data.
Where memory blocks are not quadword aligned, additional
code is required to handle the end cases.
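
As a rough illustration of the end-case handling mentioned above, the following portable C sketch splits a block into a byte-wise head (up to the first quadword-aligned destination address), a run of 64-byte chunks, and a byte-wise tail. The names block_split, split_block, and copy_with_end_cases are hypothetical and do not appear in this guide, and the memcpy call is only a stand-in for an MMX inner loop. The sketch aligns the destination only; if the source has a different misalignment, it remains unaligned and would need separate treatment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type (not from the guide): how a block splits into
   a byte-wise head, a run of 64-byte chunks, and a byte-wise tail. */
typedef struct {
    size_t head;    /* bytes before the first quadword-aligned address */
    size_t chunks;  /* number of full 64-byte chunks in the body */
    size_t tail;    /* bytes left over after the last full chunk */
} block_split;

/* Compute the split for a destination address and a size in bytes. */
static block_split split_block(uintptr_t dst_addr, size_t size)
{
    block_split s;
    size_t misalign = (size_t)(dst_addr & 7u);   /* offset within a quadword */

    s.head = misalign ? 8u - misalign : 0u;
    if (s.head > size)
        s.head = size;                           /* block ends before alignment */
    s.chunks = (size - s.head) >> 6;             /* same shift as "shr ecx, 6" */
    s.tail = (size - s.head) & 63u;
    return s;
}

/* Copy a block, handling the unaligned head and the tail byte by byte.
   Assumes the source and destination do not overlap. */
static void copy_with_end_cases(void *dst, const void *src, size_t size)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    block_split sp = split_block((uintptr_t)d, size);
    size_t i;

    for (i = 0; i < sp.head; i++)
        *d++ = *s++;
    for (i = 0; i < sp.chunks; i++) {
        memcpy(d, s, 64);    /* stand-in for the MMX 64-byte inner loop */
        d += 64;
        s += 64;
    }
    for (i = 0; i < sp.tail; i++)
        *d++ = *s++;
}
```

For a 200-byte block whose destination is already quadword aligned, this yields no head, three 64-byte chunks, and an 8-byte tail.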
AMD-K6® and AMD Athlon Processor Blended Code
The following example code, written for the inline assembler of
Microsoft Visual C, is suitable for moving/filling a large
quadword-aligned block of data in the following situations:
- Blended code, i.e., code that needs to perform well on both
  AMD Athlon and AMD-K6 family processors
- AMD Athlon processor specific code where the destination
  is in cacheable memory and immediate data re-use of the
  data at the destination is expected
- AMD-K6 family specific code where the destination is in
  non-cacheable memory
Example 1:
/* block copy (source and destination QWORD aligned) */
__asm {
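mov eax, [src_ptr]   ; eax = source pointer
mov edx, [dst_ptr]   ; edx = destination pointer
mov ecx, [blk_size]  ; ecx = block size in bytes
shr ecx, 6           ; ecx = blk_size / 64 (count of 64-byte chunks)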
align 16