User guide
Programmers Model 
ARM DDI 0337I Copyright © 2005-2008, 2010 ARM Limited. All rights reserved. 3-9
ID072410 Non-Confidential
• Any load or store that generates an address dependent on the result of a preceding data
processing operation stalls the pipeline for an additional cycle whilst the register bank
is updated. There is no forwarding path for this scenario.
• LDR Rx,[PC,#imm] might add a cycle because of contention with the fetch unit.
• TBB and TBH are also blocking operations. They take at least two cycles for the load, one
cycle for the add, and three cycles for the pipeline reload. This means at least six cycles,
or more if stalled on the load or the fetch.
• LDR [any] instructions are pipelined when possible. This means that if the next instruction
is an LDR or STR, and the destination of the first LDR is not used to compute the address for
the next instruction, then one cycle is removed from the cost of the next instruction. So, an
LDR might be followed by an STR, so that the STR writes out what the LDR loaded. Multiple
LDRs can also be pipelined together. Some optimized examples are:
— LDR R0,[R1]; LDR R1,[R2] - normally three cycles total
— LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total
— LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total
— LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.
• Other instructions cannot be pipelined after STR with register offset. STR can only be
pipelined when it follows an LDR, but nothing can be pipelined after the store. Even a
stalled STR normally only takes two cycles, because of the write buffer.
• LDREX and STREX can be pipelined exactly as LDR. Because STREX is treated more like an LDR,
it can be pipelined as explained for LDR. Equally, LDREX is treated exactly as an LDR and so
can be pipelined.
• LDRD and STRD cannot be pipelined with preceding or following instructions. However, the
two words are pipelined together. So, this operation requires three cycles when not stalled.
• LDM and STM cannot be pipelined with preceding or following instructions. However, all
elements after the first are pipelined together. So, a three element LDM takes 2+1+1, or four,
cycles when not stalled. Similarly, an eight element store takes nine cycles when not
stalled. When interrupted, LDM and STM instructions continue from where they left off when
returned to. The continue operation adds one or two cycles to the first element when
started.
• Unaligned word or halfword loads or stores add penalty cycles. A byte-aligned halfword
load or store adds one extra cycle to perform the operation as two bytes. A halfword-aligned
word load or store adds one extra cycle to perform the operation as two halfwords.
A byte-aligned word load or store adds two extra cycles to perform the operation as a byte,
a halfword, and a byte. These numbers increase if the memory stalls. An STR or STRH cannot
delay the processor because of the write buffer.
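
The unstalled figures in this list reduce to simple arithmetic. The following Python sketch is an illustrative model only, not ARM-provided code: the function names are hypothetical, it assumes zero-wait-state memory with no stalls, and it takes the 2+1+1 sum for a three element LDM literally, giving four cycles.

```python
"""Back-of-envelope cycle estimates for the load/store rules listed above.

Assumes zero-wait-state memory and no pipeline stalls. All names here are
illustrative and do not belong to any ARM tool or library.
"""


def multi_transfer_cycles(num_registers: int) -> int:
    """Unstalled LDM/STM cost: two cycles for the first element, one for each later one."""
    if num_registers < 1:
        raise ValueError("LDM/STM must transfer at least one register")
    return 2 + (num_registers - 1)


def unaligned_penalty(address: int, size_bytes: int) -> int:
    """Extra cycles for an access of size_bytes (1, 2 or 4) at the given address."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("size_bytes must be 1, 2 or 4")
    if address % size_bytes == 0:
        return 0  # naturally aligned: no penalty
    if size_bytes == 2:
        return 1  # byte-aligned halfword: performed as two bytes
    if address % 2 == 0:
        return 1  # halfword-aligned word: performed as two halfwords
    return 2      # byte-aligned word: performed as byte, halfword, byte


def table_branch_min_cycles() -> int:
    """Minimum TBB/TBH cost: load (2) + add (1) + pipeline reload (3)."""
    return 2 + 1 + 3
```

For example, multi_transfer_cycles(8) returns 9, which matches the eight element store figure given in the list, and unaligned_penalty(0x20000003, 4) returns 2 for a byte-aligned word access.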
3.3.3 Binary compatibility with other Cortex processors
The processor implements a binary-compatible subset of the instruction set and features
provided by other Cortex-M profile processors. You can move software, including system-level
software, from the Cortex-M3 processor to other Cortex-M profile processors.
To ensure a smooth transition, ARM recommends that code designed to operate on other 
Cortex-M profile processor architectures obey the following rules and configure the 
Configuration and Control Register (CCR) appropriately:
• use word transfers only to access registers in the NVIC and System Control Space (SCS). 
• treat all unused SCS registers and register fields on the processor as Do-Not-Modify. 










