Programmers Model

ID072410 Non-Confidential

• Any load or store that generates an address dependent on the result of a preceding data processing operation stalls the pipeline for an additional cycle while the register bank is updated. There is no forwarding path for this scenario.

• LDR Rx,[PC,#imm] might add a cycle because of contention with the fetch unit.

• TBB and TBH are also blocking operations. These take at least two cycles for the load, one cycle for the add, and three cycles for the pipeline reload. This means at least six cycles, or more if stalled on the load or the fetch.

• LDR [any] instructions are pipelined when possible. This means that if the next instruction is an LDR or STR, and the destination of the first LDR is not used to compute the address for the next instruction, then one cycle is removed from the cost of the next instruction. So, an LDR might be followed by an STR, so that the STR writes out what the LDR loaded. Multiple LDRs can be pipelined together. Some optimized examples are:

    - LDR R0,[R1]; LDR R1,[R2] - normally three cycles total

    - LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total

    - LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total

    - LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.

• Other instructions cannot be pipelined after STR with register offset. STR can only be pipelined when it follows an LDR, but nothing can be pipelined after the store. Even a stalled STR normally only takes two cycles, because of the write buffer.

• LDREX and STREX can be pipelined exactly as LDR. Because STREX is treated more like an LDR, it can be pipelined as explained for LDR. Equally, LDREX is treated exactly as an LDR and so can be pipelined.

• LDRD and STRD cannot be pipelined with preceding or following instructions. However, the two words are pipelined together. So, this operation requires three cycles when not stalled.

• LDM and STM cannot be pipelined with preceding or following instructions. However, all elements after the first are pipelined together. So, a three element LDM takes 2+1+1, or four, cycles when not stalled. Similarly, an eight element store takes nine cycles when not stalled. When interrupted, LDM and STM instructions continue from where they left off when returned to. The continue operation adds one or two cycles to the first element when started.

• Unaligned word or halfword loads or stores add penalty cycles. A byte-aligned halfword load or store adds one extra cycle to perform the operation as two bytes. A halfword-aligned word load or store adds one extra cycle to perform the operation as two halfwords. A byte-aligned word load or store adds two extra cycles to perform the operation as a byte, a halfword, and a byte. These numbers increase if the memory stalls. A STR or STRH cannot delay the processor because of the write buffer.
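
The unstalled cycle counts in the list above can be collected into a small estimator. The following Python is only a rough sketch: all function and constant names are hypothetical, it covers the unstalled case only, and the two-cycle cost of a leading LDR is inferred from the "normally three cycles total" examples rather than stated in this section. LDM/STM is modeled as two cycles for the first element plus one per further element, which matches both the 2+1+1 pattern and the nine-cycle figure for an eight element store.

```python
# Rough, unstalled cycle estimates taken from the bullet list above.
# Names are illustrative only; real timings grow if memory or fetch stalls.

TBB_TBH_MIN_CYCLES = 2 + 1 + 3   # load + add + pipeline reload
LDRD_STRD_CYCLES = 3             # two words pipelined together


def ldm_stm_cycles(num_registers: int) -> int:
    """Unstalled LDM/STM: two cycles for the first element, one for each other."""
    if num_registers < 1:
        raise ValueError("LDM/STM transfers at least one register")
    return 2 + (num_registers - 1)


def pipelined_chain_cycles(num_instructions: int) -> int:
    """Back-to-back LDRs (optionally ending in one STR) with independent addresses.

    Assumption: the first LDR costs two cycles and each pipelined follower
    has one cycle removed, so it adds a single cycle.
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    return 2 + (num_instructions - 1)


def unaligned_penalty_cycles(address: int, size_bytes: int) -> int:
    """Extra cycles for a byte (1), halfword (2), or word (4) access."""
    if size_bytes == 1:
        return 0                      # bytes are always aligned
    if size_bytes == 2:
        return 1 if address % 2 else 0  # split into two bytes
    if size_bytes == 4:
        if address % 4 == 0:
            return 0                  # naturally aligned
        if address % 2 == 0:
            return 1                  # split into two halfwords
        return 2                      # byte, halfword, byte
    raise ValueError("size_bytes must be 1, 2, or 4")
```

For example, `ldm_stm_cycles(8)` gives the nine cycles quoted for an eight element store, and `unaligned_penalty_cycles(0x1003, 4)` gives the two-cycle penalty for a byte-aligned word access.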

3.3.3 Binary compatibility with other Cortex processors

The processor implements a binary compatible subset of the instruction set and features provided by other Cortex-M profile processors. You can move software, including system level software, from the Cortex-M3 processor to other Cortex-M profile processors.

To ensure a smooth transition, ARM recommends that code designed to operate on other Cortex-M profile processor architectures obey the following rules and configure the Configuration and Control Register (CCR) appropriately:

• use word transfers only to access registers in the NVIC and System Control Space (SCS).

• treat all unused SCS registers and register fields on the processor as Do-Not-Modify.
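
The word-transfer rule can be expressed as a simple check, for instance in a host-side lint or trace-analysis script. This is a minimal sketch with hypothetical names. It assumes the ARMv7-M System Control Space address range of 0xE000E000 to 0xE000EFFF, which comes from the architecture rather than from this section, and it additionally assumes that a portable word transfer is word-aligned.

```python
# Assumed ARMv7-M System Control Space bounds (not stated in this section).
SCS_BASE = 0xE000E000
SCS_LIMIT = 0xE000EFFF


def is_portable_scs_access(address: int, size_bytes: int) -> bool:
    """Return True if an access follows the word-transfer-only rule.

    Accesses outside the SCS are not covered by the rule, so they pass.
    Inside the SCS (which contains the NVIC), only aligned 4-byte
    transfers are accepted.
    """
    if not (SCS_BASE <= address <= SCS_LIMIT):
        return True
    return size_bytes == 4 and address % 4 == 0
```

With this sketch, a 4-byte access to 0xE000E100 passes, while a byte access to the same address is flagged.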