TMS320C67x/C67x+ DSP

CPU and Instruction Set

Reference Guide

Literature Number: SPRU733

May 2005

Summary of content (465 pages)

PAGE 2
IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
PAGE 3
Preface
Read This First
About This Manual
The TMS320C6000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family. The TMS320C62x™ DSP generation and the TMS320C64x™ DSP generation comprise fixed-point devices in the C6000™ DSP platform, and the TMS320C67x™ DSP generation comprises floating-point devices in the C6000 DSP platform. The TMS320C67x+™ DSP is an enhancement of the C67x™ DSP with added functionality and an expanded instruction set.
PAGE 4
Related Documentation From Texas Instruments
TMS320C672x DSP Peripherals Overview Reference Guide (literature number SPRU723) describes the peripherals available on the TMS320C672x DSPs.
TMS320C6000 Technical Brief (literature number SPRU197) gives an introduction to the TMS320C62x and TMS320C67x DSPs, development tools, and third-party support.
PAGE 5
Contents
1 Introduction
Summarizes the features of the TMS320 family of products and presents typical applications. Describes the TMS320C67x DSP and lists its key features.
1.1 TMS320 DSP Family Overview
1.2 TMS320C6000 DSP Family Overview
PAGE 6
3 Instruction Set
Describes the assembly language instructions of the TMS320C67x DSP. Also described are parallel operations, conditional operations, resource constraints, and addressing modes.
3.1 Instruction Operation and Execution Notations
PAGE 7
CLR (Clear a Bit Field)
CMPEQ (Compare for Equality, Signed Integers)
CMPEQDP (Compare for Equality, Double-Precision Floating-Point Values)
CMPEQSP (Compare for Equality, Single-Precision Floating-Point Values)
CMPGT (Compare for Greater Than, Signed Integers)
PAGE 8
MPYI (Multiply 32-Bit by 32-Bit Into 32-Bit Result)
MPYID (Multiply 32-Bit by 32-Bit Into 64-Bit Result)
MPYLH (Multiply Signed 16 LSB by Signed 16 MSB)
MPYLHU (Multiply Unsigned 16 LSB by Unsigned 16 MSB)
MPYLSHU (Multiply Signed 16 LSB by Unsigned 16 MSB)
PAGE 9
SPINT (Convert Single-Precision Floating-Point Value to Integer)
SPTRUNC (Convert Single-Precision Floating-Point Value to Integer With Truncation)
SSHL (Shift Left With Saturation)
SSUB (Subtract Two Signed Integers With Saturation)
PAGE 10
4.2.11 MPYI Instruction
5 Interrupts
Describes the TMS320C67x DSP interrupts, including reset and nonmaskable interrupts (NMI), and explains interrupt control, detection, and processing.
PAGE 11
A Instruction Compatibility
Lists the instructions that are common to the C62x, C64x, and C67x DSPs.
B Mapping Between Instruction and Functional Unit
Lists the instructions that execute on each functional unit.
C .D Unit Instructions and Opcode Maps
PAGE 12
Figures
1−1 TMS320C67x DSP Block Diagram
2−1 TMS320C67x CPU Data Paths
2−2 Storage Scheme for 40-Bit Data in a Register Pair
PAGE 13
4−18 Two-Cycle DP Instruction Phases
4−19 Four-Cycle Instruction Phases
4−20 INTDP Instruction Phases
PAGE 14
Tables
1−1 Typical Applications for the TMS320 DSPs
2−1 40-Bit/64-Bit Register Pairs
2−2 Functional Units and Operations Performed
PAGE 15
3−19 Data Types Supported by LDH(U) Instruction
3−20 Data Types Supported by LDH(U) Instruction (15-Bit Offset)
3−21 Register Addresses for Accessing the Control Registers
PAGE 16
5−1 Interrupt Priorities
5−2 Interrupt Control Registers
A−1 Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs
B−1 Functional Unit to Instruction Mapping
PAGE 17
Examples
3−1 Fully Serial p-Bit Pattern in a Fetch Packet
3−2 Fully Parallel p-Bit Pattern in a Fetch Packet
3−3 Partially Serial p-Bit Pattern in a Fetch Packet
PAGE 18
Chapter 1
Introduction
The TMS320C6000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family. The TMS320C62x™ DSP generation and the TMS320C64x™ DSP generation comprise fixed-point devices in the C6000™ DSP platform, and the TMS320C67x™ DSP generation comprises floating-point devices in the C6000 DSP platform.
PAGE 19
1.1 TMS320 DSP Family Overview
The TMS320™ DSP family consists of fixed-point, floating-point, and multiprocessor digital signal processors (DSPs). TMS320™ DSPs have an architecture designed specifically for real-time signal processing. Table 1−1 lists some typical applications for the TMS320™ family of DSPs. The TMS320™ DSPs offer adaptable approaches to traditional signal-processing problems.
PAGE 20
TMS320C6000 DSP Family Overview Table 1−1.
PAGE 21
1.3 TMS320C67x DSP Features and Options
The C6000 devices execute up to eight 32-bit instructions per cycle. The C67x CPU consists of 32 general-purpose 32-bit registers and eight functional units.
PAGE 22
40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications.
Saturation and normalization provide support for key arithmetic operations.
Field manipulation and instruction extract, set, clear, and bit counting support common operations found in control and data manipulation applications.
PAGE 23
The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use advanced VLIW to achieve high performance through increased instruction-level parallelism. A traditional VLIW architecture consists of multiple execution units running in parallel, performing multiple instructions during a single clock cycle.
PAGE 24
1.4 TMS320C67x DSP Architecture
Figure 1−1 is the block diagram for the C67x DSP. The C6000 devices come with program memory, which, on some devices, can be used as a program cache. The devices also have varying sizes of data memory. Peripherals such as a direct memory access (DMA) controller, power-down logic, and external memory interface (EMIF) usually come with the CPU, while peripherals such as serial ports and host ports are on only certain devices.
PAGE 25
1.4.1 Central Processing Unit (CPU)
The C67x CPU, in Figure 1−1, is common to all the C62x/C64x/C67x devices.
PAGE 26
DMA Controller (C6701 DSP only) transfers data between address ranges in the memory map without intervention by the CPU. The DMA controller has four programmable channels and a fifth auxiliary channel.
EDMA Controller performs the same functions as the DMA controller. The EDMA has 16 programmable channels, as well as a RAM space to hold multiple configurations for future transfers.
PAGE 27
Chapter 2
CPU Data Paths and Control
This chapter focuses on the CPU, providing information about the data paths and control registers. The two register files and the data cross paths are described.
Topic
2.1 Introduction
2.2 General-Purpose Register Files
2.3 Functional Units
PAGE 28
2.1 Introduction
The components of the data path for the TMS320C67x CPU are shown in Figure 2−1. These components consist of:
Two general-purpose register files (A and B)
Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
Two load-from-memory data paths (LD1 and LD2)
Two store-to-memory data paths (ST1 and ST2)
Two data address paths (DA1 and DA2)
Two register file data cross paths (1X and 2X)
2.
PAGE 29
General-Purpose Register Files
Figure 2−1. TMS320C67x CPU Data Paths
(Figure not reproduced. Surviving labels: data path A with the .L1, .S1, .M1, and .D1 units; src1, src2, dst, long src, and long dst ports; the LD1 and ST1 data paths; the DA1 and DA2 address paths.)
PAGE 30
Table 2−1. 40-Bit/64-Bit Register Pairs
Register File A   Register File B   Devices
A1:A0             B1:B0             C67x DSP
A3:A2             B3:B2             C67x DSP
A5:A4             B5:B4             C67x DSP
A7:A6             B7:B6             C67x DSP
A9:A8             B9:B8             C67x DSP
A11:A10           B11:B10           C67x DSP
A13:A12           B13:B12           C67x DSP
A15:A14           B15:B14           C67x DSP
A17:A16           B17:B16           C67x+ DSP only
A19:A18           B19:B18           C67x+ DSP only
A21:A20           B21:B20           C67x+ DSP only
A23:A22           B23:B22           C67x+ DSP only
A25:A24           B25:B24           C67x+ DSP only
A27:A26           B27:B26           C67x+ DSP only
A29:A28           B29:B28           C67x+ DSP only
A31:A30           B31:B30           C67x+ DSP only
Figure 2−2.
PAGE 31
2.3 Functional Units
The eight functional units in the C6000 data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. The functional units are described in Table 2−2. Most data lines in the CPU support 32-bit operands, and some support long (40-bit) and double word (64-bit) operands.
PAGE 32
2.4 Register File Cross Paths
Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to register file B. The register files are connected to the opposite-side register file’s functional units via the 1X and 2X cross paths.
PAGE 33
2.6 Data Address Paths
The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows data addresses generated by any one path to access data to or from any register. The DA1 and DA2 resources and their associated data paths are specified as T1 and T2, respectively. T1 consists of the DA1 address path and the LD1 and ST1 data paths.
PAGE 34
Control Register File 2.7.1 Register Addresses for Accessing the Control Registers Table 2−4 lists the register addresses for accessing the control register file. One unit (.S2) can read from and write to the control register file. Each control register is accessed by the MVC instruction. See the MVC instruction description, page 3-180, for information on how to use this instruction. Additionally, some of the control register bits are specially accessed in other ways.
PAGE 35
Control Register File 2.7.2 Pipeline/Timing of Control Register Accesses All MVC instructions are single-cycle instructions that complete their access of the explicitly named registers in the E1 pipeline phase. This is true whether MVC is moving a general register to a control register, or conversely. In all cases, the source register content is read, moved through the .S2 unit, and written to the destination register in the E1 pipeline phase. Pipeline Stage E1 Read src2 Written dst Unit in use .
PAGE 36
Control Register File 2.7.3 Addressing Mode Register (AMR) For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the addressing mode register (AMR) specifies the addressing mode. A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode. With circular addressing, the field also specifies which BK (block size) field to use for a circular buffer.
PAGE 37
Table 2−5. Addressing Mode Register (AMR) Field Descriptions (Continued)
Bit     Field     Value   Description
13−12   B6 MODE   0−3h    Address mode selection for register file B6.
                  0       Linear modification (default at reset)
                  1h      Circular addressing using the BK0 field
                  2h      Circular addressing using the BK1 field
                  3h      Reserved
11−10   B5 MODE   0−3h    Address mode selection for register file B5.
9−8     B4 MODE
7−6     A7 MODE
5−4     A6 MODE
PAGE 38
Table 2−5. Addressing Mode Register (AMR) Field Descriptions (Continued)
Bit     Field     Value   Description
3−2     A5 MODE   0−3h    Address mode selection for register file A5.
                  0       Linear modification (default at reset)
                  1h      Circular addressing using the BK0 field
                  2h      Circular addressing using the BK1 field
                  3h      Reserved
1−0     A4 MODE   0−3h    Address mode selection for register file A4.
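As a rough illustration of how the 2-bit MODE fields pack into AMR, the following Python sketch builds and reads the low 16 bits of an AMR value from the bit positions listed in Table 2−5. The helper names (amr_mode_bits, amr_mode_of) are hypothetical, the B7 position (bits 15−14) is inferred from the pattern of the other fields because that row is not shown here, and the BK0/BK1 block-size fields are deliberately left out.

```python
# Low bit of each 2-bit MODE field, per Table 2-5.
# Assumption: B7 MODE at bits 15-14 is inferred from the pattern; its row
# is not part of this excerpt.
MODE_SHIFT = {"A4": 0, "A5": 2, "A6": 4, "A7": 6,
              "B4": 8, "B5": 10, "B6": 12, "B7": 14}

LINEAR, CIRCULAR_BK0, CIRCULAR_BK1 = 0, 1, 2   # value 3h is reserved


def amr_mode_bits(modes):
    """Return the low 16 bits of AMR for a {register: mode} mapping."""
    value = 0
    for reg, mode in modes.items():
        if mode not in (LINEAR, CIRCULAR_BK0, CIRCULAR_BK1):
            raise ValueError(f"mode {mode} is reserved or invalid for {reg}")
        value |= mode << MODE_SHIFT[reg]
    return value


def amr_mode_of(amr, reg):
    """Extract the 2-bit MODE field for one register from an AMR value."""
    return (amr >> MODE_SHIFT[reg]) & 0x3


# A4 circular using BK0, B5 circular using BK1, everything else linear.
print(hex(amr_mode_bits({"A4": CIRCULAR_BK0, "B5": CIRCULAR_BK1})))  # 0x801
```

On the device, a value like this would be moved into AMR with the MVC instruction on the .S2 unit; the sketch only models the bit packing.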
PAGE 39
Control Register File 2.7.4 Control Status Register (CSR) The control status register (CSR) contains control and status bits. The CSR is shown in Figure 2−4 and described in Table 2−7. For the PWRD, EN, PCC, and DCC fields, see the device-specific data manual to see if it supports the options that these fields control. The power-down modes and their wake-up methods are programmed by the PWRD field (bits 15−10) of CSR. The PWRD field of CSR is shown in Figure 2−5.
PAGE 40
Table 2−7. Control Status Register (CSR) Field Descriptions
Bit     Field         Value     Description
31−24   CPU ID        0−FFh     Identifies the CPU of the device. Not writable by the MVC instruction.
                      0−1h      Reserved
                      2h        C67x CPU
                      3h        C67x+ CPU
                      4h−FFh    Reserved
23−16   REVISION ID   0−FFh     Identifies silicon revision of the CPU. For the most current silicon revision information, see the device-specific data manual. Not writable by the MVC instruction.
15−10   PWRD          0−3Fh     Power-down mode field.
PAGE 41
Table 2−7. Control Status Register (CSR) Field Descriptions (Continued)
Bit     Field   Value    Description
7−5     PCC     0−7h     Program cache control mode. Writable by the MVC instruction. See the TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide (SPRU609).
                0        Direct-mapped cache enabled
                1h       Reserved
                2h       Direct-mapped cache enabled
                3h−7h    Reserved
4−2     DCC     0−7h     Data cache control mode. Writable by the MVC instruction.
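The field boundaries in Table 2−7 can be exercised with a small decoder. This is a minimal Python sketch, not TI code: it covers only the fields whose bit ranges appear in this excerpt (CPU ID, REVISION ID, PWRD, PCC, DCC), and the names decode_csr and CSR_FIELDS, as well as the sample register value, are made up for illustration.

```python
# (high bit, low bit) for the CSR fields listed in this excerpt of Table 2-7.
CSR_FIELDS = {
    "CPU_ID": (31, 24),
    "REVISION_ID": (23, 16),
    "PWRD": (15, 10),
    "PCC": (7, 5),
    "DCC": (4, 2),
}

# CPU ID encodings given in Table 2-7; all other values are reserved.
CPU_NAMES = {0x2: "C67x CPU", 0x3: "C67x+ CPU"}


def decode_csr(csr):
    """Split a 32-bit CSR value into the fields named in CSR_FIELDS."""
    fields = {}
    for name, (high, low) in CSR_FIELDS.items():
        width = high - low + 1
        fields[name] = (csr >> low) & ((1 << width) - 1)
    return fields


sample = 0x02030048          # hypothetical CSR value, not read from a device
decoded = decode_csr(sample)
print(decoded, CPU_NAMES.get(decoded["CPU_ID"], "Reserved"))
```

Keeping the field table as data makes it easy to extend once the remaining CSR rows are at hand.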
PAGE 42
Control Register File 2.7.5 Interrupt Clear Register (ICR) The interrupt clear register (ICR) allows you to manually clear the maskable interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to any of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared in IFR. Writing a 0 to any bit in ICR has no effect. Incoming interrupts have priority and override any write to ICR. You cannot set any bit in ICR to affect NMI or reset.
PAGE 43
Control Register File 2.7.6 Interrupt Enable Register (IER) The interrupt enable register (IER) enables and disables individual interrupts. The IER is shown in Figure 2−7 and described in Table 2−9. Figure 2−7.
PAGE 44
Control Register File 2.7.7 Interrupt Flag Register (IFR) The interrupt flag register (IFR) contains the status of INT4−INT15 and NMI interrupt. Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0. If you want to check the status of interrupts, use the MVC instruction to read the IFR. (See the MVC instruction description, page 3-180, for information on how to use this instruction.) The IFR is shown in Figure 2−8 and described in Table 2−10.
PAGE 45
Control Register File 2.7.8 Interrupt Return Pointer Register (IRP) The interrupt return pointer register (IRP) contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt. A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete. The IRP is shown in Figure 2−9.
PAGE 46
Control Register File 2.7.9 Interrupt Set Register (ISR) The interrupt set register (ISR) allows you to manually set the maskable interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set in IFR. Writing a 0 to any bit in ISR has no effect. You cannot set any bit in ISR to affect NMI or reset. The ISR is shown in Figure 2−10 and described in Table 2−11.
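The write-1-to-set behavior of ISR described here, together with the write-1-to-clear behavior of ICR from section 2.7.5, can be modeled in a few lines. The Python sketch below is only a software model under simplifying assumptions: the class name InterruptFlags is hypothetical, and it ignores the rule that incoming hardware interrupts take priority over a simultaneous write to ICR.

```python
MASKABLE = 0xFFF0   # bits 15-4 correspond to INT15-INT4


class InterruptFlags:
    """Toy model of IFR as affected by writes to ISR and ICR."""

    def __init__(self):
        self.ifr = 0

    def write_isr(self, value):
        # Writing 1 sets the matching flag; 0 bits have no effect.
        # Bits outside INT15-INT4 (NMI, reset) cannot be set this way.
        self.ifr |= value & MASKABLE

    def write_icr(self, value):
        # Writing 1 clears the matching flag; 0 bits have no effect.
        self.ifr &= ~(value & MASKABLE)


flags = InterruptFlags()
flags.write_isr(0x8013)      # request INT15 and INT4; bits 1 and 0 are ignored
flags.write_icr(0x0010)      # clear INT4 again
print(hex(flags.ifr))        # 0x8000
```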
PAGE 47
Control Register File 2.7.10 Interrupt Service Table Pointer Register (ISTP) The interrupt service table pointer register (ISTP) is used to locate the interrupt service routine (ISR). The ISTB field identifies the base portion of the address of the interrupt service table (IST) and the HPEINT field identifies the specific interrupt and locates the specific fetch packet within the IST. The ISTP is shown in Figure 2−11 and described in Table 2−12. See section 5.1.2.
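To make the roles of ISTB and HPEINT concrete, the Python sketch below combines a table base address and an interrupt number into an ISTP-style address. The field positions are an assumption, since Figure 2−11 and Table 2−12 are not reproduced in this excerpt: ISTB is taken as bits 31−10 and HPEINT as bits 9−5, with the low five bits zero so that each interrupt maps to one 32-byte (8-word) fetch packet. The function name istp_address is hypothetical.

```python
FETCH_PACKET_BYTES = 32   # eight 32-bit instructions per fetch packet


def istp_address(ist_base, hpeint):
    """Return the fetch packet address for interrupt number hpeint.

    Assumes ISTB occupies bits 31-10 and HPEINT bits 9-5 (not shown in
    this excerpt), so the table base must be aligned to 1024 bytes.
    """
    if ist_base & 0x3FF:
        raise ValueError("IST base must be aligned to a 1024-byte boundary")
    if not 0 <= hpeint < 32:
        raise ValueError("HPEINT is a 5-bit field")
    return (ist_base & 0xFFFFFC00) | (hpeint * FETCH_PACKET_BYTES)


print(hex(istp_address(0x00000800, 4)))   # 0x880
```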
PAGE 48
Control Register File 2.7.11 Nonmaskable Interrupt (NMI) Return Pointer Register (NRP) The NMI return pointer register (NRP) contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing. A branch using the address in NRP (B NRP) in your interrupt service routine returns to the program flow when NMI servicing is complete. The NRP is shown in Figure 2−12.
PAGE 49
2.8 Control Register File Extensions
The C67x DSP has three additional configuration registers to support floating-point operations. The registers specify the desired floating-point rounding mode for the .L and .M units. They also contain fields to warn if src1 and src2 are NaN or denormalized numbers, and if the result overflows, underflows, is inexact, infinite, or invalid.
PAGE 50
Control Register File Extensions Figure 2−14.
PAGE 51
Table 2−14. Floating-Point Adder Configuration Register (FADCR) Field Descriptions (Continued)
Bit   Field   Value   Description
20    INVAL   0       A signaling NaN (SNaN) is not a source.
              1       A signaling NaN (SNaN) is a source. NaN is a source in a floating-point to integer conversion or when infinity is subtracted from infinity.
19    DEN2            Denormalized number select for .L2 src2.
              0       src2 is not a denormalized number.
              1       src2 is a denormalized number.
18
17
16
PAGE 52
Table 2−14. Floating-Point Adder Configuration Register (FADCR) Field Descriptions (Continued)
Bit   Field   Value   Description
7     INEX            Inexact results status for .L1.
              0
              1       Result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL.
6     OVER            Result overflow status for .L1.
              0       Result does not overflow.
              1       Result overflows.
5     INFO            Signed infinity for .L1.
4
3
2
1
0
PAGE 53
2.8.2 Floating-Point Auxiliary Configuration Register (FAUCR)
The floating-point auxiliary configuration register (FAUCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .S functional units. FAUCR has a set of fields specific to each of the .S units: .S2 uses bits 31−16 and .S1 uses bits 15−0. FAUCR is shown in Figure 2−15 and described in Table 2−15.
PAGE 54
Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued)
Bit   Field   Value   Description
25    UNORD           Source to a compare operation for .S2
              0       NaN is not a source to a compare operation.
              1       NaN is a source to a compare operation.
24    UND             Result underflow status for .S2.
              0       Result does not underflow.
              1       Result underflows.
23    INEX            Inexact results status for .S2.
PAGE 55
Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued)
Bit     Field      Value   Description
17      NAN2               NaN select for .S2 src2.
                   0       src2 is not NaN.
                   1       src2 is NaN.
16      NAN1               NaN select for .S2 src1.
                   0       src1 is not NaN.
                   1       src1 is NaN.
15−11   Reserved   0       Reserved. The reserved bit location is always read as 0. A value written to this field has no effect.
10      DIV0               Source to reciprocal operation for .S1.
9
8
7
PAGE 56
Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued)
Bit   Field   Value   Description
5     INFO            Signed infinity for .S1.
              0       Result is not signed infinity.
              1       Result is signed infinity.
4             0       A signaling NaN (SNaN) is not a source.
              1       A signaling NaN (SNaN) is a source. NaN is a source in a floating-point to integer conversion or when infinity is subtracted from infinity.
3
2
1
0
PAGE 57
Control Register File Extensions 2.8.3 Floating-Point Multiplier Configuration Register (FMCR) The floating-point multiplier configuration register (FMCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .M functional units. FMCR has a set of fields specific to each of the .M units: .M2 uses bits 31−16 and .M1 uses bits 15−0. FMCR is shown in Figure 2−16 and described in Table 2−16. Figure 2−16.
PAGE 58
Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued)
Bit   Field   Value   Description
23    INEX            Inexact results status for .M2.
              0
              1       Result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL.
22    OVER            Result overflow status for .M2.
              0       Result does not overflow.
              1       Result overflows.
21    INFO            Signed infinity for .M2.
20
19
18
17
16
PAGE 59
Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued)
Bit     Field      Value   Description
15−11   Reserved   0       Reserved. The reserved bit location is always read as 0. A value written to this field has no effect.
10−9    RMODE      0−3h    Rounding mode select for .M1.
8
7
PAGE 60
Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued)
Bit   Field   Value   Description
2     DEN1            Denormalized number select for .M1 src1.
              0       src1 is not a denormalized number.
              1       src1 is a denormalized number.
1     NAN2            NaN select for .M1 src2.
              0       src2 is not NaN.
              1       src2 is NaN.
0     NAN1            NaN select for .M1 src1.
              0       src1 is not NaN.
              1       src1 is NaN.
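Because FMCR keeps one set of fields per unit (.M2 in bits 31−16, .M1 in bits 15−0), a decoder can split the register into halves and apply the same field table to each. The Python sketch below does this for the fields whose bit numbers appear explicitly in this excerpt of Table 2−16 (RMODE, INEX, DEN1, NAN2, NAN1); the remaining fields are omitted rather than guessed. The names decode_fmcr and HALF_FIELDS, and the sample value, are hypothetical.

```python
# (high bit, low bit) within one 16-bit half, taken from the .M1 rows shown
# in Table 2-16. The .M2 copies sit 16 bits higher (for example INEX at bit 23).
HALF_FIELDS = {
    "RMODE": (10, 9),
    "INEX": (7, 7),
    "DEN1": (2, 2),
    "NAN2": (1, 1),
    "NAN1": (0, 0),
}


def decode_fmcr(fmcr):
    """Return {unit: {field: value}} for the .M1 and .M2 halves of FMCR."""
    result = {}
    for unit, shift in ((".M1", 0), (".M2", 16)):
        half = (fmcr >> shift) & 0xFFFF
        result[unit] = {
            name: (half >> low) & ((1 << (high - low + 1)) - 1)
            for name, (high, low) in HALF_FIELDS.items()
        }
    return result


sample = 0x00800203   # hypothetical value: .M2 INEX set; .M1 RMODE=1, NAN2 and NAN1 set
for unit, fields in decode_fmcr(sample).items():
    print(unit, fields)
```

The same split would apply to FAUCR with .S2 and .S1, although the individual field positions differ and should be taken from Table 2−15.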
PAGE 61
Chapter 3
Instruction Set
This chapter describes the assembly language instructions of the TMS320C67x DSP. Also described are parallel operations, conditional operations, resource constraints, and addressing modes. The C67x floating-point DSP uses all of the instructions available to the TMS320C62x™ DSP but it also uses other instructions that are specific to the C67x DSP.
PAGE 62
3.1 Instruction Operation and Execution Notations
Table 3−1 explains the symbols used in the instruction descriptions.
Table 3−1.
PAGE 67
3.2 Instruction Syntax and Opcode Notations
Table 3−2 explains the syntaxes and opcode fields used in the instruction descriptions. The C67x CPU 32-bit opcodes are mapped in Appendix C through Appendix G.
Table 3−2. Instruction Syntax and Opcode Notations
Symbol   Meaning
baseR    base address register
CC
creg     3-bit field specifying a conditional register, see section 3.
PAGE 68
Table 3−2. Instruction Syntax and Opcode Notations (Continued)
Symbol   Meaning
scstn    bit n of the signed constant field
sn       sign
src      source
src1     source 1
src2     source 2
srcms
stgn     bit n of the constant stg
t        side of source/destination (src/dst) register; 0 = side A, 1 = side B
ucstn    n-bit unsigned constant field
ucstn    bit n of the unsigned constant field
unit     unit decode
x        cross path for src2; 0 = do not use cross path, 1 = use cross path
y        .
PAGE 69
3.3 Overview of IEEE Standard Single- and Double-Precision Formats
Floating-point operands are classified as single-precision (SP) and double-precision (DP). Single-precision floating-point values are 32-bit values stored in a single register. Double-precision floating-point values are 64-bit values stored in a register pair. The register pair consists of consecutive even and odd registers from the same register file.
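As an illustration of the register-pair storage, the Python sketch below splits a double-precision value into two 32-bit words and joins them again. It assumes that the even register of the pair holds the 32 LSBs and the odd register holds the 32 MSBs; this excerpt does not state that assignment, so treat it as an assumption. The function names are hypothetical.

```python
import struct


def dp_to_register_pair(value):
    """Split a double into (odd_word, even_word), each a 32-bit unsigned int.

    Assumption: odd register = 32 MSBs, even register = 32 LSBs.
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    odd_word = (bits >> 32) & 0xFFFFFFFF    # sign, exponent, top of fraction
    even_word = bits & 0xFFFFFFFF           # low 32 bits of the fraction
    return odd_word, even_word


def register_pair_to_dp(odd_word, even_word):
    """Rebuild the double from the two 32-bit register words."""
    bits = ((odd_word & 0xFFFFFFFF) << 32) | (even_word & 0xFFFFFFFF)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


odd, even = dp_to_register_pair(1.0)
print(f"A1:A0 = {odd:08X}:{even:08X}")      # A1:A0 = 3FF00000:00000000
```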
PAGE 70
Overview of IEEE Standard Single- and Double-Precision Formats Table 3−3.
PAGE 71
Figure 3−1 shows the fields of a single-precision floating-point number represented within a 32-bit register.
Figure 3−1. Single-Precision Floating-Point Fields
Bit 31: s   Bits 30−23: e   Bits 22−0: f
Legend:
s   sign bit (0 = positive, 1 = negative)
e   8-bit exponent ($0 < e < 255$)
f   23-bit fraction, $0 < f < 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + \dots$
PAGE 72
Table 3−5 shows hexadecimal and decimal values for some single-precision floating-point numbers. Figure 3−2 shows the fields of a double-precision floating-point number represented within a pair of 32-bit registers.
Table 3−5. Hexadecimal and Decimal Representation for Selected Single-Precision Values
Symbol    Hex Value    Decimal Value
NaN_out   7FFF FFFF    QNaN
0         0000 0000    0.0
−0        8000 0000    −0.0
1         3F80 0000    1.0
2         4000 0000    2.
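The hexadecimal and decimal columns of Table 3−5 can be cross-checked on any host with IEEE single-precision support. The short Python sketch below uses the standard struct module to convert between a 32-bit pattern and its value; the function names sp_from_hex and sp_to_hex are hypothetical helpers, not part of any TI tool.

```python
import math
import struct


def sp_from_hex(word):
    """Interpret a 32-bit word as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]


def sp_to_hex(value):
    """Return the 32-bit pattern of a value rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


# Bit patterns taken from the rows of Table 3-5 shown above.
for word in (0x00000000, 0x80000000, 0x3F800000, 0x40000000, 0x7FFFFFFF):
    value = sp_from_hex(word)
    label = "NaN" if math.isnan(value) else repr(value)
    print(f"{word:08X} -> {label}")
```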
PAGE 73
Normalized: $(-1)^s \times 2^{(e-1023)} \times 1.f$, for $0 < e < 2047$
Denormalized (Subnormal): $(-1)^s \times 2^{-1022} \times 0.f$, for $e = 0$; $f$ nonzero
Table 3−6 shows the s, e, and f values for special double-precision floating-point numbers.
Table 3−6. Special Double-Precision Values
Symbol   Sign (s)   Exponent (e)   Fraction (f)
+0       0          0              0
−0       1          0              0
+Inf     0          2047           0
−Inf     1          2047           0
NaN      x          2047           nonzero
QNaN     x          2047           1xx..x
SNaN     x          2047           0xx..
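The normalized and denormalized formulas above can be checked numerically. The Python sketch below extracts s, e, and f from a host double and evaluates the two cases directly; it assumes the usual IEEE layout of 1 sign bit, 11 exponent bits, and 52 fraction bits, and the function names dp_fields and dp_value are hypothetical.

```python
import struct


def dp_fields(value):
    """Return (s, e, f) for a double: sign, 11-bit exponent, 52-bit fraction."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def dp_value(s, e, f):
    """Evaluate the normalized or denormalized formula for given fields."""
    fraction = f / 2.0 ** 52                     # the ".f" part as a number in [0, 1)
    if 0 < e < 2047:                             # normalized: 1.f
        return (-1) ** s * 2.0 ** (e - 1023) * (1.0 + fraction)
    if e == 0:                                   # denormalized (or zero): 0.f
        return (-1) ** s * 2.0 ** -1022 * fraction
    raise ValueError("e = 2047 encodes infinity or NaN")


print(dp_fields(1.0))                            # (0, 1023, 0)
print(dp_value(*dp_fields(-6.25)))               # -6.25
```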
PAGE 74
3.4 Delay Slots
The execution of floating-point instructions can be defined in terms of delay slots and functional unit latency. The number of delay slots is equivalent to the number of additional cycles required after the source operands are read for the result to be available for reading. For a single-cycle type instruction, operands are read on cycle i and produce a result that can be read on cycle i + 1.
PAGE 75
Delay Slots Table 3−8.
PAGE 76
3.5 Parallel Operations
Instructions are always fetched eight at a time. This constitutes a fetch packet. The basic format of a fetch packet is shown in Figure 3−3. Fetch packets are aligned on 256-bit (8-word) boundaries.
Figure 3−3.
PAGE 77
Example 3−1. Fully Serial p-Bit Pattern in a Fetch Packet
This p-bit pattern:
Instruction   A  B  C  D  E  F  G  H
p-bit         0  0  0  0  0  0  0  0
results in this execution sequence:
Cycle/Execute Packet   Instructions
1                      A
2                      B
3                      C
4                      D
5                      E
6                      F
7                      G
8                      H
The eight instructions are executed sequentially.
Example 3−2.
PAGE 78
Example 3−3. Partially Serial p-Bit Pattern in a Fetch Packet
This p-bit pattern:
Instruction   A  B  C  D  E  F  G  H
p-bit         0  0  1  1  0  1  1  0
results in this execution sequence:
Cycle/Execute Packet
Note: 3.5.
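The way p-bits group a fetch packet into execute packets can be sketched in a few lines of Python. The rule modeled here is that a p-bit of 1 means the next instruction executes in parallel with the current one, while a p-bit of 0 ends the current execute packet. The function name execute_packets is hypothetical, and the fully parallel pattern (seven 1s followed by a 0) is assumed, since Example 3−2 is not reproduced above.

```python
def execute_packets(instructions, p_bits):
    """Group instructions into execute packets according to their p-bits."""
    if len(instructions) != len(p_bits):
        raise ValueError("one p-bit is required per instruction")
    packets, current = [], []
    for name, p in zip(instructions, p_bits):
        current.append(name)
        if p == 0:                 # p = 0: the next instruction starts a new cycle
            packets.append(current)
            current = []
    if current:                    # trailing p = 1; kept so no instruction is dropped
        packets.append(current)
    return packets


names = list("ABCDEFGH")
print(execute_packets(names, [0, 0, 0, 0, 0, 0, 0, 0]))   # fully serial: eight packets
print(execute_packets(names, [0, 0, 1, 1, 0, 1, 1, 0]))   # partially serial pattern of Example 3-3
print(execute_packets(names, [1, 1, 1, 1, 1, 1, 1, 0]))   # assumed fully parallel pattern
```

For the partially serial pattern, the sketch yields A, then B, then C, D, and E together, then F, G, and H together.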
PAGE 79
3.6 Conditional Operations
Most instructions can be conditional. The condition is controlled by a 3-bit opcode field (creg) that specifies the condition register tested, and a 1-bit field (z) that specifies a test for zero or nonzero. The four MSBs of every opcode are creg and z. The specified condition register is tested at the beginning of the E1 pipeline stage for all instructions. For more information on the pipeline, see Chapter 4.
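Since creg and z occupy the four MSBs of every opcode, they can be pulled out with two shifts. The Python sketch below assumes creg sits in bits 31−29 and z in bit 28; the mapping from creg values to specific condition registers is given in a table not reproduced here, so the sketch returns the raw field values only. The function name condition_fields and the sample opcodes are hypothetical.

```python
def condition_fields(opcode):
    """Return (creg, z) from a 32-bit opcode.

    Assumption: creg is bits 31-29 and z is bit 28.
    """
    creg = (opcode >> 29) & 0x7
    z = (opcode >> 28) & 0x1
    return creg, z


print(condition_fields(0x30000000))   # (1, 1)
print(condition_fields(0x80000000))   # (4, 0)
```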
PAGE 80
3.7 Resource Constraints
No two instructions within the same execute packet can use the same resources. Also, no two instructions can write to the same register during the same cycle. The following sections describe how an instruction can use each of the resources.
3.7.1 Constraints on Instructions Using the Same Functional Unit
Two instructions using the same functional unit cannot be issued in the same execute packet. The following execute packet is invalid:
    || ADD .
PAGE 81
Resource Constraints 3.7.3 Constraints on Cross Paths (1X and 2X) One unit (either a .S, .L, or .M unit) per data path, per execute packet, can read a source operand from its opposite register file via the cross paths (1X and 2X). For example, the .S1 unit can read both its operands from the A register file; or it can read an operand from the B register file using the 1X cross path and the other from the A register file.
PAGE 82
Resource Constraints 3.7.4 Constraints on Loads and Stores Load and store instructions can use an address pointer from one register file while loading to or storing from the other register file. Two load and store instructions using a destination/source from the same register file cannot be issued in the same execute packet. The address register must be on the same side as the .D unit used. The following execute packet is invalid: LDW.D1 || LDW.D2 *A0,A1 ; \ .
PAGE 83
Resource Constraints 3.7.5 Constraints on Long (40-Bit) Data Because the .S and .L units share a read register port for long source operands and a write register port for long results, only one long result may be issued per register file in an execute packet. All instructions with a long result on the .S and .L units have zero delay slots. See section 2.2 for the order for long pairs. The following execute packet is invalid: ADD.L1 || SHL.
PAGE 84
3.7.6 Constraints on Register Reads More than four reads of the same register cannot occur on the same cycle. Conditional registers are not included in this count. The following execute packets are invalid:
   MPY .M1  A1, A1, A4   ; five reads of register A1
|| ADD .L1  A1, A1, A5
|| SUB .D1  A1, A2, A3

   MPY .M1  A1, A1, A4   ; five reads of register A1
|| ADD .L1  A1, A1, A5
|| SUB .D2x A1, B2, B3
The following execute packet is valid: MPY .
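The read-count rule can be checked mechanically. The following Python sketch counts source-operand reads per register in one execute packet. It illustrates the rule only: the representation of a packet as lists of source register names is an assumption, and condition registers are left out of the lists, in line with the text.

```python
from collections import Counter

MAX_READS_PER_REGISTER = 4  # limit stated in section 3.7.6


def over_read_registers(execute_packet):
    """Return the registers read more than four times in one execute packet.

    execute_packet: list of source-register lists, one list per instruction.
    Condition registers must not be included in these lists.
    """
    reads = Counter(reg for sources in execute_packet for reg in sources)
    return sorted(reg for reg, n in reads.items() if n > MAX_READS_PER_REGISTER)


if __name__ == "__main__":
    # MPY A1,A1,A4 || ADD A1,A1,A5 || SUB A1,A2,A3 reads A1 five times
    print(over_read_registers([["A1", "A1"], ["A1", "A1"], ["A1", "A2"]]))
```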
PAGE 85
3.7.7 Constraints on Register Writes Two instructions cannot write to the same register on the same cycle. Two instructions with the same destination can be scheduled in parallel as long as they do not write to the destination register on the same cycle. For example, an MPY issued on cycle i followed by an ADD on cycle i + 1 cannot write to the same register because both instructions write a result on cycle i + 1.
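The MPY/ADD case can be reproduced with a small model in which an instruction writes its result on its issue cycle plus its delay slots. The Python sketch below assumes one delay slot for MPY and none for ADD, which matches the cycle numbers in the text. The tuple layout and function names are hypothetical.

```python
def write_cycle(issue_cycle, delay_slots):
    """Cycle on which the result is written (simplified model)."""
    return issue_cycle + delay_slots


def writes_collide(a, b):
    """a and b are (issue_cycle, delay_slots, destination_register) tuples."""
    same_destination = a[2] == b[2]
    same_cycle = write_cycle(a[0], a[1]) == write_cycle(b[0], b[1])
    return same_destination and same_cycle


if __name__ == "__main__":
    mpy = (0, 1, "A4")   # assumed: MPY has 1 delay slot, issued on cycle i = 0
    add = (1, 0, "A4")   # assumed: ADD has 0 delay slots, issued on cycle i + 1
    print(writes_collide(mpy, add))
```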
PAGE 86
Resource Constraints 3.7.8 Constraints on Floating-Point Instructions If an instruction has a multicycle functional unit latency, it locks the functional unit for the necessary number of cycles. Any new instruction dispatched to that functional unit during this locking period causes undefined results. If an instruction with a multicycle functional unit latency has a condition that is evaluated as false during E1, it still locks the functional unit for subsequent cycles.
PAGE 87
Resource Constraints MPYDP No other instruction on the same side can use the cross path on cycles i, i + 1, i + 2, and i + 3. MPYSPDP No other instruction on the same side can use the cross path on cycles i and i + 1. Other hazards exist because instructions have varying numbers of delay slots, and need the functional unit read and write ports of varying numbers of cycles.
PAGE 88
MPYI A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. A MPYDP instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. A MPYSPDP instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. A MPYSP2DP instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6.
PAGE 89
Resource Constraints MPYSPDP A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. A MPYI instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. A MPYID instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. A MPYDP instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. A MPYSP2DP instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3.
PAGE 90
3.8 Addressing Modes The addressing modes on the C67x DSP are linear, circular using BK0, and circular using BK1. The addressing mode is specified by the addressing mode register (AMR), described in section 2.7.3. All registers can perform linear addressing. Only eight registers can perform circular addressing: A4−A7 are used by the .D1 unit and B4−B7 are used by the .D2 unit. No other units can perform circular addressing.
PAGE 91
3.8.2 Circular Addressing Mode The BK0 and BK1 fields in AMR specify the block sizes for circular addressing (see section 2.7.3). 3.8.2.1 LD and ST Instructions As with linear address arithmetic, offsetR/cst is shifted left by 3, 2, 1, or 0 according to the data size, and is then added to or subtracted from baseR to produce the final address.
PAGE 92
3.8.2.2 ADDA and SUBA Instructions As with linear address arithmetic, offsetR/cst is shifted left by 3, 2, 1, or 0 according to the data size, and is then added to or subtracted from baseR to produce the final address. Circular addressing modifies this slightly by only allowing bits N through 0 of the result to be updated, leaving bits 31 through N + 1 unchanged after address arithmetic. The resulting address is bounded to a range of 2^(N + 1), regardless of the size of the offsetR/cst.
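The effect of updating only bits N through 0 can be modeled in a few lines. The Python sketch below is illustrative: `circular_add` is a hypothetical helper, `shift` stands for the data-size scaling (0 for bytes, 1 for halfwords, 2 for words, 3 for doublewords), and N is assumed to equal the BK field value, which agrees with the "BK0 = 2, size = 8" annotation in the ADDAB and ADDAH examples of this guide.

```python
def circular_add(base, offset, n, shift=0):
    """Add a scaled offset to base, updating only bits n..0 of the address.

    Bits 31..n+1 of base are preserved, so the result stays inside a
    block of 2**(n + 1) bytes. Pass a negative offset to subtract.
    """
    mask = (1 << (n + 1)) - 1                       # bits N..0 form the window
    total = (base + (offset << shift)) & 0xFFFFFFFF
    return (base & ~mask & 0xFFFFFFFF) | (total & mask)


if __name__ == "__main__":
    # Byte addressing: 0x100 + 0xB wraps inside an 8-byte block
    print(hex(circular_add(0x00000100, 0xB, n=2)))
    # Halfword addressing: the offset is scaled by 2 first
    print(hex(circular_add(0x00000100, 0xB, n=2, shift=1)))
```

With an 8-byte block the two calls give 0x103 and 0x106, the same values shown for A4 in the ADDAB and ADDAH examples.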
PAGE 93
Addressing Modes Table 3−10.
PAGE 94
3.9 Instruction Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x DSP. See Appendix A for a list of the instructions that are common to the C62x, C64x, and C67x DSPs. 3.10 Instruction Descriptions This section gives detailed information on the instruction set.
PAGE 95
Example: The way each instruction is described. Syntax EXAMPLE (.unit) src, dst .unit = .L1, .L2, .S1, .S2, .D1, .D2 src and dst indicate source and destination, respectively. The (.unit) dictates which functional unit the instruction is mapped to (.L1, .L2, .S1, .S2, .M1, .M2, .D1, or .D2). A table is provided for each instruction that gives the opcode map fields, units the instruction is mapped to, types of operands, and the opcode.
PAGE 96
Table 3−12. Relationships Between Operands, Operand Size, Signed/Unsigned, Functional Units, and Opfields for Example Instruction (ADD)
Opcode map field used...   For operand type...     Unit        Opfield
src1, src2, dst            sint, xsint, sint       .L1, .L2    000 0011
src1, src2, dst            sint, xsint, slong      .L1, .L2    010 0011
src1, src2, dst            xsint, slong, slong     .L1, .L2    010 0001
src1, src2, dst            scst5, xsint, sint      .L1, .
PAGE 97
Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x DSP. This section identifies the DSP families for which the instruction is valid. Description Instruction execution and its effect on the rest of the processor or memory contents are described. Any constraints on the operands imposed by the processor or the assembler are discussed.
PAGE 98
ABS Absolute Value With Saturation
Syntax ABS (.unit) src2, dst
.unit = .L1 or .L2
Compatibility C62x, C64x, C67x, and C67x+ CPU
Opcode
Bits:    31-29  28  27-23  22-18  17-13  12  11-5  4-2  1  0
Field:   creg   z   dst    src2   00000  x   op    110  s  p
Width:   3      1   5      5      5      1   7     3    1  1
Opcode map field used...   For operand type...   Unit        Opfield
src2, dst                  xsint, sint           .L1, .L2    001 1010
src2, dst                  slong, slong          .L1, .L2    011 1000
Description The absolute value of src2 is placed in dst.
PAGE 99
Absolute Value With Saturation Instruction Type Single-cycle Delay Slots 0 See Also ABSDP, ABSSP Example 1 ABS .L1 A1,A5 Before instruction A1 8000 4E3Dh −2147463619 A5 xxxx xxxxh Example 2 ABS .
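The 32-bit form of ABS can be sketched in Python as follows. The sketch assumes that "with saturation" means the most negative 32-bit value maps to the largest positive one, since its true magnitude does not fit in 32 bits. The helper names are hypothetical and the 40-bit (slong) form is not modeled.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed32(word):
    """Interpret a 32-bit register image as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def abs_sat32(word):
    """Absolute value of a 32-bit register image, saturating at INT32_MAX."""
    value = to_signed32(word)
    if value == INT32_MIN:       # assumed saturation case
        return INT32_MAX
    return abs(value)


if __name__ == "__main__":
    # A1 = 8000 4E3Dh is -2147463619 in Example 1
    print(to_signed32(0x80004E3D), hex(abs_sat32(0x80004E3D)))
```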
PAGE 100
ABSDP Absolute Value, Double-Precision Floating-Point Absolute Value, Double-Precision Floating-Point ABSDP ABSDP (.unit) src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 13 reserved 12 11 x 1 6 5 4 3 2 For operand type... src2 dst dp dp 0 0 1 1 0 0 1 0 0 0 s p 1 Opcode map field used... 1 1 1 Unit .S1, .S2 Description The absolute value of src2 is placed in dst.
PAGE 101
Absolute Value, Double-Precision Floating-Point Pipeline Pipeline Stage E1 ABSDP E2 src2_l src2_h Read dst_l Written dst_h .S Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 102
ABSSP Absolute Value, Single-Precision Floating-Point Syntax ABSSP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 0 x 1 6 5 4 3 2 For operand type... src2 dst xsp sp Description The absolute value in src2 is placed in dst. Execution if (cond) else 0 1 1 1 0 0 1 0 0 0 s p 1 Opcode map field used... 1 1 1 Unit .
PAGE 103
Pipeline
Pipeline Stage   E1
Read             src2
Written          dst
Unit in use      .S
Instruction Type Single-cycle
Delay Slots 0
Functional Unit Latency 1
See Also ABS, ABSDP
Example ABSSP .S1X B1,A5
Before instruction          1 cycle after instruction
B1  c020 0000h  −2.5        B1  c020 0000h  −2.5
A5  xxxx xxxxh              A5  4020 0000h  2.
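The register images in this example can be reproduced by clearing the sign bit of the IEEE 754 single-precision encoding. The Python sketch below shows only that bit-level view. It does not model NaN or denormal handling or any status bits, and the function names are hypothetical.

```python
import struct


def abssp(bits):
    """Clear bit 31 (the sign bit) of a single-precision register image."""
    return bits & 0x7FFFFFFF


def bits_to_float(bits):
    """Decode a 32-bit register image as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    b1 = 0xC0200000                       # register image of B1 in the example
    print(bits_to_float(b1), hex(abssp(b1)), bits_to_float(abssp(b1)))
```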
PAGE 104
ADD Add Two Signed Integers Without Saturation Add Two Signed Integers Without Saturation ADD ADD (.unit) src1, src2, dst or ADD (.D1 or .D2) src2, src1, dst Syntax .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 3-44 Instruction Set 4 3 2 1 0 1 1 0 s p 1 Opcode map field used... For operand type... Unit src1 src2 dst sint xsint sint .L1, .
PAGE 105
ADD Add Two Signed Integers Without Saturation .S unit Opcode 31 29 28 27 23 22 18 17 13 12 11 6 creg z dst src2 src1 x op 3 1 5 5 5 1 6 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 5 4 3 2 1 0 1 0 0 0 s p 1 Unit Opfield sint xsint sint .S1, .S2 00 0111 scst5 xsint sint .S1, .S2 00 0110 1 Description for .L1, .L2 and .S1, .S2 Opcodes src2 is added to src1. The result is placed in dst. Execution for .L1, .L2 and .S1, .
PAGE 106
ADD Add Two Signed Integers Without Saturation .D unit Opcode 31 29 28 27 23 22 18 17 13 12 7 creg z dst src2 src1 op 3 1 5 5 5 6 Opcode map field used... For operand type... src2 src1 dst src2 src1 dst 6 5 4 3 2 1 0 1 0 0 0 0 s p 1 Unit Opfield sint sint sint .D1, .D2 01 0000 sint ucst5 sint .D1, .D2 01 0010 1 Description for .D1, .D2 Opcodes src1 is added to src2. The result is placed in dst. Execution for .D1, .
PAGE 107
Example 1 ADD .L2X A1,B1,B2
Before instruction             1 cycle after instruction
A1  0000 325Ah  12890          A1  0000 325Ah
B1  FFFF FF12h  −238           B1  FFFF FF12h
B2  xxxx xxxxh                 B2  0000 316Ch
Example 2 ADD .
PAGE 108
ADDAB Add Using Byte Addressing Mode Add Using Byte Addressing Mode ADDAB ADDAB (.unit) src2, src1, dst Syntax .unit = .D1 or .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 7 creg z dst src2 src1 op 3 1 5 5 5 6 Opcode map field used... For operand type... src2 src1 dst src2 src1 dst 6 5 4 3 2 1 0 1 0 0 0 0 s p 1 Unit Opfield sint sint sint .D1, .D2 11 0000 sint ucst5 sint .D1, .
PAGE 109
Example 1 ADDAB .D1 A4,A2,A4
Before instruction        1 cycle after instruction
A2   0000 000Bh           A2   0000 000Bh
A4   0000 0100h           A4   0000 0103h
AMR  0002 0001h           AMR  0002 0001h
BK0 = 2 → size = 8; A4 in circular addressing mode using BK0
Example 2 ADDAB .D1X B14,42h,A4
Before instruction
B14  0020 1000h
Example 3 ADDAB .
PAGE 110
ADDAD Add Using Doubleword Addressing Mode Syntax ADDAD (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C67x and C67x+ CPU Opcode 31 29 28 27 23 22 18 17 13 12 7 creg z dst src2 src1 op 3 1 5 5 5 6 Description Opcode map field used... For operand type... src2 src1 dst src2 src1 dst 6 5 4 3 2 1 0 1 0 0 0 0 s p 1 Unit Opfield sint sint sint .D1, .D2 11 1100 sint ucst5 sint .D1, .
PAGE 111
ADDAD Add Using Doubleword Addressing Mode Instruction Type Single-cycle Delay Slots 0 Functional Unit Latency 1 See Also ADD, ADDAB, ADDAH, ADDAW Example ADDAD .
PAGE 112
ADDAH Add Using Halfword Addressing Mode Add Using Halfword Addressing Mode ADDAH ADDAH (.unit) src2, src1, dst Syntax .unit = .D1 or .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 7 creg z dst src2 src1 op 3 1 5 5 5 6 Opcode map field used... For operand type... src2 src1 dst src2 src1 dst 6 5 4 3 2 1 0 1 0 0 0 0 s p 1 Unit Opfield sint sint sint .D1, .D2 11 0100 sint ucst5 sint .D1, .
PAGE 113
Example 1 ADDAH .D1 A4,A2,A4
Before instruction        1 cycle after instruction
A2   0000 000Bh           A2   0000 000Bh
A4   0000 0100h           A4   0000 0106h
AMR  0002 0001h           AMR  0002 0001h
BK0 = 2 → size = 8; A4 in circular addressing mode using BK0
Example 2 ADDAH .D1X B14,42h,A4
Before instruction
B14  0020 1000h
Example 3 ADDAH .
PAGE 114
ADDAW Add Using Word Addressing Mode Add Using Word Addressing Mode ADDAW ADDAW (.unit) src2, src1, dst Syntax .unit = .D1 or .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 7 creg z dst src2 src1 op 3 1 5 5 5 6 Opcode map field used... For operand type... src2 src1 dst src2 src1 dst 6 5 4 3 2 1 0 1 0 0 0 0 s p 1 Unit Opfield sint sint sint .D1, .D2 11 1000 sint ucst5 sint .D1, .
PAGE 115
Example 1 ADDAW .D1 A4,2,A4
Before instruction        1 cycle after instruction
A4   0002 0000h           A4   0002 0000h
AMR  0002 0001h           AMR  0002 0001h
BK0 = 2 → size = 8; A4 in circular addressing mode using BK0
Example 2 ADDAW .D1X B14,42h,A4
Before instruction
B14  0020 1000h
Example 3 ADDAW .
PAGE 116
ADDDP Add Two Double-Precision Floating-Point Values Add Two Double-Precision Floating-Point Values ADDDP ADDDP (.unit) src1, src2, dst .unit = .L1 or .L2 or ADDDP (.unit) src1, src2, dst .unit = .S1 or .S2 Syntax (C67x and C67x+ CPU) (C67x+ CPU only) C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 Opcode map field used... For operand type...
PAGE 117
Add Two Double-Precision Floating-Point Values ADDDP Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) If rounding is performed, the INEX bit is set. 3) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set, also. 4) If one source is +infinity and the other is −infinity, the result is NaN_out and the INVAL bit is set.
PAGE 118
ADDDP Add Two Double-Precision Floating-Point Values Pipeline Pipeline Stage Read E1 E2 E3 src1_l src2_l src1_h src2_h E4 Written Unit in use .L or .S E5 E6 E7 dst_l dst_h .L or .S For the C67x CPU, if dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 119
ADDK Add Signed 16-Bit Constant to Register Add Signed 16-Bit Constant to Register ADDK ADDK (.unit) cst, dst Syntax .unit = .S1 or .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 7 creg z dst cst16 3 1 5 16 6 5 4 3 2 1 0 1 0 1 0 0 s p 1 Opcode map field used... For operand type... cst16 dst scst16 uint 1 Unit .S1, .S2 Description A 16-bit signed constant, cst16, is added to the dst register specified. The result is placed in dst.
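The description above amounts to sign-extending cst16 and adding it to dst. The Python sketch below models that on 32-bit register images. Wrapping the sum at 32 bits (no saturation) is an assumption, and the helper names are hypothetical.

```python
def sign_extend16(cst16):
    """Sign-extend a 16-bit field to a Python integer."""
    cst16 &= 0xFFFF
    return cst16 - 0x10000 if cst16 & 0x8000 else cst16


def addk(dst, cst16):
    """Add a signed 16-bit constant to a 32-bit register image.

    Assumption: the result wraps modulo 2**32 rather than saturating.
    """
    return (dst + sign_extend16(cst16)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(addk(0x00000010, 0xFFFF)))   # adds -1
    print(hex(addk(0x00000000, 0x8000)))   # adds -32768
```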
PAGE 120
ADDSP Add Two Single-Precision Floating-Point Values Add Two Single-Precision Floating-Point Values ADDSP ADDSP (.unit) src1, src2, dst .unit = .L1 or .L2 or ADDSP (.unit) src1, src2, dst Syntax (C67x and C67x+ CPU) (C67x+ CPU only) .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 Opcode map field used... For operand type...
PAGE 121
Add Two Single-Precision Floating-Point Values ADDSP Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) If rounding is performed, the INEX bit is set. 3) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 4) If one source is +infinity and the other is −infinity, the result is NaN_out and the INVAL bit is set.
PAGE 122
ADDSP Add Two Single-Precision Floating-Point Values Pipeline Pipeline Stage E1 E2 E3 E4 src1 src2 Read dst Written Unit in use .L or .S Instruction Type 4-cycle Delay Slots 3 Functional Unit Latency 1 See Also ADD, ADDDP, ADDU, SUBSP Example ADDSP .L1 A1,A2,A3 Before instruction A1 C020 0000h −2.5 A1 C020 0000h −2.5 A2 4109 999Ah 8.6 A2 4109 999Ah 8.6 A3 40C3 3334h 6.
PAGE 123
ADDU Add Two Unsigned Integers Without Saturation Add Two Unsigned Integers Without Saturation ADDU ADDU (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 0 1 1 0 s p 1 .L1, .L2 010 1011 xuint ulong ulong .L1, .
PAGE 124
ADDU Add Two Unsigned Integers Without Saturation Example 1 ADDU .L1 A1,A2,A5:A4 Before instruction 1 cycle after instruction A1 0000 325Ah 12890† A1 0000 325Ah A2 FFFF FF12h 4294967058† A2 FFFF FF12h A5:A4 xxxx xxxxh † ‡ A5:A4 0000 0001h ADDU .
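The long result in Example 1 can be checked by hand or with a short model. The Python sketch below adds two unsigned 32-bit values and splits the 40-bit sum into the part held by the odd register (bits 39 through 32) and the part held by the even register (bits 31 through 0). That register split is an assumption based on the A5:A4 notation, and `addu40` is a hypothetical name.

```python
def addu40(src1, src2):
    """Unsigned add of two 32-bit values with a 40-bit (long) result.

    Returns (upper_8_bits, lower_32_bits), corresponding to the odd and
    even registers of a pair such as A5:A4 (assumed layout).
    """
    total = ((src1 & 0xFFFFFFFF) + (src2 & 0xFFFFFFFF)) & 0xFFFFFFFFFF
    return (total >> 32) & 0xFF, total & 0xFFFFFFFF


if __name__ == "__main__":
    upper, lower = addu40(0x0000325A, 0xFFFFFF12)   # 12890 + 4294967058
    print(f"{upper:08X}h {lower:08X}h")
```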
PAGE 125
ADD2 Add Two 16-Bit Integers on Upper and Lower Register Halves Add Two 16-Bit Integers on Upper and Lower Register Halves ADD2 ADD2 (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 Description 6 0 5 4 3 2 1 0 0 0 1 1 0 0 0 s p 1 Opcode map field used... For operand type... src1 src2 dst sint xsint sint 0 1 Unit .S1, .
PAGE 126
ADD2 Add Two 16-Bit Integers on Upper and Lower Register Halves if (cond) Execution { msb16(src1) + msb16(src2) → msb16(dst); lsb16(src1) + lsb16(src2) → lsb16(dst); } else nop Pipeline Pipeline Stage E1 src1, src2 Read Written dst Unit in use .S Instruction Type Single-cycle Delay Slots 0 See Also ADD, ADDU, SUB2 Example ADD2 .
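The Execution block above can be restated in Python. In this sketch the two 16-bit halves are added independently, so a carry out of the lower half is discarded instead of propagating into the upper half. The function name `add2` mirrors the mnemonic and is otherwise hypothetical.

```python
def add2(src1, src2):
    """Add upper and lower 16-bit halves independently (no cross-half carry)."""
    src1 &= 0xFFFFFFFF
    src2 &= 0xFFFFFFFF
    upper = ((src1 >> 16) + (src2 >> 16)) & 0xFFFF      # msb16 + msb16
    lower = ((src1 & 0xFFFF) + (src2 & 0xFFFF)) & 0xFFFF  # lsb16 + lsb16
    return (upper << 16) | lower


if __name__ == "__main__":
    # The lower halves overflow; the carry does not reach the upper half
    print(hex(add2(0x0001FFFF, 0x00010001)))
```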
PAGE 127
Bitwise AND AND Bitwise AND AND AND (.unit) src1, src2, dst Syntax .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 For operand type... src1 src2 dst src1 src2 dst 3 2 1 0 1 1 0 s p 1 Unit Opfield uint xuint uint .L1, .L2 111 1011 scst5 xuint uint .L1, .L2 111 1010 1 .S unit Opcode 31 Opcode map field used...
PAGE 128
AND Bitwise AND
Pipeline
Pipeline Stage   E1
Read             src1, src2
Written          dst
Unit in use      .L or .S
Instruction Type Single-cycle
Delay Slots 0
See Also OR, XOR
Example 1 AND .L1X A1,B1,A2
Before instruction        1 cycle after instruction
A1  F7A1 302Ah            A1  F7A1 302Ah
A2  xxxx xxxxh            A2  02A0 2020h
B1  02B6 E724h            B1  02B6 E724h
Example 2 AND .
PAGE 129
B Branch Using a Displacement Branch Using a Displacement B B (.unit) label Syntax .unit = .S1 or .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 7 creg z cst21 3 1 21 Description 6 5 4 3 2 1 0 0 0 1 0 0 s p 1 Opcode map field used... For operand type... cst21 scst21 1 Unit .S1, .S2 A 21-bit signed constant, cst21, is shifted left by 2 bits and is added to the address of the first instruction of the fetch packet that contains the branch instruction.
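The target computation described here can be sketched in Python. The sketch assumes byte addresses and uses the 8-word (32-byte) fetch packet alignment stated in section 3.5 to find the address of the first instruction in the fetch packet. `branch_target` is a hypothetical helper.

```python
FETCH_PACKET_BYTES = 32  # eight 32-bit instructions per fetch packet


def branch_target(branch_address, cst21):
    """Compute the target of a displacement branch.

    cst21 is sign-extended, shifted left by 2, and added to the address of
    the first instruction of the fetch packet that contains the branch.
    """
    cst21 &= 0x1FFFFF
    if cst21 & 0x100000:                      # sign bit of the 21-bit field
        cst21 -= 0x200000
    fetch_packet_base = branch_address & ~(FETCH_PACKET_BYTES - 1)
    return (fetch_packet_base + (cst21 << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    # A branch anywhere in the fetch packet at 0x20 with cst21 = 3
    print(hex(branch_target(0x00000024, 3)))
```

Because the base is the fetch packet address and not the address of the branch itself, two branches in the same fetch packet with the same cst21 reach the same target.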
PAGE 130
B Branch Using a Displacement Pipeline Target Instruction Pipeline Stage E1 PS PW PR DP DC E1 Read Written Branch Taken Unit in use .S Instruction Type Branch Delay Slots 5 Example Table 3−13 gives the program counter values and actions for the following code example. 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0004 0008 000C 0010 0014 0018 001C 0020 B ADD || ADD LOOP: MPY || SUB MPY MPY SHR ADD .S1 LOOP .L1 A1, A2, A3 .L2 B1, B2, B3 .M1X A3, B3, A4 .D1 A5, A6, A6 .M1 A3, A6, A5 .
PAGE 131
B Branch Using a Register Branch Using a Register B B (.unit) src2 Syntax .unit = .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 creg z 0 0 3 1 Description 23 0 0 22 0 18 src2 17 0 0 0 0 5 13 12 11 0 x 0 6 5 4 3 2 1 0 0 1 1 0 1 1 0 0 0 s p 1 1 Opcode map field used... For operand type... Unit src2 xuint .S2 1 src2 is placed in the program fetch counter (PFC).
PAGE 132
B Branch Using a Register Pipeline Target Instruction Pipeline Stage E1 PS PW PR DP DC E1 src2 Read Written Branch Taken Unit in use .S2 Instruction Type Branch Delay Slots 5 Example Table 3−14 gives the program counter values and actions for the following code example. In this example, the B10 register holds the value 1000 000Ch. B10 1000 000Ch 1000 1000 1000 1000 1000 1000 1000 1000 1000 0000 0004 0008 000C 0010 0014 0018 001C 0020 B ADD || ADD MPY || SUB MPY MPY SHR ADD .S2 B10 .
PAGE 133
Branch Using an Interrupt Return Pointer B IRP Branch Using an Interrupt Return Pointer B IRP B (.unit) IRP Syntax .unit = .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 creg z dst 3 1 5 Description 22 0 0 1 1 18 17 0 0 0 0 0 13 12 11 0 x 0 6 5 4 3 2 1 0 0 0 0 1 1 1 0 0 0 s p 1 1 Opcode map field used... For operand type... Unit src2 xsint .S2 1 IRP is placed in the program fetch counter (PFC).
PAGE 134
B IRP Branch Using an Interrupt Return Pointer Pipeline Target Instruction Pipeline Stage E1 Read IRP PS PW PR DP DC E1 Written Branch Taken Unit in use .S2 Instruction Type Branch Delay Slots 5 Example Table 3−15 gives the program counter values and actions for the following code example. Given that an interrupt occurred at PC = 0000 1000 0000 0000 0000 0000 0000 0000 0000 0020 0024 0028 002C 0030 0034 0038 IRP = B ADD MPY NOP SHR ADD ADD 0000 1000 .S2 IRP .S1 A0, A2, A1 .
PAGE 135
B NRP Branch Using NMI Return Pointer Branch Using NMI Return Pointer B NRP B (.unit) NRP Syntax .unit = .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 creg z dst 3 1 5 Description 22 0 0 1 1 18 17 1 0 0 0 0 13 12 11 0 x 0 6 5 4 3 2 1 0 0 0 0 1 1 1 0 0 0 s p 1 1 Opcode map field used... For operand type... Unit src2 xsint .S2 1 NRP is placed in the program fetch counter (PFC). This instruction also sets the NMIE bit.
PAGE 136
B NRP Branch Using NMI Return Pointer Pipeline Target Instruction Pipeline Stage E1 PS PW PR DP DC E1 NRP Read Written Branch Taken Unit in use .S2 Instruction Type Branch Delay Slots 5 Example Table 3−16 gives the program counter values and actions for the following code example. Given that an interrupt occurred at PC = 0000 1000 0000 0000 0000 0000 0000 0000 0000 0020 0024 0028 002C 0030 0034 0038 NRP = B ADD MPY NOP SHR ADD ADD 0000 1000 .S2 NRP .S1 A0, A2, A1 .M1 A1, A0, A1 .
PAGE 137
CLR Clear a Bit Field Clear a Bit Field CLR CLR (.unit) src2, csta, cstb, dst or CLR (.unit) src2, src1, dst Syntax .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant form 31 29 28 27 23 22 18 17 13 12 8 creg z dst src2 csta cstb 3 1 5 5 5 5 29 For operand type... src2 csta cstb dst uint ucst5 ucst5 uint 5 4 3 2 1 0 1 1 1 0 Unit .S1, .
PAGE 138
CLR Clear a Bit Field The field in src2, specified by csta and cstb, is cleared to zero. csta and cstb may be specified as constants or as the ten LSBs of the src1 registers, with cstb being bits 0−4 and csta bits 5−9. csta signifies the bit location of the LSB in the field and cstb signifies the bit location of the MSB in the field. In other words, csta and cstb represent the beginning and ending bits, respectively, of the field to be cleared.
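Both forms of CLR can be modeled with a mask. In the Python sketch below, `clr` takes csta and cstb as constants and `clr_reg` unpacks them from the ten LSBs of src1 as described above. Rejecting csta greater than cstb is a choice made for this sketch, not documented behavior, and the function names are hypothetical.

```python
def clr(src2, csta, cstb):
    """Clear bits csta..cstb (inclusive) of a 32-bit value."""
    if not (0 <= csta <= 31 and 0 <= cstb <= 31):
        raise ValueError("csta and cstb must be 5-bit values")
    if csta > cstb:
        raise ValueError("this sketch only handles csta <= cstb")
    width = cstb - csta + 1
    mask = ((1 << width) - 1) << csta       # ones over the field to clear
    return src2 & ~mask & 0xFFFFFFFF


def clr_reg(src2, src1):
    """Register form: cstb is src1 bits 0-4, csta is src1 bits 5-9."""
    cstb = src1 & 0x1F
    csta = (src1 >> 5) & 0x1F
    return clr(src2, csta, cstb)


if __name__ == "__main__":
    print(hex(clr(0x07A43F2A, 4, 19)))      # same operands as CLR A1,4,19,A2
```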
PAGE 139
Clear a Bit Field Example 1 CLR .S1 A1,4,19,A2 Before instruction Example 2 A1 07A4 3F2Ah A2 xxxx xxxxh A2 07A0 000Ah B1,B3,B2 Before instruction SPRU733 1 cycle after instruction A1 07A4 3F2Ah CLR .
PAGE 140
CMPEQ Compare for Equality, Signed Integers Compare for Equality, Signed Integers CMPEQ CMPEQ (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 Opcode map field used... For operand type... src1 src2 dst 4 3 2 1 0 1 1 0 s p 1 Unit Opfield sint xsint uint .L1, .L2 101 0011 src1 src2 dst scst5 xsint uint .L1, .
PAGE 141
Compare for Equality, Signed Integers Pipeline Pipeline Stage E1 src1, src2 Read Written dst Unit in use .L Instruction Type Single-cycle Delay Slots 0 See Also CMPEQDP, CMPEQSP, CMPGT, CMPLT Example 1 CMPEQ .L1X A1,B1,A2 Before instruction A1 0000 04B8h 1208 A2 xxxx xxxxh B1 0000 04B7h Example 2 CMPEQ .L1 A1 0000 04B8h 1207 false B1 0000 04B7h Ch,A1,A2 A1 0000 000Ch 12 A2 xxxx xxxxh 1 cycle after instruction A1 0000 000Ch A2 0000 0001h true CMPEQ .
PAGE 142
CMPEQDP Compare for Equality, Double-Precision Floating-Point Values Compare for Equality, Double-Precision Floating-Point Values CMPEQDP CMPEQDP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst dp xdp sint 6 5 4 3 2 1 0 0 1 0 0 0 1 0 0 0 s p 1 1 Unit .S1, .S2 Description Compares src1 to src2.
PAGE 143
CMPEQDP Compare for Equality, Double-Precision Floating-Point Values Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read E1 E2 src1_l src2_l src1_h src2_h dst Written .S Unit in use .S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 See Also CMPEQ, CMPEQSP, CMPGTDP, CMPLTDP Example CMPEQDP .
PAGE 144
CMPEQSP Compare for Equality, Single-Precision Floating-Point Values Compare for Equality, Single-Precision Floating-Point Values CMPEQSP CMPEQSP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst sp xsp sint 6 5 4 3 2 1 0 1 1 0 0 0 1 0 0 0 s p 1 1 Unit .S1, .S2 Description Compares src1 to src2.
PAGE 145
Compare for Equality, Single-Precision Floating-Point Values CMPEQSP Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage E1 Read src1 src2 Written dst Unit in use .S Instruction Type Single-cycle Delay Slots 0 Functional Unit Latency 1 See Also CMPEQ, CMPEQDP, CMPGTSP, CMPLTSP Example CMPEQSP .
PAGE 146
CMPGT Compare for Greater Than, Signed Integers Compare for Greater Than, Signed Integers CMPGT CMPGT (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 3-86 Instruction Set Opcode map field used... For operand type... src1 src2 dst 4 3 2 1 0 1 1 0 s p 1 Unit Opfield sint xsint uint .L1, .L2 100 0111 src1 src2 dst scst5 xsint uint .
PAGE 147
Compare for Greater Than, Signed Integers Description CMPGT Performs a signed comparison of src1 to src2. If src1 is greater than src2, then a 1 is written to dst; otherwise, a 0 is written to dst. Note: The CMPGT instruction allows using a 5-bit constant as src1. If src2 is a 5-bit constant, as in CMPGT .L1 A4, 5, A0 Then to implement this operation, the assembler converts this instruction to CMPLT .
PAGE 148
CMPGT Example 1 Compare for Greater Than, Signed Integers CMPGT .L1X A1,B1,A2 Before instruction A1 0000 01B6h 438 A2 xxxx xxxxh B1 0000 08BDh Example 2 2237 −367 A2 xxxx xxxxh B1 FFFF FDC4h B1 0000 08BDh CMPGT .L1 1 cycle after instruction A1 FFFF FE91h A2 0000 0001h −572 true B1 FFFF FDC4h 8,A1,A2 Before instruction A1 0000 0023h 35 A2 xxxx xxxxh 1 cycle after instruction A1 0000 0023h A2 0000 0000h false CMPGT .
PAGE 149
Compare for Greater Than, Double-Precision Floating-Point Values CMPGTDP Compare for Greater Than, Double-Precision Floating-Point Values CMPGTDP CMPGTDP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst dp xdp sint 6 5 4 3 2 1 0 0 1 0 0 1 1 0 0 0 s p 1 1 Unit .S1, .
PAGE 150
CMPGTDP Compare for Greater Than, Double-Precision Floating-Point Values (C67x CPU) Note: No configuration bits other than those shown above are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read E1 E2 src1_l src2_l src1_h src2_h dst Written Unit in use .S .S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 See Also CMPEQDP, CMPGT, CMPGTSP, CMPGTU, CMPLTDP Example CMPGTDP .
PAGE 151
Compare for Greater Than, Single-Precision Floating-Point Values CMPGTSP Compare for Greater Than, Single-Precision Floating-Point Values CMPGTSP CMPGTSP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst sp xsp sint 6 5 4 3 2 1 0 1 1 0 0 1 1 0 0 0 s p 1 1 Unit .S1, .
PAGE 152
CMPGTSP Compare for Greater Than, Single-Precision Floating-Point Values Note: No configuration bits other than those shown above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage E1 Read src1 src2 Written dst Unit in use .S Instruction Type Single-cycle Delay Slots 0 Functional Unit Latency 1 See Also CMPEQSP, CMPGT, CMPGTDP, CMPGTU, CMPLTSP Example CMPGTSP .S1X A1,B2,A3 Before instruction A1 C020 0000h −2.5 A1 C020 0000h −2.5 B2 4109 999Ah 8.
PAGE 153
CMPGTU Compare for Greater Than, Unsigned Integers Compare for Greater Than, Unsigned Integers CMPGTU CMPGTU (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 Opcode map field used... For operand type... src1 src2 dst 4 3 2 1 0 1 1 0 s p 1 Unit Opfield uint xuint uint .L1, .L2 100 1111 src1 src2 dst ucst4 xuint uint .L1, .
PAGE 154
CMPGTU Compare for Greater Than, Unsigned Integers Pipeline Pipeline Stage E1 src1, src2 Read Written dst Unit in use .L Instruction Type Single-cycle Delay Slots 0 See Also CMPGT, CMPGTDP, CMPGTSP, CMPLTU Example 1 CMPGTU .L1 A1,A2,A3 Before instruction A1 0000 0128h 296† A1 0000 0128h A2 FFFF FFDEh 4294967262† A2 FFFF FFDEh A3 xxxx xxxxh † A3 0000 0000h CMPGTU .
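Example 1 shows why the unsigned compare matters: FFFF FFDEh is a very large unsigned value but a small negative signed value. The Python sketch below contrasts the two interpretations. `cmpgt` and `cmpgtu` are hypothetical models of the 32-bit register forms only.

```python
def to_signed32(word):
    """Interpret a 32-bit register image as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def cmpgt(src1, src2):
    """Signed compare: 1 if src1 > src2, else 0."""
    return int(to_signed32(src1) > to_signed32(src2))


def cmpgtu(src1, src2):
    """Unsigned compare: 1 if src1 > src2, else 0."""
    return int((src1 & 0xFFFFFFFF) > (src2 & 0xFFFFFFFF))


if __name__ == "__main__":
    a1, a2 = 0x00000128, 0xFFFFFFDE          # 296 and 4294967262 unsigned
    print(cmpgtu(a1, a2), cmpgt(a1, a2))
```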
PAGE 155
CMPLT Compare for Less Than, Signed Integers Compare for Less Than, Signed Integers CMPLT CMPLT (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1 x op 3 1 5 5 5 1 7 SPRU733 Opcode map field used... For operand type... src1 src2 dst 4 3 2 1 0 1 1 0 s p 1 Unit Opfield sint xsint uint .L1, .L2 101 0111 src1 src2 dst scst5 xsint uint .L1, .
PAGE 156
CMPLT Compare for Less Than, Signed Integers Description Performs a signed comparison of src1 to src2. If src1 is less than src2, then 1 is written to dst; otherwise, 0 is written to dst. Note: The CMPLT instruction allows using a 5-bit constant as src1. If src2 is a 5-bit constant, as in CMPLT .L1 A4, 5, A0 Then to implement this operation, the assembler converts this instruction to CMPGT .
PAGE 157
Compare for Less Than, Signed Integers Example 1 CMPLT .L1 A1,A2,A3 Before instruction 2018 A1 0000 07E2h A2 0000 0F6Bh 3947 A2 0000 0F6Bh CMPLT .L1 A3 0000 0001h −298 A1 FFFF FED6h A2 0000 000Ch 12 A2 0000 000Ch A3 xxxx xxxxh A3 0000 0001h true 9,A1,A2 Before instruction A1 0000 0005h A2 xxxx xxxxh SPRU733 1 cycle after instruction A1 FFFF FED6h CMPLT .
PAGE 158
CMPLTDP Compare for Less Than, Double-Precision Floating-Point Values Compare for Less Than, Double-Precision Floating-Point Values CMPLTDP CMPLTDP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst dp xdp sint 6 5 4 3 2 1 0 0 1 0 1 0 1 0 0 0 s p 1 1 Unit .S1, .S2 Description Compares src1 to src2.
PAGE 159
Compare for Less Than, Double-Precision Floating-Point Values CMPLTDP Note: No configuration bits other than those above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read E1 E2 src1_l src2_l src1_h src2_h dst Written Unit in use .S .S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 See Also CMPEQDP, CMPGTDP, CMPLT, CMPLTSP, CMPLTU Example CMPLTDP .
PAGE 160
CMPLTSP Compare for Less Than, Single-Precision Floating-Point Values Compare for Less Than, Single-Precision Floating-Point Values CMPLTSP CMPLTSP (.unit) src1, src2, dst Syntax .unit = .S1 or .S2 C67x and C67x+ CPU Compatibility Opcode 31 12 11 creg 29 28 z 27 dst 23 22 src2 18 17 src1 13 x 1 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst sp xsp sint 6 5 4 3 2 1 0 1 1 0 1 0 1 0 0 0 s p 1 1 Unit .S1, .S2 Description Compares src1 to src2.
PAGE 161
Compare for Less Than, Single-Precision Floating-Point Values CMPLTSP Note: No configuration bits other than those above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage E1 Read src1 src2 Written dst Unit in use .S Instruction Type Single-cycle Delay Slots 0 Functional Unit Latency 1 See Also CMPEQSP, CMPGTSP, CMPLT, CMPLTDP, CMPLTU Example CMPLTSP .S1 A1,A2,A3 Before instruction A1 C020 0000h −2.5 A1 C020 0000h −2.5 A2 4109 999Ah 8.
PAGE 162
CMPLTU Compare for Less Than, Unsigned Integers
Syntax CMPLTU (.unit) src1, src2, dst; .unit = .L1 or .L2
Compatibility C62x, C64x, C67x, and C67x+ CPU
Opcode bits 31−29 creg (3), bit 28 z (1), bits 27−23 dst (5), bits 22−18 src2 (5), bits 17−13 src1 (5), bit 12 x (1), bits 11−5 op (7), bits 4−2 = 1 1 0, bit 1 s, bit 0 p
Opcode map field used...   For operand type...    Unit        Opfield
src1 src2 dst              uint xuint uint        .L1, .L2    101 1111
src1 src2 dst              ucst4 xuint uint       .L1, .
PAGE 163
Compare for Less Than, Unsigned Integers Pipeline Pipeline Stage E1 src1, src2 Read Written dst Unit in use .L Instruction Type Single-cycle Delay Slots 0 See Also CMPGTU, CMPLT, CMPLTDP, CMPLTSP Example 1 CMPLTU .L1 A1,A2,A3 Before instruction Example 2 10394† A1 0000 289Ah A2 FFFF F35Eh 4294964062† A2 FFFF F35Eh A3 0000 0001h CMPLTU .L1 14,A1,A2 A1 0000 000Fh 15† A2 xxxx xxxxh † A1 0000 000Fh true Unsigned 32-bit integer CMPLTU .
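The same bit pattern can compare differently under CMPLTU and CMPLT. The Python sketch below illustrates this with the operands from the CMPLTU examples above. The function names are hypothetical, and the model ignores conditional execution and pipeline timing.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(word: int) -> int:
    """Two's-complement view of a 32-bit pattern."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def cmpltu(src1: int, src2: int) -> int:
    """Illustrative model of CMPLTU: unsigned less-than."""
    return 1 if (src1 & MASK32) < (src2 & MASK32) else 0


def cmplt(src1: int, src2: int) -> int:
    """Illustrative model of CMPLT: signed less-than, for contrast."""
    return 1 if as_signed32(src1) < as_signed32(src2) else 0


a1, a2 = 0x0000289A, 0xFFFFF35E   # 10394 and 4294964062 when unsigned
print(cmpltu(a1, a2))             # unsigned view: 10394 < 4294964062
print(cmplt(a1, a2))              # signed view: FFFF F35Eh is negative
print(cmpltu(14, 0x0000000F))     # constant 14 as src1, compared with 15
```

Under the unsigned view the first comparison is true; under the signed view the same operands compare false.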
PAGE 164
DPINT Convert Double-Precision Floating-Point Value to Integer Convert Double-Precision Floating-Point Value to Integer DPINT DPINT (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 4 3 0 x 0 0 0 1 0 0 0 1 1 0 s p 1 Opcode map field used... For operand type... src2 dst dp sint 2 1 1 0 1 Unit .L1, .
PAGE 165
DPINT Convert Double-Precision Floating-Point Value to Integer
Pipeline Stage: E1 E2 E3 E4; Read: src2_l, src2_h; Written: dst; Unit in use: .L
Instruction Type 4-cycle Delay Slots 3 Functional Unit Latency 1 See Also DPSP, DPTRUNC, INTDP, SPINT
Example DPINT .L1 A1:A0,A4
Before instruction: A1:A0 4021 3333h 3333 3333h (8.6); A4 xxxx xxxxh
4 cycles after instruction: A1:A0 4021 3333h 3333 3333h; A4 0000 0009h 8.
PAGE 166
DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value DPSP DPSP (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 0 x 0 0 0 1 0 0 1 1 1 0 s p 4 3 2 1 Opcode map field used... For operand type... src2 dst dp sp 1 1 0 1 Unit .L1, .
PAGE 167
Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value DPSP 7) If underflow occurs, the INEX and UNDER bits are set and the results are set as follows (SFPN is the smallest floating-point number):
Underflow Output
                 Rounding Mode
Result Sign      Nearest Even    Zero    +Infinity    −Infinity
+                +0              +0      +SFPN        +0
−                −0              −0      −0           −SFPN
Pipeline Pipeline Stage E1 E2 E3 src2_l src2_h Read dst Written .
PAGE 168
DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Convert Double-Precision Floating-Point Value to Integer With Truncation DPTRUNC DPTRUNC (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 0 x 0 0 0 0 0 0 1 1 1 0 s p 4 3 1 Opcode map field used... For operand type... src2 dst dp sint 2 1 1 0 1 Unit .L1, .
PAGE 169
DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation
Pipeline Stage: E1 E2 E3 E4; Read: src2_l, src2_h; Written: dst; Unit in use: .L
Instruction Type 4-cycle Delay Slots 3 Functional Unit Latency 1 See Also DPINT, DPSP, SPTRUNC
Example DPTRUNC .L1 A1:A0,A4
Before instruction: A1:A0 4021 3333h 3333 3333h (8.6); A4 xxxx xxxxh
4 cycles after instruction: A1:A0 4021 3333h 3333 3333h; A4 0000 0008h 8.
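The DPINT and DPTRUNC examples use the same source value, 8.6, and differ only in how the fraction is discarded. The Python sketch below decodes the register pair 4021 3333h:3333 3333h and applies both conversions. It is an illustration under stated assumptions: the function names are hypothetical, DPINT is assumed to be running in round-to-nearest-even mode (the actual mode is selected by the CPU configuration), and out-of-range, infinity, and NaN inputs are not handled.

```python
import math
import struct


def dp_from_pair(high: int, low: int) -> float:
    """Decode an odd:even register pair as an IEEE double."""
    return struct.unpack(">d", struct.pack(">II", high, low))[0]


def dptrunc(value: float) -> int:
    """Illustrative DPTRUNC: discard the fraction (round toward zero)."""
    return math.trunc(value) & 0xFFFFFFFF


def dpint_nearest_even(value: float) -> int:
    """Illustrative DPINT, assuming round-to-nearest-even mode."""
    return round(value) & 0xFFFFFFFF


source = dp_from_pair(0x40213333, 0x33333333)
print(source)
print(f"{dpint_nearest_even(source):08X}")
print(f"{dptrunc(source):08X}")
```

With this input the rounded result is 0000 0009h and the truncated result is 0000 0008h, matching the A4 values shown in the two examples.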
PAGE 170
EXT Extract and Sign-Extend a Bit Field Extract and Sign-Extend a Bit Field EXT EXT (.unit) src2, csta, cstb, dst or EXT (.unit) src2, src1, dst Syntax .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant form 31 29 28 27 23 22 18 17 13 12 8 creg z dst src2 csta cstb 3 1 5 5 5 5 29 For operand type... src2 csta cstb dst sint ucst5 ucst5 sint 5 4 3 2 1 0 1 1 1 0 Unit .S1, .
PAGE 171
EXT Extract and Sign-Extend a Bit Field The field in src2, specified by csta and cstb, is extracted and sign-extended to 32 bits. The extract is performed by a shift left followed by a signed shift right. csta and cstb are the shift left amount and shift right amount, respectively. This can be thought of in terms of the LSB and MSB of the field to be extracted. Then csta = 31 − MSB of the field and cstb = csta + LSB of the field.
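The shift-left, signed-shift-right sequence described above can be sketched in Python as follows. The function name ext is hypothetical, and the model covers only the constant form. The operands are those of Example 1 on the following page (EXT .S1 A1,10,19,A2).

```python
MASK32 = 0xFFFFFFFF


def ext(src2: int, csta: int, cstb: int) -> int:
    """Illustrative EXT: shift left by csta, then arithmetic shift right by cstb."""
    shifted = (src2 << csta) & MASK32      # bits shifted past bit 31 are lost
    if shifted & 0x80000000:               # reinterpret as a signed 32-bit value
        shifted -= 1 << 32
    return (shifted >> cstb) & MASK32      # Python's >> on a negative int sign-fills


print(f"{ext(0x07A43F2A, 10, 19):08X}")
```

Here csta = 10 and cstb = 19 select the field whose MSB is bit 21 and whose LSB is bit 9. Bit 21 of the source is 1, so the upper bits of the result are filled with ones.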
PAGE 172
EXT Extract and Sign-Extend a Bit Field Instruction Type Single-cycle Delay Slots 0 See Also EXTU Example 1 EXT .S1 A1,10,19,A2 Before instruction Example 2 A1 07A4 3F2Ah A1 07A4 3F2Ah A2 xxxx xxxxh A2 FFFF F21Fh EXT .
PAGE 173
Extract and Zero-Extend a Bit Field EXTU Extract and Zero-Extend a Bit Field EXTU EXTU (.unit) src2, csta, cstb, dst or EXTU (.unit) src2, src1, dst Syntax .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant width and offset form: 31 29 28 27 23 22 18 17 13 12 8 creg z dst src2 csta cstb 3 1 5 5 5 5 29 For operand type... src2 csta cstb dst uint ucst5 ucst5 uint 5 4 3 2 1 0 1 1 1 0 Unit .S1, .
PAGE 174
EXTU Extract and Zero-Extend a Bit Field The field in src2, specified by csta and cstb, is extracted and zero extended to 32 bits. The extract is performed by a shift left followed by an unsigned shift right. csta and cstb are the amounts to shift left and shift right, respectively. This can be thought of in terms of the LSB and MSB of the field to be extracted. Then csta = 31 − MSB of the field and cstb = csta + LSB of the field.
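The relation between the field bounds and the two shift amounts can be checked with a short Python sketch. The names field_to_shifts, extu, and extu_by_mask are hypothetical. The last function is an equivalent mask-based formulation added only for comparison; it is not how the manual describes the instruction.

```python
MASK32 = 0xFFFFFFFF


def field_to_shifts(msb: int, lsb: int) -> tuple[int, int]:
    """csta = 31 - MSB of the field, cstb = csta + LSB of the field."""
    csta = 31 - msb
    return csta, csta + lsb


def extu(src2: int, csta: int, cstb: int) -> int:
    """Illustrative EXTU: shift left by csta, then logical shift right by cstb."""
    return ((src2 << csta) & MASK32) >> cstb


def extu_by_mask(src2: int, msb: int, lsb: int) -> int:
    """Equivalent formulation: shift the field down and mask to its width."""
    return (src2 >> lsb) & ((1 << (msb - lsb + 1)) - 1)


csta, cstb = field_to_shifts(msb=21, lsb=9)
print(csta, cstb)
print(f"{extu(0x07A43F2A, csta, cstb):08X}")
print(f"{extu_by_mask(0x07A43F2A, 21, 9):08X}")
```

For the field spanning bits 21 through 9 the shift amounts come out as 10 and 19, the constants used in EXTU .S1 A1,10,19,A2.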
PAGE 175
Extract and Zero-Extend a Bit Field Instruction Type Single-cycle Delay Slots 0 See Also EXT Example 1 EXTU .S1 A1,10,19,A2 Before instruction Example 2 A1 07A4 3F2Ah A2 xxxx xxxxh A2 0000 121Fh A1,A2,A3 Before instruction SPRU733 1 cycle after instruction A1 07A4 3F2Ah EXTU .
PAGE 176
IDLE Multicycle NOP With No Termination Until Interrupt Multicycle NOP With No Termination Until Interrupt IDLE Syntax IDLE .unit = none C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 18 Reserved 14 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 s p 1 0 1 Description Performs an infinite multicycle NOP that terminates upon servicing an interrupt, or a branch occurs due to an IDLE instruction being in the delay slots of a branch.
PAGE 177
INTDP Convert Signed Integer to Double-Precision Floating-Point Value Convert Signed Integer to Double-Precision Floating-Point Value INTDP INTDP (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 4 3 2 1 0 x 0 1 1 1 0 0 1 1 1 0 s p 1 1 Opcode map field used... For operand type... src2 dst xsint dp 0 1 Unit .L1, .
PAGE 178
INTDP Convert Signed Integer to Double-Precision Floating-Point Value
Example INTDP .L1x B4,A1:A0
Before instruction: B4 1965 1127h (426053927); A1:A0 xxxx xxxxh xxxx xxxxh
5 cycles after instruction: B4 1965 1127h (426053927); A1:A0 41B9 6511h 2700 0000h 4.
PAGE 179
Convert Unsigned Integer to Double-Precision Floating-Point Value INTDPU Convert Unsigned Integer to Double-Precision Floating-Point Value INTDPU INTDPU (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 4 3 2 1 0 x 0 1 1 1 0 1 1 1 1 0 s p 1 1 Opcode map field used... For operand type... src2 dst xuint dp 0 1 Unit .L1, .
PAGE 180
INTDPU Convert Unsigned Integer to Double-Precision Floating-Point Value
Example INTDPU .L1 A4,A1:A0
Before instruction: A4 FFFF FFDEh (4294967262); A1:A0 xxxx xxxxh xxxx xxxxh
5 cycles after instruction: A4 FFFF FFDEh (4294967262); A1:A0 41EF FFFFh FBC0 0000h 4.
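The register-pair results in the INTDP and INTDPU examples can be reproduced with a short Python sketch. The function names are hypothetical, and the model assumes the destination pair holds a standard IEEE double with the upper word in the odd register. Every 32-bit integer is exactly representable as a double, so no rounding is involved.

```python
import struct


def dp_pair(value: float) -> tuple[int, int]:
    """Split an IEEE double into (upper word, lower word)."""
    high, low = struct.unpack(">II", struct.pack(">d", value))
    return high, low


def intdp(src2: int) -> tuple[int, int]:
    """Illustrative INTDP: treat src2 as a signed 32-bit integer."""
    src2 &= 0xFFFFFFFF
    signed = src2 - (1 << 32) if src2 & 0x80000000 else src2
    return dp_pair(float(signed))


def intdpu(src2: int) -> tuple[int, int]:
    """Illustrative INTDPU: treat src2 as an unsigned 32-bit integer."""
    return dp_pair(float(src2 & 0xFFFFFFFF))


for high, low in (intdp(0x19651127), intdpu(0xFFFFFFDE), intdp(0xFFFFFFDE)):
    print(f"{high:08X} {low:08X}")
```

The last call shows the difference between the two instructions: the pattern FFFF FFDEh converts to 4294967262 under INTDPU but to −34 under INTDP.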
PAGE 181
INTSP Convert Signed Integer to Single-Precision Floating-Point Value Convert Signed Integer to Single-Precision Floating-Point Value INTSP INTSP (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 0 x 1 0 0 1 0 1 0 1 1 0 s p 4 3 2 1 1 1 Opcode map field used... For operand type... src2 dst xsint sp 0 1 Unit .L1, .
PAGE 182
INTSPU Convert Unsigned Integer to Single-Precision Floating-Point Value Convert Unsigned Integer to Single-Precision Floating-Point Value INTSPU INTSPU (.unit) src2, dst Syntax .unit = .L1 or .L2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 5 0 x 1 0 0 1 0 0 1 1 1 0 s p 4 3 2 1 1 1 Opcode map field used... For operand type... src2 dst xuint sp 0 1 Unit .L1, .
PAGE 183
LDB(U) Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset LDB(U) Syntax Register Offset Unsigned Constant Offset LDB (.unit) *+baseR[offsetR], dst or LDBU (.unit) *+baseR[offsetR], dst LDB (.unit) *+baseR[ucst5], dst or LDBU (.unit) *+baseR[ucst5], dst .unit = .D1 or .
PAGE 184
LDB(U) Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset The addressing arithmetic that performs the additions and subtractions defaults to linear mode. However, for A4−A7 and for B4−B7, the mode can be changed to circular mode by writing the appropriate value to the AMR (see section 2.7.3, page 2-10). For LDB(U), the values are loaded into the 8 LSBs of dst. For LDB, the upper 24 bits of dst are sign-extended; for LDBU, the upper 24 bits of dst are zero-filled.
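The difference between sign extension and zero fill can be illustrated with a small Python sketch. The function name load_narrow and the bytes object standing in for data memory are assumptions made for the example. The same rule applies to halfword loads with a 16-bit width, so the width is a parameter.

```python
def load_narrow(memory: bytes, address: int, width_bytes: int, signed: bool) -> int:
    """Illustrative load of a 1- or 2-byte little-endian value into a 32-bit dst."""
    raw = int.from_bytes(memory[address:address + width_bytes], "little")
    bits = 8 * width_bytes
    if signed and raw & (1 << (bits - 1)):
        raw |= 0xFFFFFFFF & ~((1 << bits) - 1)   # fill the upper bits with ones
    return raw                                   # otherwise upper bits stay zero


memory = bytes([0x7F, 0x80, 0xFF, 0x12])
print(f"{load_narrow(memory, 1, 1, signed=True):08X}")   # LDB-style load
print(f"{load_narrow(memory, 1, 1, signed=False):08X}")  # LDBU-style load
```

The byte 80h has its top bit set, so the signed load fills the upper 24 bits with ones while the unsigned load leaves them zero.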
PAGE 185
Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example LDB .
PAGE 186
LDB(U) Load Byte From Memory With a 15-Bit Unsigned Constant Offset Load Byte From Memory With a 15-Bit Unsigned Constant Offset LDB(U) LDB (.unit) *+B14/B15[ucst15], dst or LDBU (.unit) *+B14/B15[ucst15], dst Syntax .unit = .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 9 8 7 6 4 creg z dst ucst15 y op 3 1 5 15 1 3 Description 3 2 1 0 1 1 s p 1 1 Loads a byte from memory to a general-purpose register (dst).
PAGE 187
Load Byte From Memory With a 15-Bit Unsigned Constant Offset LDB(U) Execution if (cond) mem → dst else nop Note: This instruction executes only on the B side (.D2). Pipeline Pipeline Stage E1 E2 E3 E4 B14 / B15 Read dst Written .D2 Unit in use Instruction Type Load Delay Slots 4 See Also LDH, LDW Example LDB .
PAGE 188
LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset LDDW Syntax Register Offset Unsigned Constant Offset LDDW (.unit) *+baseR[offsetR], dst LDDW (.unit) *+baseR[ucst5], dst .unit = .D1 or .
PAGE 189
Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset LDDW Increments and decrements default to 1 and offsets default to 0 when no bracketed register, bracketed constant, or constant enclosed in parentheses is specified. Square brackets, [ ], indicate that ucst5 is left shifted by 3. Parentheses, ( ), indicate that ucst5 is not left shifted. In other words, parentheses indicate a byte offset rather than a doubleword offset.
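The bracket and parenthesis forms differ only in whether the constant is scaled by the doubleword size. The Python sketch below computes the effective address for a positive constant offset. The function name is hypothetical, and the sketch assumes linear addressing mode and the *+baseR form only.

```python
def lddw_effective_address(base: int, ucst5: int, bracketed: bool = True) -> int:
    """Illustrative LDDW address: [ucst5] is shifted left by 3, (ucst5) is not."""
    if not 0 <= ucst5 <= 31:
        raise ValueError("ucst5 must fit in 5 unsigned bits")
    offset = ucst5 << 3 if bracketed else ucst5
    return (base + offset) & 0xFFFFFFFF


print(hex(lddw_effective_address(0x10, 1)))                   # *+B10[1]
print(hex(lddw_effective_address(0x10, 8, bracketed=False)))  # *+B10(8)
```

With a base of 0000 0010h, an offset of [1] and an offset of (8) both address 18h, the first as one doubleword and the second as eight bytes.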
PAGE 190
LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Delay Slots 4 Functional Unit Latency 1 See Also LDB, LDH, LDW Example 1 LDDW .D2 *+B10[1],A1:A0 Before instruction A1:A0 xxxx xxxxh 5 cycles after instruction xxxx xxxxh B10 0000 0010h mem 18h 3333 3333h A1:A0 4021 3333h 16 4021 3333h 8.6 3333 3333h B10 0000 0010h mem 18h 3333 3333h 16 4021 3333h 8.6 Little-endian mode Example 2 LDDW .
PAGE 191
LDH(U) Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset LDH(U) Syntax Register Offset Unsigned Constant Offset LDH (.unit) *+baseR[offsetR], dst or LDHU (.unit) *+baseR[offsetR], dst LDH (.unit) *+baseR[ucst5], dst or LDHU (.unit) *+baseR[ucst5], dst .unit = .D1 or .
PAGE 192
LDH(U) Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset The addressing arithmetic that performs the additions and subtractions defaults to linear mode. However, for A4−A7 and for B4−B7, the mode can be changed to circular mode by writing the appropriate value to the AMR (see section 2.7.3, page 2-10). For LDH(U), the values are loaded into the 16 LSBs of dst. For LDH, the upper 16 bits of dst are sign-extended; for LDHU, the upper 16 bits of dst are zero-filled.
PAGE 193
Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example LDH .
PAGE 194
LDH(U) Load Halfword From Memory With a 15-Bit Unsigned Constant Offset Load Halfword From Memory With a 15-Bit Unsigned Constant Offset LDH(U) LDH (.unit) *+B14/B15[ucst15], dst or LDHU (.unit) *+B14/B15[ucst15], dst Syntax .unit = .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 9 8 7 6 4 creg z dst ucst15 y op 3 1 5 15 1 3 Description 3 2 1 0 1 1 s p 1 1 Loads a halfword from memory to a general-purpose register (dst).
PAGE 195
Load Halfword From Memory With a 15-Bit Unsigned Constant Offset LDH(U)
Table 3−20. Data Types Supported by LDH(U) Instruction (15-Bit Offset)
Mnemonic    op Field    Load Data Type            Size    Left Shift of Offset
LDH         1 0 0       Load halfword             16      1 bit
LDHU        0 0 0       Load halfword unsigned    16      1 bit
Execution if (cond) mem → dst else nop
Note: This instruction executes only on the B side (.D2).
PAGE 196
LDW Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset LDW Syntax Register Offset Unsigned Constant Offset LDW (.unit) *+baseR[offsetR], dst LDW (.unit) *+baseR[ucst5], dst .unit = .D1 or .
PAGE 197
Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset LDW Increments and decrements default to 1 and offsets default to 0 when no bracketed register or constant is specified. Loads that do no modification to the baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2. Parentheses, ( ), can be used to set a nonscaled, constant offset. For example, LDW (.unit) *+baseR (12) dst represents an offset of 12 bytes; whereas, LDW (.
PAGE 198
LDW Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 1 LDW .D1 Before LDW *A10,B1 1 cycle after LDW 5 cycles after LDW B1 0000 0000h B1 0000 0000h B1 21F3 1996h A10 0000 0100h A10 0000 0100h A10 0000 0100h mem 100h mem mem 100h 21F3 1996h Example 2 LDW .
PAGE 199
LDW Load Word From Memory With a 15-Bit Unsigned Constant Offset Load Word From Memory With a 15-Bit Unsigned Constant Offset LDW LDW (.unit) *+B14/B15[ucst15], dst Syntax .unit = .D2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 creg z dst ucst15 3 1 5 15 Description 12 9 8 7 6 4 3 2 1 0 y 1 1 0 1 1 s p 1 1 1 Load a word from memory to a general-purpose register (dst).
PAGE 200
LDW Load Word From Memory With a 15-Bit Unsigned Constant Offset Pipeline Pipeline Stage Read E1 Instruction Type Load Delay Slots 4 See Also LDB, LDH 3-140 Instruction Set E3 E4 E5 B14 / B15 dst Written Unit in use E2 .
PAGE 201
LMBD Leftmost Bit Detection Leftmost Bit Detection LMBD LMBD (.unit) src1, src2, dst Syntax .unit = .L1 or .L2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 5 creg z dst src2 src1/cst5 x op 3 1 5 5 5 1 7 Description Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 4 3 2 1 0 1 1 0 s p 1 1 Unit Opfield uint xuint uint .L1, .L2 110 1011 cst5 xuint uint .L1, .
PAGE 202
LMBD Leftmost Bit Detection Execution if (cond) { if (bit 0 of src1 == 0) lmb0(src2) → dst; if (bit 0 of src1 == 1) lmb1(src2) → dst } else nop Pipeline Pipeline Stage E1 Read src1, src2 Written dst Unit in use .L Instruction Type Single-cycle Delay Slots 0 Example LMBD .
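The lmb0 and lmb1 operations in the execution pseudocode can be sketched in Python. This is an illustration under assumptions: the result is taken to be the number of bits to the left of the first matching bit, and a value of 32 is assumed when no bit matches. The function name lmbd and the sample operands are hypothetical.

```python
def lmbd(src1: int, src2: int) -> int:
    """Illustrative LMBD: bit 0 of src1 selects whether to search src2 for a 1 or a 0."""
    target = src1 & 1
    for position in range(31, -1, -1):           # scan from bit 31 down to bit 0
        if (src2 >> position) & 1 == target:
            return 31 - position                 # bits to the left of the match
    return 32                                    # assumed result when nothing matches


print(lmbd(1, 0x009E3A81))   # search for the leftmost 1
print(lmbd(0, 0x009E3A81))   # search for the leftmost 0
print(lmbd(0, 0xFFFFFFFF))   # no 0 bit present
```

In the first call the top eight bits are zero, so the leftmost 1 is found after eight positions.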
PAGE 203
MPY Multiply Signed 16 LSB x Signed 16 LSB Multiply Signed 16 LSB Signed 16 LSB MPY MPY (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 7 creg z dst src2 src1 x op 3 1 5 5 5 1 5 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 6 5 4 3 2 1 0 0 0 0 0 0 s p 1 Unit Opfield slsb16 xslsb16 sint .M1, .M2 11001 scst5 xslsb16 sint .M1, .
PAGE 204
MPY Multiply Signed 16 LSB x Signed 16 LSB Example 1 MPY .M1 A1,A2,A3 Before instruction A1 0000 0123h 291† A1 0000 0123h A2 01E0 FA81h −1407† A2 01E0 FA81h A3 xxxx xxxxh † Example 2 A3 FFF9 C0A3 13,A1,A2 Before instruction A1 3497 FFF3h A2 xxxx xxxxh 3-144 Instruction Set −409437 Signed 16-LSB integer MPY .
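The operand selection in MPY can be illustrated with a Python sketch that keeps only the 16 least significant bits of each source and treats them as signed. The function names are hypothetical. The operand values are those of the two examples above; the result of the second call is computed by the sketch and is not quoted from the manual.

```python
def signed_lsb16(word: int) -> int:
    """Signed value of the 16 least significant bits of a register."""
    half = word & 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def mpy(src1: int, src2: int) -> int:
    """Illustrative MPY: signed 16 LSB x signed 16 LSB, 32-bit result pattern."""
    return (signed_lsb16(src1) * signed_lsb16(src2)) & 0xFFFFFFFF


print(f"{mpy(0x00000123, 0x01E0FA81):08X}")  # 291 x -1407
print(f"{mpy(13, 0x3497FFF3):08X}")          # constant 13 x 16 LSBs of A1
```

The upper halfword of each source (01E0h and 3497h here) does not affect the product.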
PAGE 205
MPYDP Multiply Two Double-Precision Floating-Point Values Multiply Two Double-Precision Floating-Point Values MPYDP MPYDP (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst dp dp dp 7 6 5 4 3 2 1 0 1 1 1 0 0 0 0 0 0 s p 1 1 Unit .M1, .
PAGE 206
MPYDP Multiply Two Double-Precision Floating-Point Values Pipeline Pipeline Stage Read E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 dst_l dst_h src1_l src1_l src1_h src1_h src2_l src2_h src2_l src2_h Written .M Unit in use .M .M .M If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 207
MPYH Multiply Signed 16 MSB x Signed 16 MSB Multiply Signed 16 MSB Signed 16 MSB MPYH MPYH (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 0 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst smsb16 xsmsb16 sint 1 1 Unit .M1, .M2 Description The src1 operand is multiplied by the src2 operand.
PAGE 208
MPYH Multiply Signed 16 MSB x Signed 16 MSB Example MPYH .
PAGE 209
MPYHL Multiply Signed 16 MSB x Signed 16 LSB Multiply Signed 16 MSB Signed 16 LSB MPYHL MPYHL (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 0 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst smsb16 xslsb16 sint 1 1 Unit .M1, .
PAGE 210
MPYHL Multiply Signed 16 MSB x Signed 16 LSB Example MPYHL .
PAGE 211
MPYHLU Multiply Unsigned 16 MSB x Unsigned 16 LSB Multiply Unsigned 16 MSB Unsigned 16 LSB MPYHLU MPYHLU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 1 1 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst umsb16 xulsb16 uint 1 1 Unit .M1, .
PAGE 212
MPYHSLU Multiply Signed 16 MSB x Unsigned 16 LSB Multiply Signed 16 MSB Unsigned 16 LSB MPYHSLU MPYHSLU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 0 1 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst smsb16 xulsb16 sint 1 1 Unit .M1, .
PAGE 213
MPYHSU Multiply Signed 16 MSB x Unsigned 16 MSB Multiply Signed 16 MSB Unsigned 16 MSB MPYHSU MPYHSU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 0 1 1 0 0 0 0 0 s p 1 Opcode map field used... For operand type... src1 src2 dst smsb16 xumsb16 sint 1 Unit .M1, .
PAGE 214
MPYHU Multiply Unsigned 16 MSB x Unsigned 16 MSB Multiply Unsigned 16 MSB Unsigned 16 MSB MPYHU MPYHU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 1 1 1 0 0 0 0 0 s p 1 Opcode map field used... For operand type... src1 src2 dst umsb16 xumsb16 uint 1 Unit .M1, .
PAGE 215
Multiply Unsigned 16 MSB x Signed 16 LSB MPYHULS Multiply Unsigned 16 MSB Signed 16 LSB MPYHULS MPYHULS (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 1 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst umsb16 xslsb16 sint 1 1 Unit .M1, .
PAGE 216
MPYHUS Multiply Unsigned 16 MSB x Signed 16 MSB Multiply Unsigned 16 MSB Signed 16 MSB MPYHUS MPYHUS (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 1 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst umsb16 xsmsb16 sint 1 1 Unit .M1, .
PAGE 217
MPYI Multiply 32-Bit x 32-Bit Into 32-Bit Result Multiply 32-Bit 32-Bit Into 32-Bit Result MPYI MPYI (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 7 creg z dst src2 src1 x op 3 1 5 5 5 1 5 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 6 5 4 3 2 1 0 0 0 0 0 0 s p 1 Unit Opfield sint xsint sint .M1, .M2 00100 cst5 xsint sint .M1, .
PAGE 218
MPYI Multiply 32-Bit x 32-Bit Into 32-Bit Result Functional Unit Latency 4 See Also MPYID Example MPYI .
PAGE 219
MPYID Multiply 32-Bit x 32-Bit Into 64-Bit Result Multiply 32-Bit 32-Bit Into 64-Bit Result MPYID MPYID (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 7 creg z dst src2 src1 x op 3 1 5 5 5 1 5 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 6 5 4 3 2 1 0 0 0 0 0 0 s p 1 Unit Opfield sint xsint sdint .M1, .M2 01000 cst5 xsint sdint .M1, .
PAGE 220
MPYID Multiply 32-Bit x 32-Bit Into 64-Bit Result Functional Unit Latency 4 See Also MPYI Example MPYID .
PAGE 221
MPYLH Multiply Signed 16 LSB x Signed 16 MSB Multiply Signed 16 LSB Signed 16 MSB MPYLH MPYLH (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 0 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst slsb16 xsmsb16 sint 1 1 Unit .M1, .
PAGE 222
MPYLH Multiply Signed 16 LSB x Signed 16 MSB Example MPYLH .
PAGE 223
MPYLHU Multiply Unsigned 16 LSB x Unsigned 16 MSB Multiply Unsigned 16 LSB Unsigned 16 MSB MPYLHU MPYLHU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 1 1 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst ulsb16 xumsb16 uint 1 1 Unit .M1, .
PAGE 224
MPYLSHU Multiply Signed 16 LSB x Unsigned 16 MSB Multiply Signed 16 LSB Unsigned 16 MSB MPYLSHU MPYLSHU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 0 1 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst slsb16 xumsb16 sint 1 1 Unit .M1, .
PAGE 225
Multiply Unsigned 16 LSB x Signed 16 MSB MPYLUHS Multiply Unsigned 16 LSB Signed 16 MSB MPYLUHS MPYLUHS (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 0 1 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst ulsb16 xsmsb16 sint 1 1 Unit .M1, .
PAGE 226
MPYSP Multiply Two Single-Precision Floating-Point Values Multiply Two Single-Precision Floating-Point Values MPYSP MPYSP (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst sp xsp sp 7 6 5 4 3 2 1 0 1 1 0 0 0 0 0 0 0 s p 1 1 Unit .M1, .
PAGE 227
MPYSP Multiply Two Single-Precision Floating-Point Values Pipeline Pipeline Stage E1 E2 E3 E4 src1 src2 Read dst Written Unit in use .M If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 228
MPYSPDP Multiply Single-Precision Value x Double-Precision Value (C67x+ CPU) Multiply Single-Precision Floating-Point Value Double-Precision Floating-Point Value MPYSPDP MPYSPDP (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x+ CPU only Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 Opcode map field used... For operand type... src1 src2 dst sp xsp sp 7 6 5 4 3 2 1 0 1 0 1 1 0 1 1 0 0 s p 1 1 Unit .M1, .
PAGE 229
Multiply Single-Precision Value x Double-Precision Value (C67x+ CPU) Pipeline Pipeline Stage Read E1 E2 src1 src2_l src1 src2_h E3 Written Unit in use .M E4 E5 MPYSPDP E6 E7 dst_l dst_h .M The low half of the result is written out one cycle earlier than the high half.
PAGE 230
MPYSP2DP Multiply Two Single-Precision Floating-Point Values for Double-Precision Result (C67x+ CPU) Multiply Two Single-Precision Floating-Point Values for Double-Precision Result MPYSP2DP MPYSP2DP (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C67x+ CPU only Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 0 creg z dst src2 src1 x 3 1 5 5 5 1 Opcode map field used... For operand type...
PAGE 231
Multiply Two Single-Precision Floating-Point Values for Double-Precision Result (C67x+ CPU) Pipeline Pipeline Stage Read E1 E2 E4 E5 dst_l dst_h src1 src2 Written Unit in use E3 MPYSP2DP .M The low half of the result is written out one cycle earlier than the high half.
PAGE 232
MPYSU Multiply Signed 16 LSB x Unsigned 16 LSB Multiply Signed 16 LSB Unsigned 16 LSB MPYSU MPYSU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 7 creg z dst src2 src1 x op 3 1 5 5 5 1 5 Opcode map field used... For operand type... src1 src2 dst src1 src2 dst 6 5 4 3 2 1 0 0 0 0 0 0 s p 1 Unit Opfield slsb16 xulsb16 sint .M1, .M2 11011 scst5 xulsb16 sint .M1, .
PAGE 233
Multiply Signed 16 LSB x Unsigned 16 LSB See Also MPY, MPYU, MPYUS Example MPYSU .
PAGE 234
MPYU Multiply Unsigned 16 LSB x Unsigned 16 LSB Multiply Unsigned 16 LSB Unsigned 16 LSB MPYU MPYU (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 1 1 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst ulsb16 xulsb16 uint 1 1 Unit .M1, .
PAGE 235
Multiply Unsigned 16 LSB x Unsigned 16 LSB Example MPYU .
PAGE 236
MPYUS Multiply Unsigned 16 LSB x Signed 16 LSB Multiply Unsigned 16 LSB Signed 16 LSB MPYUS MPYUS (.unit) src1, src2, dst Syntax .unit = .M1 or .M2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 17 13 12 11 1 creg z dst src2 src1 x 3 1 5 5 5 1 7 6 5 4 3 2 1 0 1 1 0 1 0 0 0 0 0 s p Opcode map field used... For operand type... src1 src2 dst ulsb16 xslsb16 sint 1 1 Unit .M1, .
PAGE 237
Multiply Unsigned 16 LSB x Signed 16 LSB Example MPYUS .
PAGE 238
MV Move From Register to Register Move From Register to Register MV MV (.unit) src2, dst Syntax .unit = .L1, .L2, .S1, .S2, .D1, .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 0 x op 1 7 Opcode map field used... For operand type... 29 3 2 1 0 1 1 0 s p 1 Unit Opfield src2 dst xsint sint .L1, .L2 000 0010 src2 dst slong slong .L1, .
PAGE 239
Move From Register to Register .D unit Opcode 31 29 MV 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 0 0 7 1 5 4 3 2 1 1 Opcode map field used... For operand type... src2 dst 6 sint sint 0 0 0 1 0 1 0 0 0 0 s p 1 Unit .D1, .D2 Description The MV pseudo-operation moves a value from one register to another. The assembler uses the operation ADD (.unit) 0, src2, dst to perform this task.
PAGE 240
MVC Move Between Control File and Register File Move Between Control File and Register File MVC MVC (.unit) src2, dst Syntax .unit = .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 18 creg z dst src2 3 1 5 5 17 0 0 0 0 13 12 11 6 0 x op 1 6 5 4 3 2 1 0 1 0 0 0 s p 1 1 Operands when moving from the control file to the register file: Description Opcode map field used... For operand type... Unit Opfield src2 dst uint uint .
PAGE 241
Move Between Control File and Register File MVC Execution if (cond) src2 → dst else nop Note: The MVC instruction executes only on the B side (.S2). Refer to the individual control register descriptions for specific behaviors and restrictions in accesses via the MVC instruction. Pipeline Instruction Type Pipeline Stage E1 Read src2 Written dst Unit in use .
PAGE 242
MVC Move Between Control File and Register File Table 3−21.
PAGE 243
Move Signed Constant Into Register and Sign Extend MVK Move Signed Constant Into Register and Sign Extend MVK MVK (.unit) cst, dst Syntax .unit = .S1 or .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 7 creg z dst cst16 3 1 5 16 Description Opcode map field used... For operand type... cst16 dst scst16 sint 6 5 4 3 2 1 0 0 1 0 1 0 s p 1 1 Unit .S1, .S2 The 16-bit signed constant, cst, is sign extended and placed in dst.
PAGE 244
MVK Move Signed Constant Into Register and Sign Extend Instruction Type Single cycle Delay Slots 0 See Also MVKH, MVKL, MVKLH Example 1 MVK .L2 −5,B8 Before instruction B8 Example 2 xxxx xxxxh MVK .
PAGE 245
MVKH/MVKLH Move 16-Bit Constant Into Upper Bits of Register Move 16-Bit Constant Into Upper Bits of Register MVKH/MVKLH MVKH (.unit) cst, dst or MVKLH (.unit) cst, dst Syntax .unit = .S1 or .S2 C62x, C64x, C67x, and C67x+ CPU Compatibility Opcode 31 29 28 27 23 22 7 creg z dst cst16 3 1 5 16 Opcode map field used... For operand type... cst16 dst uscst16 sint 6 5 4 3 2 1 0 1 1 0 1 0 s p 1 1 Unit .S1, .
PAGE 246
MVKH/MVKLH Move 16-Bit Constant Into Upper Bits of Register. Instruction Type: Single-cycle. Delay Slots: 0. Note: Use the MVK instruction (page 3-183) to load 16-bit constants. The assembler generates a warning for any constant over 16 bits. To load 32-bit constants, such as 1234 5678h, use the following pair of instructions: MVKL 0x12345678 followed by MVKH 0x12345678. If you are loading the address of a label, use: MVKL MVKH. Example 1: MVKH .
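The MVKL/MVKH pairing in the note above can be modeled in a few lines of Python. This is a minimal sketch rather than TI code: the names mvkl, mvkh, and load32 are hypothetical, and the sketch assumes that MVKL sign-extends the low 16 bits of the constant into the register while MVKH replaces only the upper 16 bits and leaves the lower half untouched.

```python
MASK32 = 0xFFFFFFFF

def mvkl(cst):
    """Model of MVKL (assumed): low 16 bits of cst, sign-extended to 32 bits."""
    low = cst & 0xFFFF
    if low & 0x8000:          # bit 15 set: extend the sign into the upper half
        low |= 0xFFFF0000
    return low & MASK32

def mvkh(cst, dst):
    """Model of MVKH (assumed): upper 16 bits of cst replace the upper half of dst."""
    return ((cst & 0xFFFF0000) | (dst & 0xFFFF)) & MASK32

def load32(cst):
    """Load a full 32-bit constant with the MVKL-then-MVKH pair."""
    return mvkh(cst, mvkl(cst))

print(hex(load32(0x12345678)))
```

The order matters in this model: MVKL overwrites the whole register, so it runs first and MVKH then patches the upper half.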
PAGE 247
MVKL Move Signed Constant Into Register and Sign Extend−Used with MVKH. Syntax: MVKL (.unit) cst, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, cst16, s, p]. Operands: cst16 scst16, dst sint; unit .S1, .
PAGE 248
MVKL Move Signed Constant Into Register and Sign Extend−Used with MVKH. Pipeline stage E1: written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: MVK, MVKH, MVKLH. Example 1: MVKL .S1 5678h,A8; before instruction: A8 xxxx xxxxh. Example 2: MVKL .
PAGE 249
NEG Negate. Syntax: NEG (.unit) src2, dst; .unit = .L1, .L2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. .S unit opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsint, dst sint; unit .S1, .S2. .L unit opcode: Opcode map field used... For operand type...
PAGE 250
NOP No Operation. Syntax: NOP [count]; .unit = none. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields Reserved, src, p]. Operands: src ucst4; unit none. Description: src is encoded as count − 1. For src + 1 cycles, no operation is performed. The maximum value for count is 9. NOP with no operand is treated like NOP 1 with src encoded as 0000.
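The count-to-src encoding described for NOP is small enough to state as code. The following Python sketch is illustrative only: nop_src_field is a hypothetical helper name, and rejecting out-of-range counts with an exception is a choice made for this sketch, not documented assembler behavior.

```python
def nop_src_field(count=1):
    """Return the 4-bit src field for NOP [count]; NOP with no operand means count 1."""
    if not 1 <= count <= 9:
        # The manual gives 9 as the maximum count; raising here is this sketch's choice.
        raise ValueError("count must be between 1 and 9")
    return count - 1

# Print each field as a 4-bit pattern, matching the '0000' notation used in the text.
for count in (1, 5, 9):
    print(count, format(nop_src_field(count), "04b"))
```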
PAGE 251
NOP No Operation. Example 1: NOP, then MVK .S1 125h,A1; before NOP: A1 1234 5678h. Example 2: MVK MVKLH NOP ADD .S1 .S1 5 .
PAGE 252
NORM Normalize Integer. Syntax: NORM (.unit) src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, op, s, p]. Operands: src2 xsint, dst uint, unit .L1, .L2, opfield 110 0011; src2 slong, dst uint, unit .L1, .L2, opfield 110 0000. Description: The number of redundant sign bits of src2 is placed in dst.
PAGE 253
NORM Normalize Integer. Execution: if (cond) norm(src) → dst else nop. Pipeline stage E1: read src2; written dst; unit in use .L. Instruction Type: Single-cycle. Delay Slots: 0. Example 1: NORM .L1 A1,A2; before instruction: A1 02A3 469Fh, A2 xxxx xxxxh; 1 cycle after instruction: A1 02A3 469Fh, A2 0000 0005h (5). Example 2: NORM .
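Counting redundant sign bits, as NORM does for a 32-bit operand, can be sketched in Python as follows. The function name norm32 is hypothetical, the sketch covers only the 32-bit (xsint) form, and it is a software model of the description above rather than the hardware implementation.

```python
def norm32(value):
    """Count how many bits directly below the sign bit repeat the sign bit."""
    value &= 0xFFFFFFFF
    sign = value >> 31
    count = 0
    for bit in range(30, -1, -1):      # walk down from bit 30 toward bit 0
        if (value >> bit) & 1 != sign:
            break                      # first bit that differs from the sign
        count += 1
    return count

# Operand from Example 1 (A1 = 02A3 469Fh); the manual shows A2 = 0000 0005h.
print(norm32(0x02A3469F))
```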
PAGE 254
NOT Bitwise NOT. Syntax: NOT (.unit) src2, dst; .unit = .L1, .L2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. .L unit opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xuint, dst uint; unit .L1, .L2 .
PAGE 255
OR Bitwise OR. Syntax: OR (.unit) src1, src2, dst; .unit = .L1, .L2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. .L unit opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 uint, src2 xuint, dst uint, unit .L1, .L2, opfield 111 1111; src1 scst5, src2 xuint, dst uint, unit .L1, .
PAGE 256
OR Bitwise OR. Execution: if (cond) src1 OR src2 → dst else nop. Pipeline stage E1: read src1, src2; written dst; unit in use .L or .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: AND, XOR. Example 1: OR .S1 A3,A4,A5; before instruction: A3 08A3 A49Fh, A4 00FF 375Ah, A5 xxxx xxxxh; 1 cycle after instruction: A3 08A3 A49Fh, A4 00FF 375Ah, A5 08FF B7DFh. Example 2: OR .
PAGE 257
RCPDP Double-Precision Floating-Point Reciprocal Approximation. Syntax: RCPDP (.unit) src2, dst; .unit = .S1 or .S2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, reserved, x, s, p]. Operands: src2 dp, dst dp; unit .S1, .
PAGE 258
RCPDP Double-Precision Floating-Point Reciprocal Approximation. Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set. 4) If src2 is signed 0, signed infinity is placed in dst and the DIV0 and INFO bits are set. 5) If src2 is signed infinity, signed 0 is placed in dst.
PAGE 259
RCPSP Single-Precision Floating-Point Reciprocal Approximation. Syntax: RCPSP (.unit) src2, dst; .unit = .S1 or .S2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsp, dst sp; unit .S1, .
PAGE 260
RCPSP Single-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set. 4) If src2 is signed 0, signed infinity is placed in dst and the DIV0 and INFO bits are set. 5) If src2 is signed infinity, signed 0 is placed in dst.
PAGE 261
RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation. Syntax: RSQRDP (.unit) src2, dst; .unit = .S1 or .S2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, reserved, x, s, p]. Operands: src2 dp, dst dp; unit .S1, .
PAGE 262
RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set. 4) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INEX, and DEN2 bits are set.
PAGE 263
RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation. Syntax: RSQRSP (.unit) src2, dst; .unit = .S1 or .S2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsp, dst sp; unit .S1, .
PAGE 264
RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation. Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set. 4) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INEX, and DEN2 bits are set.
PAGE 265
SADD Add Two Signed Integers With Saturation. Syntax: SADD (.unit) src1, src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 sint, src2 xsint, dst sint, unit .L1, .L2, opfield 001 0011; src1 xsint, src2 slong, dst slong, unit .L1, .
PAGE 266
SADD Add Two Signed Integers With Saturation Pipeline Pipeline Stage E1 src1, src2 Read Written dst Unit in use .L Instruction Type Single-cycle Delay Slots 0 See Also ADD, SSUB Example 1 SADD .L1 A1,A2,A3 Before instruction 1 cycle after instruction 2 cycles after instruction A1 5A2E 51A3h 1512984995 A1 5A2E 51A3h A1 5A2E 51A3h A2 012A 3FA2h 19546018 A2 012A 3FA2h A2 012A 3FA2h A3 xxxx xxxxh A3 5B58 9145h A3 5B58 9145h CSR 0001 0100h CSR Example 2 SADD .
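A saturating 32-bit add of the kind SADD performs can be modeled as shown below. This Python sketch uses hypothetical names (to_signed32, sadd), covers only the 32-bit operand form, and returns a boolean in place of the status bit that the hardware records in CSR; how that bit is latched is not modeled.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def sadd(src1, src2):
    """Return (32-bit result pattern, saturated flag) for a saturating add."""
    total = to_signed32(src1) + to_signed32(src2)
    saturated = False
    if total > INT32_MAX:
        total, saturated = INT32_MAX, True     # clamp positive overflow
    elif total < INT32_MIN:
        total, saturated = INT32_MIN, True     # clamp negative overflow
    return total & 0xFFFFFFFF, saturated

# Operands from Example 1 (A1 = 5A2E 51A3h, A2 = 012A 3FA2h).
result, flag = sadd(0x5A2E51A3, 0x012A3FA2)
print(hex(result), flag)
```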
PAGE 267
Add Two Signed Integers With Saturation Example 3 SADD .
PAGE 268
SAT Saturate a 40-Bit Integer to a 32-Bit Integer. Syntax: SAT (.unit) src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 slong, dst sint; unit .L1, .L2. Description: A 40-bit src2 value is converted to a 32-bit value.
PAGE 269
SAT Saturate a 40-Bit Integer to a 32-Bit Integer. Example 1: SAT .L2 B1:B0,B5; before instruction: B1:B0 0000 001Fh 3413 539Ah, B5 xxxx xxxxh, CSR 0001 0100h; 1 cycle after instruction: B1:B0 0000 001Fh 3413 539Ah, B5 7FFF FFFFh, CSR 0001 0100h; 2 cycles after instruction: B1:B0 0000 001Fh 3413 539Ah, B5 7FFF FFFFh, CSR 0001 0300h. Example 2: SAT .
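The 40-bit to 32-bit saturation can be sketched in Python as below. The name sat40_to_32 is hypothetical, the register pair is passed as its upper 8 bits and lower 32 bits, and the returned flag only stands in for the CSR status change visible in Example 1.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def sat40_to_32(high8, low32):
    """Saturate a 40-bit two's-complement value (high8:low32) to 32 bits."""
    value = ((high8 & 0xFF) << 32) | (low32 & 0xFFFFFFFF)
    if value & (1 << 39):              # bit 39 is the sign of the 40-bit value
        value -= 1 << 40
    if value > INT32_MAX:
        return INT32_MAX, True
    if value < INT32_MIN:
        return INT32_MIN & 0xFFFFFFFF, True
    return value & 0xFFFFFFFF, False

# Operand from Example 1 (B1:B0 = 0000 001Fh 3413 539Ah).
print(sat40_to_32(0x1F, 0x3413539A))
```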
PAGE 270
SET Set a Bit Field. Syntax: SET (.unit) src2, csta, cstb, dst or SET (.unit) src2, src1, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode, constant form: [encoding diagram; fields creg, z, dst, src2, csta, cstb]. Operands: src2 uint, csta ucst5, cstb ucst5, dst uint; unit .S1, .
PAGE 271
SET Set a Bit Field The field in src2, specified by csta and cstb, is set to all 1s. The csta and cstb operands may be specified as constants or in the ten LSBs of the src1 register, with cstb being bits 0−4 and csta bits 5−9. csta signifies the bit location of the LSB of the field and cstb signifies the bit location of the MSB of the field. In other words, csta and cstb represent the beginning and ending bits, respectively, of the field to be set to all 1s.
PAGE 272
SET Set a Bit Field. Example 1: SET .S1 A0,7,21,A1; before instruction: A0 4B13 4A1Eh, A1 xxxx xxxxh; after instruction: A0 4B13 4A1Eh, A1 4B3F FF9Eh. Example 2: SET .
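The field selection used by SET, with csta as the LSB position and cstb as the MSB position, maps onto a simple mask. The Python sketch below uses hypothetical helper names (set_field, set_field_reg); raising an error when csta exceeds cstb is a choice made for the sketch, since the hardware behavior for that case is not given in the text above.

```python
def set_field(src2, csta, cstb):
    """Set bits csta..cstb (inclusive) of a 32-bit value to 1."""
    if not (0 <= csta <= cstb <= 31):
        raise ValueError("expected 0 <= csta <= cstb <= 31 in this sketch")
    mask = ((1 << (cstb + 1)) - 1) & ~((1 << csta) - 1)
    return (src2 | mask) & 0xFFFFFFFF

def set_field_reg(src2, src1):
    """Register form: cstb is taken from bits 0-4 of src1, csta from bits 5-9."""
    cstb = src1 & 0x1F
    csta = (src1 >> 5) & 0x1F
    return set_field(src2, csta, cstb)

# Operands from Example 1: SET .S1 A0,7,21,A1 with A0 = 4B13 4A1Eh.
print(hex(set_field(0x4B134A1E, 7, 21)))
```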
PAGE 273
SHL Arithmetic Shift Left. Syntax: SHL (.unit) src2, src1, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src2 xsint, src1 uint, dst sint, unit .S1, .S2, opfield 11 0011; src2 slong, src1 uint, dst slong, unit .S1, .S2, opfield 11 0001; src2 xuint, src1 uint, dst ulong, .
PAGE 274
SHL Arithmetic Shift Left. Pipeline stage E1: read src1, src2; written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: SHR, SSHL. Example 1: SHL .S1 A0,4,A1; before instruction: A0 29E3 D31Ch, A1 xxxx xxxxh; after instruction: A0 29E3 D31Ch, A1 9E3D 31C0h. Example 2: SHL .S2 B0,B1,B2; before instruction: B0 4197 51A5h, B1 0000 0009h, B2 xxxx xxxxh; after instruction: B0 4197 51A5h, B1 0000 0009h, B2 2EA3 4A00h. Example 3: SHL .
PAGE 275
SHR Arithmetic Shift Right. Syntax: SHR (.unit) src2, src1, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src2 xsint, src1 uint, dst sint, unit .S1, .S2, opfield 11 0111; src2 slong, src1 uint, dst slong, unit .S1, .
PAGE 276
SHR Arithmetic Shift Right. Pipeline stage E1: read src1, src2; written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: SHL, SHRU. Example 1: SHR .S1 A0,8,A1; before instruction: A0 F123 63D1h, A1 xxxx xxxxh; after instruction: A0 F123 63D1h, A1 FFF1 2363h. Example 2: SHR .S2 B0,B1,B2; before instruction: B0 1492 5A41h, B1 0000 0012h, B2 xxxx xxxxh; after instruction: B0 1492 5A41h, B1 0000 0012h, B2 0000 0524h. Example 3: SHR .
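The difference between the arithmetic shift (SHR) and the logical shift (SHRU) is easiest to see on a negative operand. The Python sketch below is a simplified model with hypothetical names (shr32, shru32); it handles only 32-bit operands and shift counts from 0 to 31, and does not model the 40-bit forms or larger shift counts.

```python
MASK32 = 0xFFFFFFFF

def shr32(src2, shift):
    """Arithmetic right shift: the sign bit is copied into the vacated bits."""
    value = src2 & MASK32
    if value & 0x80000000:
        value -= 1 << 32               # reinterpret as a negative Python int
    return (value >> shift) & MASK32

def shru32(src2, shift):
    """Logical right shift: zeros are shifted into the vacated bits."""
    return (src2 & MASK32) >> shift

# Operand from SHR Example 1 (A0 = F123 63D1h, shift by 8).
print(hex(shr32(0xF12363D1, 8)), hex(shru32(0xF12363D1, 8)))
```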
PAGE 277
SHRU Logical Shift Right. Syntax: SHRU (.unit) src2, src1, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src2 xuint, src1 uint, dst uint, unit .S1, .S2, opfield 10 0111; src2 ulong, src1 uint, dst ulong, unit .S1, .S2, opfield 10 0101; src2 xuint, src1 ucst5, dst uint, .
PAGE 278
SHRU Logical Shift Right. Pipeline stage E1: read src1, src2; written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: SHL, SHR. Example: SHRU .
PAGE 279
SMPY Multiply Signed 16 LSB x Signed 16 LSB With Left Shift and Saturation. Syntax: SMPY (.unit) src1, src2, dst; .unit = .M1 or .M2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 slsb16, src2 xslsb16, dst sint; unit .M1, .
PAGE 280
SMPY Multiply Signed 16 LSB x Signed 16 LSB With Left Shift and Saturation Example SMPY .
PAGE 281
SMPYH Multiply Signed 16 MSB x Signed 16 MSB With Left Shift and Saturation. Syntax: SMPYH (.unit) src1, src2, dst; .unit = .M1 or .M2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 smsb16, src2 xsmsb16, dst sint; unit .M1, .
PAGE 282
SMPYHL Multiply Signed 16 MSB x Signed 16 LSB With Left Shift and Saturation. Syntax: SMPYHL (.unit) src1, src2, dst; .unit = .M1 or .M2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 smsb16, src2 xslsb16, dst sint; unit .M1, .
PAGE 283
Multiply Signed 16 MSB x Signed 16 LSB With Left Shift and Saturation Example SMPYHL .
PAGE 284
SMPYLH Multiply Signed 16 LSB x Signed 16 MSB With Left Shift and Saturation. Syntax: SMPYLH (.unit) src1, src2, dst; .unit = .M1 or .M2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 slsb16, src2 xsmsb16, dst sint; unit .M1, .
PAGE 285
Multiply Signed 16 LSB x Signed 16 MSB With Left Shift and Saturation Example SMPYLH .
PAGE 286
SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value. Syntax: SPDP (.unit) src2, dst; .unit = .S1 or .S2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsp, dst dp; unit .S1, .
PAGE 287
SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value. Pipeline: stage E1 reads src2 and writes dst_l; stage E2 writes dst_h; unit in use .S. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 288
SPINT Convert Single-Precision Floating-Point Value to Integer. Syntax: SPINT (.unit) src2, dst; .unit = .L1 or .L2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsp, dst sint; unit .L1, .
PAGE 289
SPINT Convert Single-Precision Floating-Point Value to Integer. Pipeline stages E1 to E4: read src2; written dst; unit in use .L. Instruction Type: 4-cycle. Delay Slots: 3. Functional Unit Latency: 1. See Also: DPINT, INTSP, SPDP, SPTRUNC. Example: SPINT .L1 A1,A2; before instruction: A1 4109 999Ah (8.6); 4 cycles after instruction: A1 4109 999Ah 8.
PAGE 290
SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation. Syntax: SPTRUNC (.unit) src2, dst; .unit = .L1 or .L2. Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, x, s, p]. Operands: src2 xsp, dst sint; unit .L1, .
PAGE 291
SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation. Pipeline stages E1 to E4: read src2; written dst; unit in use .L. Instruction Type: 4-cycle. Delay Slots: 3. Functional Unit Latency: 1. See Also: DPTRUNC, SPDP, SPINT. Example: SPTRUNC .L1X B1,A2; before instruction: B1 4109 999Ah (8.6); 4 cycles after instruction: B1 4109 999Ah 8.
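SPINT and SPTRUNC differ only in how the fractional part is handled, which the shared 8.6 operand makes visible. The Python sketch below uses hypothetical names and makes one explicit assumption: SPINT is modeled in round-to-nearest-even mode, whereas the real instruction follows whichever rounding mode is configured. NaN, infinity, and out-of-range inputs are not handled.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def spint_nearest(bits):
    """Model of SPINT under an assumed round-to-nearest-even mode."""
    return round(bits_to_float(bits))   # Python's round() rounds halves to even

def sptrunc(bits):
    """Model of SPTRUNC: discard the fraction (round toward zero)."""
    return int(bits_to_float(bits))

# 4109 999Ah is the 8.6 operand used in both examples.
print(spint_nearest(0x4109999A), sptrunc(0x4109999A))
```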
PAGE 292
SSHL Shift Left With Saturation. Syntax: SSHL (.unit) src2, src1, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src2 xsint, src1 uint, dst sint, unit .S1, .S2, opfield 10 0011; src2 xsint, src1 ucst5, dst sint, unit .S1, .
PAGE 293
SSHL Shift Left With Saturation. Pipeline stage E1: read src1, src2; written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: SHL, SHR. Example 1: SSHL .S1 A0,2,A1; before instruction: A0 02E3 031Ch, A1 xxxx xxxxh, CSR 0001 0100h; 1 cycle after instruction: A0 02E3 031Ch, A1 0B8C 0C70h, CSR 0001 0100h; 2 cycles after instruction: A0 02E3 031Ch, A1 0B8C 0C70h, CSR 0001 0100h. Example 2: SSHL .
PAGE 294
SSUB Subtract Two Signed Integers With Saturation. Syntax: SSUB (.unit) src1, src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 sint, src2 xsint, dst sint, unit .L1, .L2, opfield 000 1111; src1 xsint, src2 sint, dst sint, unit .L1, .
PAGE 295
SSUB Subtract Two Signed Integers With Saturation. Pipeline stage E1: read src1, src2; written dst; unit in use .L. Instruction Type: Single-cycle. Delay Slots: 0. See Also: SUB. Example 1: SSUB .
PAGE 296
STB Store Byte to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Syntax, register offset: STB (.unit) src, *+baseR[offsetR]. Syntax, unsigned constant offset: STB (.unit) src, *+baseR[ucst5]. .unit = .D1 or .
PAGE 297
STB Store Byte to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that perform no modification of baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 0. Parentheses, ( ), can be used to set a nonscaled, constant offset.
PAGE 298
STB Store Byte to Memory With a 15-Bit Unsigned Constant Offset. Syntax: STB (.unit) src, *+B14/B15[ucst15]; .unit = .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, src, ucst15, y, s, p]. Description: Stores a byte to memory from a general-purpose register (src).
PAGE 299
STB Store Byte to Memory With a 15-Bit Unsigned Constant Offset. Pipeline stage E1: read B14/B15, src; unit in use .D2. Instruction Type: Store. Delay Slots: 0. See Also: STH, STW. Example: STB .
PAGE 300
STH Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Syntax, register offset: STH (.unit) src, *+baseR[offsetR]. Syntax, unsigned constant offset: STH (.unit) src, *+baseR[ucst5]. .unit = .D1 or .
PAGE 301
STH Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that perform no modification of baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 1. Parentheses, ( ), can be used to set a nonscaled, constant offset.
PAGE 302
STH Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 2 STH .
PAGE 303
STH Store Halfword to Memory With a 15-Bit Unsigned Constant Offset. Syntax: STH (.unit) src, *+B14/B15[ucst15]; .unit = .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, src, ucst15, y, s, p]. Description: Stores a halfword to memory from a general-purpose register (src).
PAGE 304
STH Store Halfword to Memory With a 15-Bit Unsigned Constant Offset. Pipeline stage E1: read B14/B15, src. Instruction Type: Store. Delay Slots: 0. See Also: STB, STW.
PAGE 305
STW Store Word to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Syntax, register offset: STW (.unit) src, *+baseR[offsetR]. Syntax, unsigned constant offset: STW (.unit) src, *+baseR[ucst5]. .unit = .D1 or .
PAGE 306
STW Store Word to Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that perform no modification of baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2. Parentheses, ( ), can be used to set a nonscaled, constant offset. For example, STW (.unit) src, *+baseR(12) represents an offset of 12 bytes; whereas, STW (.
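The bracket and parenthesis offset conventions for STB, STH, and STW reduce to a left shift by 0, 1, or 2. The Python sketch below computes a positive-offset address under that rule; effective_address and SCALE_SHIFT are hypothetical names, and pre/post increment, decrement, and circular addressing are deliberately left out.

```python
# Left-shift applied to a bracketed offset, per the store descriptions.
SCALE_SHIFT = {"STB": 0, "STH": 1, "STW": 2}

def effective_address(base, offset, mnemonic, scaled=True):
    """Address for *+baseR[offset] (scaled) or *+baseR(offset) (nonscaled)."""
    shift = SCALE_SHIFT[mnemonic] if scaled else 0
    return (base + (offset << shift)) & 0xFFFFFFFF

base = 0x1000
print(hex(effective_address(base, 12, "STW", scaled=False)))  # *+baseR(12)
print(hex(effective_address(base, 12, "STW", scaled=True)))   # *+baseR[12]
```

With a word store, the same numeric offset therefore reaches four times as far in bracket form as in parenthesis form.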
PAGE 307
STW Store Word to Memory With a 15-Bit Unsigned Constant Offset. Syntax: STW (.unit) src, *+B14/B15[ucst15]; .unit = .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, src, ucst15, y, s, p]. Description: Stores a word to memory from a general-purpose register (src).
PAGE 308
STW Store Word to Memory With a 15-Bit Unsigned Constant Offset. Pipeline stage E1: read B14/B15, src. Instruction Type: Store. Delay Slots: 0. See Also: STB, STH.
PAGE 309
SUB Subtract Two Signed Integers Without Saturation. Syntax: SUB (.unit) src1, src2, dst or SUB (.D1 or .D2) src2, src1, dst; .unit = .L1, .L2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. .L unit opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 sint, src2 xsint, dst sint, unit .L1, .
PAGE 310
SUB Subtract Two Signed Integers Without Saturation. .S unit opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 sint, src2 xsint, dst sint, unit .S1, .S2, opfield 01 0111; src1 scst5, src2 xsint, dst sint, unit .S1, .S2, opfield 01 0110. Description for .L1, .L2 and .S1, .S2 opcodes: src2 is subtracted from src1. The result is placed in dst. Execution for .L1, .L2 and .S1, .
PAGE 311
SUB Subtract Two Signed Integers Without Saturation. .D unit opcode: [encoding diagram; fields creg, z, dst, src2, src1, op, s, p]. Operands: src2 sint, src1 sint, dst sint, unit .D1, .D2, opfield 01 0001; src2 sint, src1 ucst5, dst sint, unit .D1, .D2, opfield 01 0011. Description for .D1, .D2 opcodes: src1 is subtracted from src2. The result is placed in dst. Execution for .D1, .
PAGE 312
SUB Subtract Two Signed Integers Without Saturation Instruction Type Single-cycle Delay Slots 0 See Also ADD, SSUB, SUBC, SUBDP, SUBSP, SUBU, SUB2 Example SUB .
PAGE 313
SUBAB Subtract Using Byte Addressing Mode. Syntax: SUBAB (.unit) src2, src1, dst; .unit = .D1 or .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, op, s, p]. Operands: src2 sint, src1 sint, dst sint, unit .D1, .D2, opfield 11 0001; src2 sint, src1 ucst5, dst sint, unit .D1, .
PAGE 314
SUBAB Subtract Using Byte Addressing Mode Example SUBAB .
PAGE 315
SUBAH Subtract Using Halfword Addressing Mode. Syntax: SUBAH (.unit) src2, src1, dst; .unit = .D1 or .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, op, s, p]. Operands: src2 sint, src1 sint, dst sint, unit .D1, .D2, opfield 11 0101; src2 sint, src1 ucst5, dst sint, unit .D1, .
PAGE 316
SUBAW Subtract Using Word Addressing Mode. Syntax: SUBAW (.unit) src2, src1, dst; .unit = .D1 or .D2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, op, s, p]. Operands: src2 sint, src1 sint, dst sint, unit .D1, .D2, opfield 11 1001; src2 sint, src1 ucst5, dst sint, unit .D1, .
PAGE 317
Subtract Using Word Addressing Mode Example SUBAW .
PAGE 318
SUBC Subtract Conditionally and Shift−Used for Division. Syntax: SUBC (.unit) src1, src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 uint, src2 xuint, dst uint; unit .L1, .L2. Description: Subtract src2 from src1.
PAGE 319
SUBC Subtract Conditionally and Shift−Used for Division. Example 1: SUBC .L1 A0,A1,A0; before instruction: A0 0000 125Ah (4698), A1 0000 1F12h (7954); 1 cycle after instruction: A0 0000 24B4h (9396), A1 0000 1F12h. Example 2: SUBC . A0,A1,A0
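A conditional-subtract-and-shift step is the building block of a restoring division loop. The Python sketch below rests on an assumption that goes beyond the truncated description above: when src1 − src2 is non-negative the result is the difference shifted left by one with 1 added, otherwise it is src1 shifted left by one. That assumed rule reproduces Example 1. The names subc and udiv16, and the 16-iteration loop with the divisor pre-shifted by 15, are illustrative choices for 16-bit unsigned operands.

```python
MASK32 = 0xFFFFFFFF

def subc(src1, src2):
    """One conditional subtract-and-shift step (assumed semantics, see lead-in)."""
    diff = (src1 & MASK32) - (src2 & MASK32)
    if diff >= 0:
        return ((diff << 1) + 1) & MASK32   # subtraction fits: shift in a 1
    return (src1 << 1) & MASK32             # does not fit: shift in a 0

def udiv16(numerator, denominator):
    """Divide two 16-bit unsigned values with 16 subc steps."""
    if not (0 <= numerator < 1 << 16 and 0 < denominator < 1 << 16):
        raise ValueError("sketch supports 16-bit operands and a nonzero divisor")
    acc = numerator
    shifted = denominator << 15             # align the divisor with the top bit
    for _ in range(16):
        acc = subc(acc, shifted)
    return acc & 0xFFFF, acc >> 16          # (quotient, remainder)

print(hex(subc(0x125A, 0x1F12)))            # operands from Example 1
print(udiv16(7, 2))
```

After the loop the quotient bits have accumulated in the low half of the accumulator and the remainder sits in the high half.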
PAGE 320
SUBDP Subtract Two Double-Precision Floating-Point Values. Syntax: SUBDP (.unit) src1, src2, dst with .unit = .L1 or .L2 (C67x and C67x+ CPU), or SUBDP (.unit) src1, src2, dst with .unit = .S1 or .S2 (C67x+ CPU only). Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op]. Opcode map field used... For operand type...
PAGE 321
SUBDP Subtract Two Double-Precision Floating-Point Values. Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) The source-specific warning bits set in FADCR are set according to the register sources in the actual machine instruction and not according to the order of the sources in the assembly form. 3) If rounding is performed, the INEX bit is set. 4) If one source is SNaN or QNaN, the result is NaN_out.
PAGE 322
SUBDP Subtract Two Double-Precision Floating-Point Values. Pipeline stages E1 to E7: read src1_l, src2_l, then src1_h, src2_h; written dst_l, then dst_h; unit in use .L or .S. For the C67x CPU, if dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
PAGE 323
SUBSP Subtract Two Single-Precision Floating-Point Values. Syntax: SUBSP (.unit) src1, src2, dst with .unit = .L1 or .L2 (C67x and C67x+ CPU), or SUBSP (.unit) src1, src2, dst with .unit = .S1 or .S2 (C67x+ CPU only). Compatibility: C67x and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op]. Opcode map field used... For operand type...
PAGE 324
SUBSP Subtract Two Single-Precision Floating-Point Values. Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) The source-specific warning bits set in FADCR are set according to the register sources in the actual machine instruction and not according to the order of the sources in the assembly form. 3) If rounding is performed, the INEX bit is set. 4) If one source is SNaN or QNaN, the result is NaN_out.
PAGE 325
SUBSP Subtract Two Single-Precision Floating-Point Values. Pipeline stages E1 to E4: read src1, src2; written dst; unit in use .L. Instruction Type: 4-cycle. Delay Slots: 3. Functional Unit Latency: 1. See Also: ADDSP, SUB, SUBDP, SUBU. Example: SUBSP .L1X A2,B1,A3; before instruction: A2 4109 999Ah, B1 C020 0000h, A3 XXXX XXXXh; 4 cycles after instruction: A2 4109 999Ah (8.6), B1 C020 0000h (−2.5), A3 4131 999Ah 11.
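The register contents in the SUBSP example are IEEE 754 single-precision bit patterns, so the example can be reproduced on a host machine. The Python sketch below uses hypothetical helper names and relies on struct to round the result back to single precision with round-to-nearest; other rounding modes and the FADCR warning bits are not modeled.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def float_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def subsp(src1_bits, src2_bits):
    """Model of src1 - src2 on single-precision bit patterns."""
    return float_to_bits(bits_to_float(src1_bits) - bits_to_float(src2_bits))

# Operands from the example: A2 = 4109 999Ah (8.6), B1 = C020 0000h (-2.5).
print(hex(subsp(0x4109999A, 0xC0200000)))
```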
PAGE 326
SUBU Subtract Two Unsigned Integers Without Saturation. Syntax: SUBU (.unit) src1, src2, dst; .unit = .L1 or .L2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op]. Operands: src1 uint, src2 xuint, dst ulong, unit .L1, .L2, opfield 010 1111; src1 xuint, src2 uint, dst ulong, unit .L1, .
PAGE 327
Subtract Two Unsigned Integers Without Saturation Example SUBU .
PAGE 328
SUB2 Subtract Two 16-Bit Integers on Upper and Lower Register Halves. Syntax: SUB2 (.unit) src1, src2, dst; .unit = .S1 or .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, s, p]. Operands: src1 sint, src2 xsint, dst sint; unit .S1, .
PAGE 329
SUB2 Subtract Two 16-Bit Integers on Upper and Lower Register Halves. Execution: if (cond) { (lsb16(src1) − lsb16(src2)) → lsb16(dst); (msb16(src1) − msb16(src2)) → msb16(dst); } else nop. Pipeline stage E1: read src1, src2; written dst; unit in use .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: ADD2, SSUB, SUB, SUBC, SUBU. Example 1: SUB2 .
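The execution pseudocode for SUB2 treats the two register halves as independent 16-bit lanes. The Python sketch below mirrors that pseudocode; sub2 is a hypothetical name, and the point of the sketch is that a borrow out of the lower half does not reach the upper half, unlike a plain 32-bit subtraction.

```python
def sub2(src1, src2):
    """Subtract the lower and upper 16-bit halves independently."""
    lo = ((src1 & 0xFFFF) - (src2 & 0xFFFF)) & 0xFFFF                    # lsb16 lane
    hi = (((src1 >> 16) & 0xFFFF) - ((src2 >> 16) & 0xFFFF)) & 0xFFFF    # msb16 lane
    return (hi << 16) | lo

a, b = 0x00010000, 0x00000001
print(hex(sub2(a, b)))                 # lanes stay separate
print(hex((a - b) & 0xFFFFFFFF))       # ordinary 32-bit subtract for comparison
```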
PAGE 330
XOR Bitwise Exclusive OR. Syntax: XOR (.unit) src1, src2, dst; .unit = .L1, .L2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. .L unit opcode: [encoding diagram; fields creg, z, dst, src2, src1, x, op, s, p]. Operands: src1 uint, src2 xuint, dst uint, unit .L1, .L2, opfield 110 1111; src1 scst5, src2 xuint, dst uint, unit .L1, .
PAGE 331
XOR Bitwise Exclusive OR. Execution: if (cond) src1 XOR src2 → dst else nop. Pipeline stage E1: read src1, src2; written dst; unit in use .L or .S. Instruction Type: Single-cycle. Delay Slots: 0. See Also: AND, OR. Example 1: XOR .S1 A3, A4, A5; before instruction: A3 0721 325Ah, A4 0019 0F12h, A5 xxxx xxxxh; 1 cycle after instruction: A3 0721 325Ah, A4 0019 0F12h, A5 0738 3D48h. Example 2: XOR .
PAGE 332
ZERO Zero a Register. Syntax: ZERO (.unit) dst; .unit = .L1, .L2, .D1, .D2, .S1, .S2. Compatibility: C62x, C64x, C67x, and C67x+ CPU. Operands: dst sint, unit .L1, .L2, opfield 001 0111; dst slong, unit .L1, .L2, opfield 011 0111; dst sint, unit .D1, .D2, opfield 01 0001; dst sint, unit .S1, .S2, opfield 01 0111. Description: The ZERO pseudo-operation fills the dst register with 0s by subtracting the dst from itself and placing the result in the dst.
PAGE 333
Chapter 4 Pipeline The C67x DSP pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: Control of the pipeline is simplified by eliminating pipeline interlocks. Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput. This chapter starts with a description of the pipeline flow.
PAGE 334
Pipeline Operation Overview 4.1 Pipeline Operation Overview The pipeline phases are divided into three stages: Fetch Decode Execute All instructions in the C67x DSP instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions. The execute stage of the pipeline requires a varying number of phases, depending on the type of instruction.
PAGE 335
Pipeline Operation Overview Figure 4−2. Fetch Phases of the Pipeline
(Figure not reproduced. Panel (a) lists the fetch phases PG, PS, PW, and PR in order; panel (b) relates them to the CPU functional units, registers, and memory; panel (c) shows 256-bit fetch packets, one in each of the PG, PS, PW, and PR phases.)
Decode 4.1.
PAGE 336
Pipeline Operation Overview Figure 4−3(a) shows the decode phases in sequential order from left to right. Figure 4−3(b) shows a fetch packet that contains two execute packets as they are processed through the decode stage of the pipeline. The last six instructions of the fetch packet (FP) are parallel and form an execute packet (EP). This EP is in the dispatch phase (DP) of the decode stage. The arrows indicate each instruction’s assigned functional unit for execution during the same cycle.
PAGE 337
Pipeline Operation Overview 4.1.3 Execute The execute portion of the pipeline is subdivided into ten phases (E1−E10), as compared to the five phases in a fixed-point pipeline. Different types of instructions require different numbers of these phases to complete their execution. These phases of the pipeline play an important role in understanding the device state at CPU cycle boundaries. The execution of different types of instructions in the pipeline is described in section 4.
PAGE 338
Pipeline Operation Overview 4.1.4 Pipeline Operation Summary Figure 4−5 shows all the phases in each stage of the C67x DSP pipeline in sequential order, from left to right. Figure 4−5. Pipeline Phases Fetch PG PS Execute Decode PW PR DP DC E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 Figure 4−6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
PAGE 339
Pipeline Operation Overview
Table 4−1. Operations Occurring During Pipeline Phases
  Stage           Phase                         Symbol   During This Phase
  Program fetch   Program address generation    PG       The address of the fetch packet is determined.
                  Program address sent          PS       The address of the fetch packet is sent to the memory.
                  Program wait                  PW       A program memory access is performed.
                  Program data receive          PR       The fetch packet is at the CPU boundary.
PAGE 340
Pipeline Operation Overview Table 4−1. Operations Occurring During Pipeline Phases (Continued) Stage Phase Execute 2 Symbol During This Phase E2 For load instructions, the address is sent to memory. For store instructions, the address and data are sent to memory.† Instruction Type Completed Multiply 2-cycle DP DP compare Single-cycle instructions that saturate results set the SAT bit in the CSR if saturation occurs.
PAGE 341
Pipeline Operation Overview Table 4−1. Operations Occurring During Pipeline Phases (Continued) Stage Phase Execute 5 Symbol During This Phase E5 For load instructions, data is written into a register file.† Load INTDP For INTDP and MPYSP2DP instructions, the upper 32 bits of the result are written to a register file.
PAGE 342
Pipeline Operation Overview Registers used by the instructions in E1 are shaded in Figure 4−7. The multiplexers used for the input operands to the functional units are also shaded in the figure. The bold crosspaths are used by the MPY and SUBSP instructions. Figure 4−7.
PAGE 343
Pipeline Operation Overview Many C67x DSP instructions are single-cycle instructions, which means they have only one execute phase (E1). The other instructions require more than one execute phase. The types of instructions, each of which requires a different number of execute phases, are described in section 4.2.
Example 4−1. Execute Packet in Figure 4−7
        LDDW    .D1
        ADDSP   .L1
        SUBSP   .L2X
        MPYSP   .M1X
        MPYSP   .M2
        ABSSP   .
PAGE 344
Pipeline Execution of Instruction Types 4.2 Pipeline Execution of Instruction Types The pipeline operation of the C67x DSP instructions can be categorized into fourteen instruction types. Thirteen of these are shown in Table 4−2 (NOP is not included in the table), which is a mapping of operations occurring in each execution phase for the different instruction types. The delay slots and functional unit latency associated with each instruction type are listed in the bottom row. See section 3.7.
PAGE 345
Pipeline Execution of Instruction Types Table 4−2.
PAGE 346
Pipeline Execution of Instruction Types Table 4−2.
PAGE 347
Pipeline Execution of Instruction Types Table 4−2.
PAGE 348
Pipeline Execution of Instruction Types 4.2.1 Single-Cycle Instructions Single-cycle instructions complete execution during the E1 phase of the pipeline (see Table 4−3). Figure 4−8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. Figure 4−9 shows the single-cycle execution diagram. The operands are read, the operation is performed, and the results are written to a register, all during E1. Single-cycle instructions have no delay slots. Table 4−3.
PAGE 349
Pipeline Execution of Instruction Types 4.2.2 16 × 16-Bit Multiply Instructions The 16 × 16-bit multiply instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 4−4). Figure 4−10 shows the fetch, decode, and execute phases of the pipeline that the multiply instructions use. Figure 4−11 shows the operations occurring in the pipeline for a multiply. In the E1 phase, the operands are read and the multiply begins.
PAGE 350
Pipeline Execution of Instruction Types 4.2.3 Store Instructions Store instructions require phases E1 through E3 of the pipeline to complete their operations (see Table 4−5). Figure 4−12 shows the fetch, decode, and execute phases of the pipeline that the store instructions use. Figure 4−13 shows the operations occurring in the pipeline phases for a store instruction. In the E1 phase, the address of the data to be stored is computed.
PAGE 351
Pipeline Execution of Instruction Types Figure 4−13. Store Instruction Execution Block Diagram
(Figure not reproduced. Blocks shown: functional unit .D, register file, memory controller, and memory, with the data and address paths labeled by the phases E1 through E3.)
When you perform a load and a store to the same memory location, these rules apply (i = cycle):
When a load is executed before a store, the old value is loaded and the new value is stored.
        i       LDW
        i+1     STW
When a store is executed before a load, the new value is stored and the new value is loaded.
PAGE 352
Pipeline Execution of Instruction Types 4.2.4 Load Instructions Data loads require five of the pipeline execute phases, E1−E5, to complete their operations (see Table 4−6). Figure 4−14 shows the fetch, decode, and execute phases of the pipeline that the load instructions use. Figure 4−15 shows the operations occurring in the pipeline phases for a load. In the E1 phase, the data address pointer is modified in its register. In the E2 phase, the data address is sent to data memory.
PAGE 353
Pipeline Execution of Instruction Types Figure 4−15. Load Instruction Execution Block Diagram Functional unit .D E2 E1 Register file E5 Data Memory controller E4 Address E3 Memory In the E4 stage of a load, the data is received at the CPU core boundary. Finally, in the E5 phase, the data is loaded into a register. Because data is not written to the register until E5, load instructions have four delay slots.
PAGE 354
Pipeline Execution of Instruction Types 4.2.5 Branch Instructions Although a branch takes one execute phase, there are five delay slots between the execution of the branch and execution of the target code (see Table 4−7). Figure 4−16 shows the pipeline phases used by the branch instruction and branch target code. The delay slots are shaded. Figure 4−17 shows a branch instruction execution block diagram. If a branch is in the E1 phase of the pipeline (in the .
PAGE 355
Pipeline Execution of Instruction Types Figure 4−17. Branch Instruction Execution Block Diagram
(Figure not reproduced. It shows 256-bit fetch packets in the PG, PS, PW, and PR fetch phases, 32-bit instructions in the DP and DC decode phases, and the execute stage with the branch instruction B on the .S2 unit in E1.)
PAGE 356
Pipeline Execution of Instruction Types 4.2.6 Two-Cycle DP Instructions Two-cycle DP instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 4−8). The following instructions are two-cycle DP instructions: ABSDP RCPDP RSQDP SPDP The lower and upper 32 bits of the DP source are read on E1 using the src1 and src2 ports, respectively. The lower 32 bits of the DP source are written on E1 and the upper 32 bits of the DP source are written on E2.
PAGE 357
Pipeline Execution of Instruction Types 4.2.7 Four-Cycle Instructions Four-cycle instructions use the E1 through E4 phases of the pipeline to complete their operations (see Table 4−9). The following instructions are four-cycle instructions: ADDSP DPINT DPSP DPTRUNC INTSP MPYSP SPINT SPTRUNC SUBSP The sources are read on E1 and the results are written on E4. The four-cycle instructions are executed on the .M or .L units. The status is written to the FMCR or FADCR on E4.
PAGE 358
Pipeline Execution of Instruction Types 4.2.8 INTDP Instruction The INTDP instruction uses the E1 through E5 phases of the pipeline to complete its operations (see Table 4−10). src2 is read on E1, the lower 32 bits of the result are written on E4, and the upper 32 bits of the result are written on E5. The INTDP instruction is executed on the .L unit. The status is written to the FADCR on E4. Figure 4−20 shows the fetch, decode, and execute phases of the pipeline that the INTDP instruction uses.
PAGE 359
Pipeline Execution of Instruction Types 4.2.9 DP Compare Instructions The DP compare instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 4−11). The lower 32 bits of the sources are read on E1, the upper 32 bits of the sources are read on E2, and the results are written on E2. The following instructions are DP compare instructions: CMPEQDP CMPLTDP CMPGTDP The DP compare instructions are executed on the .S unit.
PAGE 360
Pipeline Execution of Instruction Types 4.2.10 ADDDP/SUBDP Instructions The ADDDP/SUBDP instructions use the E1 through E7 phases of the pipeline to complete their operations (see Table 4−12). The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7. The ADDDP/SUBDP instructions are executed on the .L unit. The functional unit latency for ADDDP/SUBDP instructions is 2. The status is written to the FADCR on E6.
PAGE 361
Pipeline Execution of Instruction Types 4.2.11 MPYI Instruction The MPYI instruction uses the E1 through E9 phases of the pipeline to complete its operations (see Table 4−13). The sources are read on cycles E1 through E4 and the result is written on E9. The MPYI instruction is executed on the .M unit. The functional unit latency for the MPYI instruction is 4. Figure 4−23 shows the fetch, decode, and execute phases of the pipeline that the MPYI instruction uses. Table 4−13.
PAGE 362
Pipeline Execution of Instruction Types 4.2.12 MPYID Instruction The MPYID instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 4−14). The sources are read on cycles E1 through E4, the lower 32 bits of the result are written on E9, and the upper 32 bits of the result are written on E10. The MPYID instruction is executed on the .M unit. The functional unit latency for the MPYID instruction is 4.
PAGE 363
Pipeline Execution of Instruction Types 4.2.13 MPYDP Instruction The MPYDP instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 4−15). The lower 32 bits of src1 are read on E1 and E2, and the upper 32 bits of src1 are read on E3 and E4. The lower 32 bits of src2 are read on E1 and E3, and the upper 32 bits of src2 are read on E2 and E4. The lower 32 bits of the result are written on E9, and the upper 32 bits of the result are written on E10.
PAGE 364
Pipeline Execution of Instruction Types 4.2.14 MPYSPDP Instruction The MPYSPDP instruction uses the E1 through E7 phases of the pipeline to complete its operations (see Table 4−16). src1 is read on E1 and E2. The lower 32 bits of src2 are read on E1, and the upper 32 bits of src2 are read on E2. The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7. The MPYSPDP instruction is executed on the .M unit.
PAGE 365
Pipeline Execution of Instruction Types / Functional Unit Constraints 4.2.15 MPYSP2DP Instruction The MPYSP2DP instruction uses the E1 through E5 phases of the pipeline to complete its operations (see Table 4−17). src1 and src2 are read on E1. The lower 32 bits of the result are written on E4, and the upper 32 bits of the result are written on E5. The MPYSP2DP instruction is executed on the .M unit. The functional unit latency for the MPYSP2DP instruction is 2.
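The per-type descriptions in sections 4.2.1 through 4.2.15 can be summarized in a small lookup. The following C sketch is illustrative only: the type names and the helper delay_slots are hypothetical, and it assumes that the number of delay slots equals the execute phase in which the final result is written, minus one. That relationship agrees with the values stated in the text for single-cycle instructions (E1, no delay slots) and loads (E5, four delay slots). Store and branch instructions are left out because their delay slots are not determined by a register write.

```c
#include <stddef.h>
#include <string.h>

/* Execute phase in which each instruction type writes its final result,
 * taken from sections 4.2.1 through 4.2.15. The type names are labels
 * chosen for this sketch, not assembler syntax. */
struct insn_type {
    const char *name;
    int last_write_phase;   /* 1 = E1, 2 = E2, ... */
};

static const struct insn_type insn_types[] = {
    { "single-cycle",    1 },
    { "16x16 multiply",  2 },
    { "load",            5 },
    { "2-cycle DP",      2 },
    { "4-cycle",         4 },
    { "INTDP",           5 },
    { "DP compare",      2 },
    { "ADDDP/SUBDP",     7 },
    { "MPYI",            9 },
    { "MPYID",          10 },
    { "MPYDP",          10 },
    { "MPYSPDP",         7 },
    { "MPYSP2DP",        5 },
};

/* Returns the assumed delay-slot count (last write phase minus one),
 * or -1 if the type name is not in the table. */
int delay_slots(const char *name)
{
    size_t count = sizeof insn_types / sizeof insn_types[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(insn_types[i].name, name) == 0)
            return insn_types[i].last_write_phase - 1;
    }
    return -1;
}
```

A scheduler or a hand-written assembly checker could use such a table to decide how many cycles must pass before a result register is read.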
PAGE 366
Functional Unit Constraints 4.3.1 .S-Unit Constraints Table 4−18 shows the instruction constraints for single-cycle instructions executing on the .S unit. Table 4−18. Single-Cycle .
PAGE 367
Functional Unit Constraints Table 4−19 shows the instruction constraints for DP compare instructions executing on the .S unit. Table 4−19. DP Compare .
PAGE 368
Functional Unit Constraints Table 4−20 shows the instruction constraints for 2-cycle DP instructions executing on the .S unit. Table 4−20. 2-Cycle DP .
PAGE 369
Functional Unit Constraints Table 4−21 shows the instruction constraints for ADDSP/SUBSP instructions executing on the .S unit. Table 4−21. ADDSP/SUBSP .
PAGE 370
Functional Unit Constraints Table 4−22 shows the instruction constraints for ADDDP/SUBDP instructions executing on the .S unit. Table 4−22. ADDDP/SUBDP .
PAGE 371
Functional Unit Constraints Table 4−23 shows the instruction constraints for branch instructions executing on the .S unit. Table 4−23. Branch .
PAGE 372
Functional Unit Constraints 4.3.2 .M-Unit Constraints Table 4−24 shows the instruction constraints for 16 × 16 multiply instructions executing on the .M unit. Table 4−24. 16 × 16 Multiply .
PAGE 373
Functional Unit Constraints Table 4−25 shows the instruction constraints for 4-cycle instructions executing on the .M unit. Table 4−25. 4-Cycle .
PAGE 374
Functional Unit Constraints Table 4−26 shows the instruction constraints for MPYI instructions executing on the .M unit. Table 4−26. MPYI .
PAGE 375
Functional Unit Constraints Table 4−27 shows the instruction constraints for MPYID instructions executing on the .M unit. Table 4−27. MPYID .
PAGE 376
Functional Unit Constraints Table 4−28 shows the instruction constraints for MPYDP instructions executing on the .M unit. Table 4−28. MPYDP .
PAGE 377
Functional Unit Constraints Table 4−29 shows the instruction constraints for MPYSP instructions executing on the .M unit. Table 4−29. MPYSP .
PAGE 378
Functional Unit Constraints Table 4−30 shows the instruction constraints for MPYSPDP instructions executing on the .M unit. Table 4−30. MPYSPDP .
PAGE 379
Functional Unit Constraints Table 4−31 shows the instruction constraints for MPYSP2DP instructions executing on the .M unit. Table 4−31. MPYSP2DP .
PAGE 380
Functional Unit Constraints 4.3.3 .L-Unit Constraints Table 4−32 shows the instruction constraints for single-cycle instructions executing on the .L unit. Table 4−32. Single-Cycle .
PAGE 381
Functional Unit Constraints Table 4−33 shows the instruction constraints for 4-cycle instructions executing on the .L unit. Table 4−33. 4-Cycle .
PAGE 382
Functional Unit Constraints Table 4−34 shows the instruction constraints for INTDP instructions executing on the .L unit. Table 4−34. INTDP .
PAGE 383
Functional Unit Constraints Table 4−35 shows the instruction constraints for ADDDP/SUBDP instructions executing on the .L unit. Table 4−35. ADDDP/SUBDP .
PAGE 384
Functional Unit Constraints 4.3.4 .D-Unit Instruction Constraints Table 4−36 shows the instruction constraints for load instructions executing on the .D unit. Table 4−36. Load .
PAGE 385
Functional Unit Constraints Table 4−37 shows the instruction constraints for store instructions executing on the .D unit. Table 4−37. Store .
PAGE 386
Functional Unit Constraints Table 4−38 shows the instruction constraints for single-cycle instructions executing on the .D unit. Table 4−38. Single-Cycle .
PAGE 387
Functional Unit Constraints Table 4−39 shows the instruction constraints for LDDW instructions executing on the .D unit. Table 4−39.
PAGE 388
Performance Considerations 4.4 Performance Considerations The C67x DSP pipeline is most effective when it is kept as full as the algorithms in the program allow it to be. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions. Each FP can be split into one to eight execute packets (EPs). Each EP contains instructions that execute in parallel. Each instruction executes in an independent functional unit.
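The split of a fetch packet into execute packets can be illustrated with a short host-side sketch in C. It assumes the p-bit convention from the instruction set chapter, which is not restated in this section: bit 0 of each 32-bit instruction word is the p-bit, and p = 1 means the next instruction executes in parallel with the current one. The function name count_execute_packets is hypothetical.

```c
#include <stdint.h>

#define FP_WORDS 8   /* a fetch packet holds eight 32-bit instructions */

/* Counts the execute packets in one fetch packet.
 * Assumption: bit 0 of each word is the p-bit; p = 0 ends the current
 * execute packet. The last word always ends a packet, because an execute
 * packet is taken not to extend past the end of its fetch packet. */
int count_execute_packets(const uint32_t fp[FP_WORDS])
{
    int packets = 0;
    for (int i = 0; i < FP_WORDS; i++) {
        int p_bit = (int)(fp[i] & 1u);
        if (p_bit == 0 || i == FP_WORDS - 1)
            packets++;
    }
    return packets;
}
```

A fully serial fetch packet (all p-bits 0) yields eight execute packets, and a fully parallel one (p-bits 1 on the first seven words) yields one, which matches the range of one to eight given above.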
PAGE 389
Performance Considerations Figure 4−28.
PAGE 390
Performance Considerations 4.4.2 Multicycle NOPs The NOP instruction has an optional operand, count, that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets. If a NOP 2 is in parallel with an MPY instruction, the MPY result is available for use by instructions in the next execute packet.
PAGE 391
Performance Considerations Figure 4−30 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued. Figure 4−30. Branching and Multicycle NOPs Pipeline Phase Cycle # 1 EP1 2 EP2 3 Branch Target ...
PAGE 392
Performance Considerations 4.4.3 Memory Considerations The C67x DSP has a memory configuration with program memory in one physical space and data memory in another physical space. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations. With both data loads and program fetches, memory accesses are broken into multiple phases. This enables the C67x DSP to access memory at a high speed. These phases are shown in Figure 4−31.
PAGE 393
Performance Considerations Depending on the type of memory and the time required to complete an access, the pipeline may stall to ensure proper coordination of data and instructions. This is discussed in section 4.4.3.1. In the instance where multiple accesses are made to a single ported memory, the pipeline will stall to allow the extra access to occur. This is called a memory bank hit and is discussed in section 4.4.3.2. 4.4.3.
PAGE 394
Performance Considerations 4.4.3.2 Memory Bank Hits Most C67x devices use an interleaved memory bank scheme, as shown in Figure 4−33. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) instruction from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0. A load word (LDW) instruction from address 0 loads bytes 0 through 3 in banks 0 and 1.
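The bank mapping described above can be sketched in C. The sketch assumes that each bank is two bytes wide, which is what the LDB, LDH, and LDW cases imply; the number of banks varies by device and is passed in as a parameter (it must be between 1 and 32 here). The function names are hypothetical, and only accesses within a single memory space are modeled.

```c
#include <stdint.h>

#define BANK_WIDTH_BYTES 2u   /* implied by the LDB/LDH/LDW cases in the text */

/* Returns a bit mask of the banks touched by an access of size_bytes
 * starting at byte address addr. num_banks is device-specific
 * (an assumption of this sketch) and must be in the range 1..32. */
uint32_t banks_touched(uint32_t addr, uint32_t size_bytes, uint32_t num_banks)
{
    uint32_t mask = 0;
    for (uint32_t b = 0; b < size_bytes; b++)
        mask |= 1u << (((addr + b) / BANK_WIDTH_BYTES) % num_banks);
    return mask;
}

/* Returns 1 if two accesses issued in the same cycle share a bank. */
int bank_conflict(uint32_t addr_a, uint32_t size_a,
                  uint32_t addr_b, uint32_t size_b, uint32_t num_banks)
{
    return (banks_touched(addr_a, size_a, num_banks) &
            banks_touched(addr_b, size_b, num_banks)) != 0;
}
```

With eight banks, for example, word accesses to byte addresses 0 and 16 both map to banks 0 and 1 and are reported as a conflict, while word accesses to addresses 0 and 4 are not.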
PAGE 395
Performance Considerations
Table 4−41. Loads in Pipeline from Example 4−2
                      i     i+1   i+2   i+3   i+4   i+5
  LDW .D1 Bank 0      E1    E2    E3    −     E4    E5
  LDW .D2 Bank 0      E1    E2    −     E3    E4    E5
For devices that have more than one memory space (see Figure 4−34), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs. The internal memory of the C67x DSP family varies from device to device.
PAGE 396
Chapter 5 Interrupts This chapter describes CPU interrupts, including reset and the nonmaskable interrupt (NMI). It details the related CPU control registers and their functions in controlling interrupts. It also describes interrupt processing, the method the CPU uses to detect automatically the presence of interrupts and divert program execution flow to your interrupt service code. Finally, the chapter describes the programming implications of interrupts. Topic 5.1 Overview
PAGE 397
Overview 5.1 Overview Typically, DSPs work in an environment that contains multiple external asynchronous events. These events require tasks to be performed by the DSP when they occur. An interrupt is an event that stops the current process in the CPU so that the CPU can attend to the task needing completion because of the event. These interrupt sources can be on chip or off chip, such as timers, analog-to-digital converters, or other peripherals.
PAGE 398
Overview
Table 5−1. Interrupt Priorities
  Priority   Interrupt Name   Interrupt Type
  Highest    Reset            Reset
             NMI              Nonmaskable
             INT4             Maskable
             INT5             Maskable
             INT6             Maskable
             INT7             Maskable
             INT8             Maskable
             INT9             Maskable
             INT10            Maskable
             INT11            Maskable
             INT12            Maskable
             INT13            Maskable
             INT14            Maskable
  Lowest     INT15            Maskable
5.1.1.1 Reset (RESET) Reset is the highest priority interrupt and is used to halt the CPU and return it to a known state.
PAGE 399
Overview 5.1.1.2 Nonmaskable Interrupt (NMI) NMI is the second-highest priority interrupt and is generally used to alert the CPU of a serious hardware problem such as imminent power failure. For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1. If NMIE is set to 1, the only condition that can prevent NMI processing is if the NMI occurs during the delay slots of a branch (whether the branch is taken or not).
PAGE 400
Overview 5.1.1.4 Interrupt Acknowledgment (IACK) and Interrupt Number (INUMn) The IACK and INUMn signals alert hardware external to the C6000 that an interrupt has occurred and is being processed. The IACK signal indicates that the CPU has begun processing an interrupt. The INUMn signal (INUM3− INUM0) indicates the number of the interrupt (bit position in the IFR) that is being processed.
PAGE 401
Overview 5.1.2 Interrupt Service Table (IST) When the CPU begins processing an interrupt, it references the interrupt service table (IST). The IST is a table of fetch packets that contain code for servicing the interrupts. The IST consists of 16 consecutive fetch packets. Each interrupt service fetch packet (ISFP) contains eight instructions. A simple interrupt service routine may fit in an individual fetch packet. The addresses and contents of the IST are shown in Figure 5−1.
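The layout of the IST can be expressed as simple address arithmetic. The following C sketch assumes 32-bit (4-byte) instructions, so that each ISFP occupies 32 bytes, and assumes that the fetch packets are ordered by interrupt number (slot 0 for reset, slot 1 for NMI, slots 4 through 15 for INT4 through INT15). The function name isfp_address and the ist_base parameter are hypothetical.

```c
#include <stdint.h>

#define IST_ENTRIES            16u  /* 16 consecutive fetch packets */
#define INSTRUCTIONS_PER_FP     8u  /* eight instructions per ISFP */
#define BYTES_PER_INSTRUCTION   4u  /* assumption: 32-bit instructions */
#define ISFP_BYTES (INSTRUCTIONS_PER_FP * BYTES_PER_INSTRUCTION)

/* Returns the byte address of the ISFP in the given slot of an IST that
 * starts at ist_base, or UINT32_MAX if the slot number is out of range.
 * Assumption: slot n holds the fetch packet for interrupt number n. */
uint32_t isfp_address(uint32_t ist_base, uint32_t slot)
{
    if (slot >= IST_ENTRIES)
        return UINT32_MAX;
    return ist_base + slot * ISFP_BYTES;
}
```

Under these assumptions the whole table spans 16 × 32 = 512 bytes, and the fetch packet for INT4 in a table based at address 0 starts at 80h.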
PAGE 402
Overview 5.1.2.1 Interrupt Service Fetch Packet (ISFP) An ISFP is a fetch packet used to service an interrupt. Figure 5−2 shows an ISFP that contains an interrupt service routine small enough to fit in a single fetch packet (FP). To branch back to the main program, the FP contains a branch to the interrupt return pointer instruction (B IRP). This is followed by a NOP 5 instruction to allow the branch target to reach the execution stage of the pipeline.
PAGE 403
Overview If the interrupt service routine for an interrupt is too large to fit in a single fetch packet, a branch to the location of additional interrupt service routine code is required. Figure 5−3 shows that the interrupt service routine for INT4 was too large for a single fetch packet, and a branch to memory location 1234h is required to complete the interrupt service routine. Note: The instruction B LOOP branches into the middle of a fetch packet and processes code starting at address 1234h.
PAGE 404
Overview 5.1.2.2 Interrupt Service Table Pointer (ISTP) The reset fetch packet must be located at address 0, but the rest of the IST can be at any program memory location that is on a 256-word boundary. The location of the IST is determined by the interrupt service table base (ISTB) field of the interrupt service table pointer register (ISTP). The ISTP is shown in Figure 2−11 (page 2-21) and described in Table 2−12. Example 5−1 shows the relationship of the ISTB to the table location. Example 5−1.
PAGE 405
Overview 5.1.3 Summary of Interrupt Control Registers Table 5−2 lists the interrupt control registers on the C67x CPU. Table 5−2.
PAGE 406
Globally Enabling and Disabling Interrupts 5.2 Globally Enabling and Disabling Interrupts The control status register (CSR) contains two fields that control interrupts: GIE and PGIE, as shown in Figure 2−4 (page 2-13) and described in Table 2−7 (page 2-14). The global interrupt enable (GIE) allows you to enable or disable all maskable interrupts: GIE = 1 enables the maskable interrupts so that they are processed. GIE = 0 disables the maskable interrupts so that they are not processed.
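The effect of setting and clearing GIE can be sketched in C on a plain 32-bit value standing in for the CSR contents. The sketch assumes GIE is bit 0 of the CSR, consistent with the AND with −2 used in Example 5−2. The function names are hypothetical; on a target, the value would be read from and written back to the CSR control register.

```c
#include <stdint.h>

#define CSR_GIE 0x1u   /* assumption: GIE is bit 0 of the CSR */

/* Returns the CSR value with GIE cleared (maskable interrupts disabled).
 * All other bits are preserved. */
uint32_t csr_disable_interrupts(uint32_t csr)
{
    return csr & ~CSR_GIE;
}

/* Returns the CSR value with GIE set (maskable interrupts enabled).
 * All other bits are preserved. */
uint32_t csr_enable_interrupts(uint32_t csr)
{
    return csr | CSR_GIE;
}
```

Both helpers use a read-modify-write pattern so that the other CSR fields keep their values.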
PAGE 407
Globally Enabling and Disabling Interrupts Example 5−2. Code Sequence to Disable Maskable Interrupts Globally
        MVC     CSR,B0          ; get CSR
        AND     -2,B0,B0        ; get ready to clear GIE
        MVC     B0,CSR          ; clear GIE
Example 5−3.
PAGE 408
Individual Interrupt Control 5.3 Individual Interrupt Control Servicing interrupts effectively requires individual control of all three types of interrupts: reset, nonmaskable, and maskable. Enabling and disabling individual interrupts is done with the interrupt enable register (IER). The status of pending interrupts is stored in the interrupt flag register (IFR). Manual interrupt processing can be accomplished through the use of the interrupt set register (ISR) and interrupt clear register (ICR).
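How the IER and IFR combine can be sketched in C on plain 32-bit values. The sketch assumes that bit n of each register corresponds to INTn and that a lower interrupt number has higher priority, as in Table 5−1. It does not model GIE, NMIE, reset, or NMI. The function name next_maskable_interrupt is hypothetical.

```c
#include <stdint.h>

#define FIRST_MASKABLE  4   /* INT4 */
#define LAST_MASKABLE  15   /* INT15 */

/* Returns the number of the highest-priority maskable interrupt that is
 * both pending (flag set in ifr) and enabled (bit set in ier),
 * or -1 if there is none.
 * Assumption: bit n of IFR and IER corresponds to INTn. */
int next_maskable_interrupt(uint32_t ifr, uint32_t ier)
{
    uint32_t ready = ifr & ier;
    for (int n = FIRST_MASKABLE; n <= LAST_MASKABLE; n++) {
        if (ready & (1u << n))
            return n;
    }
    return -1;
}
```

An interrupt whose flag is set but whose enable bit is clear stays pending in the IFR and is skipped by this selection.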
PAGE 409
Individual Interrupt Control 5.3.2 Status of Interrupts The interrupt flag register (IFR) contains the status of INT4−INT15 and NMI. Each interrupt’s corresponding bit in IFR is set to 1 when that interrupt occurs; otherwise, the bits have a value of 0. If you want to check the status of interrupts, use the MVC instruction to read IFR. The IFR is shown in Figure 2−8 (page 2-18) and described in Table 2−10. 5.3.
PAGE 410
Individual Interrupt Control 5.3.4 Returning From Interrupt Servicing After RESET goes high, the control registers are brought to a known value and program execution begins at address 0h. After nonmaskable and maskable interrupt servicing, use a branch to the corresponding return pointer register to continue the previous program execution. 5.3.4.1 CPU State After RESET After RESET, the control registers and bits contain the following values: 5.3.4.
PAGE 411
Interrupt Detection and Processing 5.4 Interrupt Detection and Processing When an interrupt occurs, it sets a flag in the interrupt flag register (IFR). Depending on certain conditions, the interrupt may or may not be processed. This section discusses the mechanics of setting the flag bit, the conditions for processing an interrupt, and the order of operation for detecting and processing an interrupt. The similarities and differences between reset and nonreset interrupts are also discussed. 5.4.
PAGE 412
Interrupt Detection and Processing Any pending interrupt will be taken as soon as pending branches are completed. Figure 5−4.
PAGE 413
Interrupt Detection and Processing 5.4.3 Actions Taken During Nonreset Interrupt Processing During CPU cycles 6 through 14 of Figure 5−4, the following interrupt processing actions occur: Processing of subsequent nonreset interrupts is disabled. For all interrupts except NMI, the PGIE bit is set to the value of the GIE bit and then the GIE bit is cleared. For NMI, the NMIE bit is cleared. The next execute packets (from n + 5 on) are annulled.
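The GIE and PGIE update for maskable interrupts can be modeled in C on a plain 32-bit value standing in for the CSR contents. The sketch assumes GIE is bit 0 and PGIE is bit 1 of the CSR, following the register layout in Chapter 2. It covers only this one action, not the annulling of execute packets or the NMI case. The function name is hypothetical.

```c
#include <stdint.h>

#define CSR_GIE  (1u << 0)   /* assumption: GIE is bit 0 of the CSR */
#define CSR_PGIE (1u << 1)   /* assumption: PGIE is bit 1 of the CSR */

/* Models the CSR update when a maskable interrupt is taken:
 * PGIE receives the current value of GIE, then GIE is cleared.
 * All other bits are preserved. */
uint32_t csr_on_maskable_interrupt(uint32_t csr)
{
    uint32_t gie = csr & CSR_GIE;
    csr &= ~(CSR_GIE | CSR_PGIE);
    if (gie)
        csr |= CSR_PGIE;
    return csr;
}
```

Because the previous GIE value is kept in PGIE, an interrupt service routine that wants to allow nesting needs to save the CSR before re-enabling interrupts, as Example 5−13 does.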
PAGE 414
Interrupt Detection and Processing 5.4.4 Setting the RESET Interrupt Flag RESET must be held low for a minimum of 10 clock cycles. Four clock cycles after RESET goes high, processing of the reset vector begins. The flag for RESET (IF0) in the IFR is set by the low-to-high transition of the RESET signal on the CPU boundary. In Figure 5−5, IF0 is set during CPU cycle 15. This transition is detected on a clock-cycle by clock-cycle basis and is not affected by memory stalls that might extend a CPU cycle.
PAGE 415
Interrupt Detection and Processing 5.4.5 Actions Taken During RESET Interrupt Processing A low signal on the RESET pin is the only requirement to process a reset. Once RESET makes a high-to-low transition, the pipeline is flushed and CPU registers are returned to their reset values. GIE, NMIE, and the ISTB in the ISTP are cleared. For the CPU state after reset, see section 5.3.4.1.
PAGE 416
Performance Considerations 5.5 Performance Considerations The interaction of the C6000 CPU and sources of interrupts presents performance issues for you to consider when you are developing your code. 5.5.1 General Performance Overhead. Overhead for all CPU interrupts is 9 cycles. You can see this in Figure 5−4, where no new instructions are entering the E1 pipeline phase during CPU cycles 6 through 14. Latency. Interrupt latency is 13 cycles (21 cycles for RESET).
PAGE 417
Programming Considerations 5.6 Programming Considerations The interaction of the C6000 CPUs and sources of interrupts presents programming issues for you to consider when you are developing your code. 5.6.1 Single Assignment Programming Using the same register to store different variables (called here: multiple assignment) can result in unpredictable operation when the code can be interrupted. To avoid unpredictable operation, you must employ the single assignment method in code that can be interrupted.
PAGE 418
Programming Considerations Example 5−11. Code Using Single Assignment
        LDW     .D1     *A0,A6
        ADD     .L1     A1,A2,A3
        NOP     3
        MPY     .M1     A6,A4,A5        ; uses A6
5.6.2 Nested Interrupts Generally, when the CPU enters an interrupt service routine, interrupts are disabled. However, when the interrupt service routine is for one of the maskable interrupts (INT4−INT15), an NMI can interrupt processing of the maskable interrupt.
PAGE 419
Programming Considerations Example 5−13 shows a C-based interrupt handler that allows nested interrupts. The steps are similar, although the compiler takes care of allocating the stack and saving CPU registers. For more information on using C to access control registers and write interrupt handlers, see the TMS320C6000 Optimizing C Compiler User's Guide, SPRU187. Example 5−12.
PAGE 420
Programming Considerations Example 5−13. C Interrupt Service Routine That Allows Nested Interrupts
/* c6x.h contains declarations of the C6x control registers */
#include <c6x.h>

interrupt void isr(void)
{
    unsigned old_csr;
    unsigned old_irp;

    old_irp = IRP;          /* Save IRP */
    old_csr = CSR;          /* Save CSR (and thus PGIE) */
    CSR = old_csr | 1;      /* Enable interrupts */

    /* Interrupt service code goes here.
PAGE 421
Programming Considerations 5.6.4 Traps A trap behaves like an interrupt, but is created and controlled with software. The trap condition can be stored in any one of the conditional registers: A1, A2, B0, B1, or B2. If the trap condition is valid, a branch to the trap handler routine processes the trap and the return. Example 5−15 and Example 5−16 show a trap call and the return code sequence, respectively.
PAGE 422
Appendix A Instruction Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x and C67x+ DSPs. The C67x DSP adds specific instructions for 32-bit integer multiply, doubleword load, and floating-point operations. Table A−1 lists the instructions that are common to the C62x, C64x, C67x, and C67x+ DSPs. Table A−1.
PAGE 423
Instruction Compatibility Table A−1.
PAGE 424
Instruction Compatibility Table A−1.
PAGE 425
Instruction Compatibility Table A−1.
PAGE 426
Instruction Compatibility Table A−1.
PAGE 427
Instruction Compatibility Table A−1.
PAGE 428
Appendix B
Mapping Between Instruction and Functional Unit

Table B−1 lists the instructions that execute on each functional unit.

Table B−1. Functional Unit to Instruction Mapping

Instruction    .L Unit    .M Unit    .S Unit
ABS
ABSDP
ABSSP
ADD .
PAGE 429
Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit Instruction .L Unit .M Unit .S Unit B displacement B register † B IRP † B NRP † CLR .
PAGE 430
Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit Instruction .L Unit INTDP INTDPU INTSP INTSPU .M Unit .S Unit .
PAGE 431
Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit Instruction .L Unit .M Unit MPYHU MPYHULS MPYHUS MPYI MPYID MPYLH MPYLHU MPYLSHU MPYLUHS MPYSP MPYSPDP§ MPYSP2DP§ MPYSU MPYU MPYUS MV .S Unit .
PAGE 432
Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit Instruction .L Unit .M Unit .S Unit NORM NOT OR RCPDP RCPSP RSQRDP RSQRSP SADD SAT SET SHL SHR SHRU SMPY SMPYH SMPYHL SMPYLH SPDP SPINT SPTRUNC SSHL SSUB .
PAGE 433
Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit Instruction .L Unit .M Unit .S Unit .
PAGE 434
Appendix C
.D Unit Instructions and Opcode Maps

This appendix lists the instructions that execute in the .D functional unit and illustrates the opcode maps for these instructions.

Topic                                                     Page
C.1 Instructions Executing in the .D Functional Unit      C-2
C.2 Opcode Map Symbols and Meanings                       C-3
C.3 32-Bit Opcode Maps . . .
PAGE 435
C.1 Instructions Executing in the .D Functional Unit

Table C−1 lists the instructions that execute in the .D functional unit.

Table C−1. Instructions Executing in the .
PAGE 436
C.2 Opcode Map Symbols and Meanings

Table C−2 lists the symbols and meanings used in the opcode maps.

Table C−2. .D Unit Opcode Map Symbol Definitions

Symbol    Meaning
baseR     base address register
creg      3-bit field specifying a conditional register
dst       destination. For compact instructions, dst is coded as an offset from either A16 or B16 depending on the value of the t bit.
PAGE 437
Opcode Map Symbols and Meanings Table C−3.
PAGE 438
C.3 32-Bit Opcode Maps

The C67x CPU 32-bit opcodes used in the .D unit are mapped in Figure C−1 through Figure C−4.

Figure C−1. 1 or 2 Sources Instruction Format

Bits      Field        Width
31−29     creg         3
28        z            1
27−23     dst          5
22−18     src2         5
17−13     src1         5
12−7      op           6
6−2       1 0 0 0 0    5
1         s            1
0         p            1

Figure C−2. Extended .
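As an illustration of how the Figure C−1 layout can be read, the following minimal C sketch splits a 32-bit word into the fields creg (bits 31−29), z (28), dst (27−23), src2 (22−18), src1 (17−13), op (12−7), s (1), and p (0), and checks that bits 6−2 hold the fixed pattern 1 0 0 0 0. The type `d_unit_fields` and the functions `bits` and `decode_d_unit` are hypothetical names chosen for this example; they are not part of any TI tool or header.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the .D unit "1 or 2 sources" format, as laid out in Figure C-1. */
typedef struct {
    unsigned creg;  /* bits 31-29 */
    unsigned z;     /* bit  28    */
    unsigned dst;   /* bits 27-23 */
    unsigned src2;  /* bits 22-18 */
    unsigned src1;  /* bits 17-13 */
    unsigned op;    /* bits 12-7  */
    unsigned s;     /* bit  1     */
    unsigned p;     /* bit  0     */
} d_unit_fields;

/* Extract `width` bits of `word`, starting at bit position `lsb`. */
static unsigned bits(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1u && width < 32u && lsb + width <= 32u);
    return (unsigned)((word >> lsb) & ((1u << width) - 1u));
}

/* Returns 1 and fills *out when bits 6-2 match the fixed pattern 10000;
   returns 0 (leaving *out untouched) otherwise. */
static int decode_d_unit(uint32_t word, d_unit_fields *out)
{
    if (bits(word, 2, 5) != 0x10u)
        return 0;

    out->creg = bits(word, 29, 3);
    out->z    = bits(word, 28, 1);
    out->dst  = bits(word, 23, 5);
    out->src2 = bits(word, 18, 5);
    out->src1 = bits(word, 13, 5);
    out->op   = bits(word, 7, 6);
    out->s    = bits(word, 1, 1);
    out->p    = bits(word, 0, 1);
    return 1;
}
```

The fixed-pattern check is only a format test. Mapping the op value to a mnemonic would need the per-instruction opcode tables, which this sketch does not attempt.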
PAGE 439
Appendix D
.L Unit Instructions and Opcode Maps

This appendix lists the instructions that execute in the .L functional unit and illustrates the opcode maps for these instructions.

Topic                                                     Page
D.1 Instructions Executing in the .L Functional Unit      D-2
D.2 Opcode Map Symbols and Meanings                       D-3
D.3 32-Bit Opcode Maps . . .
PAGE 440
D.1 Instructions Executing in the .L Functional Unit

Table D−1 lists the instructions that execute in the .L functional unit.

Table D−1. Instructions Executing in the .L Functional Unit

Instruction    Instruction
ABS            LMBD
ADD            MV
ADDDP          NEG
ADDSP          NORM
ADDU           NOT
AND            OR
CMPEQ          SADD
CMPGT          SAT
CMPGTU         SPINT
CMPLT          SPTRUNC
CMPLTU         SSUB
DPINT          SUB
DPSP           SUBC
DPTRUNC        SUBDP
INTDP          SUBSP
INTDPU         SUBU
INTSP          XOR
INTSPU         ZERO
PAGE 441
D.2 Opcode Map Symbols and Meanings

Table D−2 lists the symbols and meanings used in the opcode maps.

Table D−2. .
PAGE 442
D.3 32-Bit Opcode Maps

The C67x CPU 32-bit opcodes used in the .L unit are mapped in Figure D−1 through Figure D−3.

Figure D−1. 1 or 2 Sources Instruction Format

Bits      Field        Width
31−29     creg         3
28        z            1
27−23     dst          5
22−18     src2         5
17−13     src1         5
12        x            1
11−5      op           7
4−2       1 1 0        3
1         s            1
0         p            1

Figure D−2.
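Going the other way, a word in the Figure D−1 layout can be assembled by shifting each field into place: creg at bits 31−29, z at 28, dst at 27−23, src2 at 22−18, src1 at 17−13, x at 12, op at 11−5, the fixed pattern 1 1 0 at bits 4−2, then s and p. The sketch below is one way to do that in C. The names `l_unit_fields`, `field`, and `encode_l_unit` are hypothetical, and the op values passed in are treated as opaque numbers rather than as specific instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the .L unit "1 or 2 sources" format, as laid out in Figure D-1. */
typedef struct {
    unsigned creg;  /* bits 31-29 */
    unsigned z;     /* bit  28    */
    unsigned dst;   /* bits 27-23 */
    unsigned src2;  /* bits 22-18 */
    unsigned src1;  /* bits 17-13 */
    unsigned x;     /* bit  12    */
    unsigned op;    /* bits 11-5  */
    unsigned s;     /* bit  1     */
    unsigned p;     /* bit  0     */
} l_unit_fields;

/* Place `value` at bit position `lsb`, rejecting values wider than `width`. */
static uint32_t field(unsigned value, unsigned lsb, unsigned width)
{
    assert(width >= 1u && width < 32u && lsb + width <= 32u);
    assert(value < (1u << width));
    return (uint32_t)value << lsb;
}

/* Assemble a 32-bit word; bits 4-2 carry the fixed pattern 110. */
static uint32_t encode_l_unit(const l_unit_fields *f)
{
    return field(f->creg, 29, 3)
         | field(f->z,    28, 1)
         | field(f->dst,  23, 5)
         | field(f->src2, 18, 5)
         | field(f->src1, 13, 5)
         | field(f->x,    12, 1)
         | field(f->op,    5, 7)
         | field(0x6u,     2, 3)
         | field(f->s,     1, 1)
         | field(f->p,     0, 1);
}
```

Range-checking every field with an assertion keeps an oversized value (for example a register number above 31) from silently spilling into the neighboring field.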
PAGE 443
Appendix E
.M Unit Instructions and Opcode Maps

This appendix lists the instructions that execute in the .M functional unit and illustrates the opcode maps for these instructions.

Topic                                                     Page
E.1 Instructions Executing in the .M Functional Unit      E-2
E.2 Opcode Map Symbols and Meanings                       E-3
E.3 32-Bit Opcode Maps . . .
PAGE 444
E.1 Instructions Executing in the .M Functional Unit

Table E−1 lists the instructions that execute in the .M functional unit.

Table E−1. Instructions Executing in the .M Functional Unit

Instruction    Instruction
MPY            MPYLHU
MPYDP          MPYLSHU
MPYH           MPYLUHS
MPYHL          MPYSP
MPYHLU         MPYSPDP§
MPYHSLU        MPYSP2DP§
MPYHSU         MPYSU
MPYHU          MPYU
MPYHULS        MPYUS
MPYHUS         SMPY
MPYI           SMPYH
MPYID          SMPYHL
MPYLH          SMPYLH

§ C67x+ DSP-specific instruction
PAGE 445
E.2 Opcode Map Symbols and Meanings

Table E−2 lists the symbols and meanings used in the opcode maps.

Table E−2. .
PAGE 446
E.3 32-Bit Opcode Maps

The C67x CPU 32-bit opcodes used in the .M unit are mapped in Figure E−1 through Figure E−3.

Figure E−1. Extended M-Unit with Compound Operations

Bits      Field        Width
31−29     creg         3
28        z            1
27−23     dst          5
22−18     src2         5
17−13     src1         5
12        x            1
11        0            1
10−6      op           5
5−2       1 1 0 0      4
1         s            1
0         p            1

Figure E−2. Extended .
PAGE 447
Appendix F
.S Unit Instructions and Opcode Maps

This appendix lists the instructions that execute in the .S functional unit and illustrates the opcode maps for these instructions.

Topic                                                     Page
F.1 Instructions Executing in the .S Functional Unit      F-2
F.2 Opcode Map Symbols and Meanings                       F-3
F.3 32-Bit Opcode Maps . . .
PAGE 448
F.1 Instructions Executing in the .S Functional Unit

Table F−1 lists the instructions that execute in the .S functional unit.

Table F−1. Instructions Executing in the .
PAGE 449
F.2 Opcode Map Symbols and Meanings

Table F−2 lists the symbols and meanings used in the opcode maps.

Table F−2. .
PAGE 450
F.3 32-Bit Opcode Maps

The C67x CPU 32-bit opcodes used in the .S unit are mapped in Figure F−1 through Figure F−11.

Figure F−1. 1 or 2 Sources Instruction Format

Bits      Field        Width
31−29     creg         3
28        z            1
27−23     dst          5
22−18     src2         5
17−13     src1         5
12        x            1
11−6      op           6
5−2       1 0 0 0      4
1         s            1
0         p            1

Figure F−2. Extended .
PAGE 451
32-Bit Opcode Maps Figure F−6. Call Unconditional, Immediate with Implied NOP 5 Instruction Format 31 0 0 29 28 0 z 27 cst21 7 1 21 6 5 4 3 2 1 0 0 0 1 0 0 s p 1 1 1 0 Figure F−7. Branch with NOP Constant Instruction Format 31 29 28 27 16 15 13 creg z src2 src1 3 1 12 3 12 11 0 0 6 5 4 3 2 0 0 1 0 0 1 0 0 0 s p 1 1 1 0 Figure F−8.
PAGE 452
Appendix G
No Unit Specified Instructions and Opcode Maps

This appendix lists the instructions that execute with no unit specified and illustrates the opcode maps for these instructions. For a list of the instructions that execute in the .D functional unit, see Appendix C. For a list of the instructions that execute in the .L functional unit, see Appendix D. For a list of the instructions that execute in the .M functional unit, see Appendix E. For a list of the instructions that execute in the .
PAGE 453
G.1 Instructions Executing With No Unit Specified

Table G−1 lists the instructions that execute with no unit specified.

Table G−1. Instructions Executing With No Unit Specified

Instruction
IDLE
NOP

G.2 Opcode Map Symbols and Meanings

Table G−2 lists the symbols and meanings used in the opcode maps.

Table G−2.
PAGE 454
32-Bit Opcode Maps G.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the no unit instructions are mapped in Figure G−1 through Figure G−3. Figure G−1. Loop Buffer Instruction Format 31 29 28 27 23 22 18 creg z cstb csta 3 1 5 5 17 16 1 13 op 12 11 10 9 8 7 6 5 4 3 2 0 0 0 0 0 0 0 0 0 0 0 s p 4 1 0 1 1 1 0 Figure G−2.
PAGE 455
Index Index 1X and 2X paths 2-6 2-cycle DP instructions, .S-unit instruction constraints 4-36 4-cycle instructions .L-unit instruction constraints 4-49 .
PAGE 456
Index B B instruction using a displacement 3-69 using a register 3-71 B IRP instruction 3-73 B NRP instruction 3-75 B4 MODE bits 2-10 B5 MODE bits 2-10 B6 MODE bits 2-10 B7 MODE bits 2-10 bit field clear (CLR) 3-77 extract and sign-extend a bit field (EXT) 3-110 extract and zero-extend a bit field (EXTU) 3-113 set (SET) 3-210 bitwise AND (AND) 3-67 bitwise exclusive OR (XOR) 3-270 bitwise NOT (NOT) 3-194 bitwise OR (OR) 3-195 BK0 bits 2-10 BK1 bits 2-10 block diagram branch instructions 4-23 decode pipelin
PAGE 457
Index compare for equality floating-point double-precision values (CMPEQDP) 3-82 single-precision values (CMPEQSP) 3-84 signed integers (CMPEQ) 3-80 compare for greater than floating-point double-precision values (CMPGTDP) 3-89 single-precision values (CMPGTSP) 3-91 signed integers (CMPGT) 3-86 unsigned integers (CMPGTU) 3-93 compare for less than floating-point double-precision values (CMPLTDP) 3-98 single-precision values (CMPLTSP) 3-100 signed integers (CMPLT) 3-95 unsigned integers (CMPLTU) 3-102 condi
PAGE 458
Index cross paths CSR 2-13 2-6 D DA1 and DA2 2-7 data address paths 2-7 DC pipeline phase 4-3 DCC bits 2-13 decoding instructions 4-3 delay slots 3-14 DEN1 bit in FADCR 2-24 in FAUCR 2-27 in FMCR 2-31 DEN2 bit in FADCR 2-24 in FAUCR 2-27 in FMCR 2-31 detection and processing, interrupts 5-16 disabling an individual interrupt 5-13 disabling maskable interrupts globally 5-12 DIV0 bit 2-27 double-precision data format 3-9 DP compare instruction, pipeline operation 4-27 DP compare instructions, .
PAGE 459
Index IEn bit 2-17 IER 2-17 IFn bit 2-18 IFR 2-18 INEX bit in FADCR 2-24 in FAUCR 2-27 in FMCR 2-31 INFO bit in FADCR 2-24 in FAUCR 2-27 in FMCR 2-31 instruction compatibility 3-34, A-1 instruction descriptions 3-34 instruction execution .D unit C-2 .L unit D-2 .M unit E-2 .
PAGE 460
Index INTSPU instruction 3-122 INVAL bit in FADCR 2-24 in FAUCR 2-27 in FMCR 2-31 IRP 2-19 ISn bit 2-20 ISR 2-20 ISTB bits 2-21 ISTP 2-21 L latency 3-14 LDB instruction 5-bit unsigned constant offset or register offset 3-123 15-bit unsigned constant offset 3-126 LDBU instruction 5-bit unsigned constant offset or register offset 3-123 15-bit constant offset 3-126 LDDW instruction 3-128 constraints 3-29 LDDW instruction with long write instruction, D-unit instruction constraints 4-55 LDH instruction 5-bit u
PAGE 461
Index move 16-bit constant into upper bits of register (MVKH and MVKLH) 3-185 between control file and register file (MVC) 3-180 from register to register (MV) 3-178 signed constant into register and sign extend (MVK) 3-183 signed constant into register and sign extend (MVKL) 3-187 MPY instruction 3-143 MPYDP instruction 3-145 .
PAGE 462
Index multiply (continued) unsigned by unsigned unsigned 16 LSB by unsigned 16 LSB (MPYU) 3-174 unsigned 16 LSB by unsigned 16 MSB (MPYLHU) 3-163 unsigned 16 MSB by unsigned 16 LSB (MPYHLU) 3-151 unsigned 16 MSB by unsigned 16 MSB (MPYHU) 3-154 multiply instructions .
PAGE 463
Index PGIE bit 2-13 pipeline decode stage 4-3 execute stage 4-5 execution 4-12 factors that provide programming flexibility fetch stage 4-2 functional unit constraints 4-33 overview 4-2 performance considerations 4-56 phases 4-2 stages 4-2 summary 4-6 pipeline execution 4-1 4-12 pipeline operation ADDDP instruction 4-28 branch instructions 4-22 DP compare instruction 4-27 four-cycle instructions 4-25 INTDP instruction 4-26 load instructions 4-20 MPYDP instruction 4-31 MPYI instruction 4-29 MPYID instr
PAGE 464
Index returning from interrupt servicing REVISION ID bits 2-13 RMODE bits in FADCR 2-24 in FMCR 2-31 RSQRDP instruction 3-201 RSQRSP instruction 3-203 5-15 S SADD instruction 3-205 SAT bit 2-13 SAT instruction 3-208 saturate a 40-bit integer to a 32-bit integer (SAT) 3-208 serial fetch packets 3-17 set a bit field (SET) 3-210 set an individual interrupt 5-14 SET instruction 3-210 setting interrupts 5-14 setting the nonreset interrupt flag 5-16 setting the RESET interrupt flag 5-19 shift arithmetic shift
PAGE 465
Index SUBC instruction 3-258 SUBDP instruction 3-260 .L-unit instruction constraints 4-51 .S-unit instruction constraints 4-38 pipeline operation 4-28 SUBSP instruction 3-263 .