TMS320C6652, TMS320C6654
SPRS841D – MARCH 2012 – REVISED JUNE 2016
www.ti.com
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32 × 32
bit multiplies, sixteen 16 × 16 bit multiplies, four 16 × 32 bit multiplies, four 8 × 8 bit multiplies, four 8 × 8
bit multiplies with add operations, and four 16 × 16 bit multiplies with add/subtract capabilities. There is also
support for Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as
FFTs and modems require complex multiplication. Each C66x .M unit can perform one 16 × 16 bit
complex multiply with or without rounding capabilities, two 16 × 16 bit complex multiplies with rounding
capability, and a 32 × 32 bit complex multiply with rounding capability. The C66x can also perform two 16
× 16 bit and one 32 × 32 bit complex multiply instructions that multiply a complex number with a complex
conjugate of another number with rounding capability. Communication signal processing also makes
extensive use of matrix operations. Each C66x .M unit is capable of multiplying a [1 × 2] complex vector
by a [2 × 2] complex matrix per cycle with or without rounding capability. A version also exists allowing
multiplication of the conjugate of a [1 × 2] vector with a [2 × 2] complex matrix.
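
The complex operations described above can be sketched as a portable C reference model. This is only an illustration of the arithmetic, not the C66x instructions or compiler intrinsics themselves: the type and function names (cplx16, cplx64, cmpy16, cmpy16_conj, cmatmpy_1x2_2x2) are hypothetical, results are kept in 64-bit fields so that no intermediate can overflow, and the rounding and saturation options of the real instructions are not modelled.

```c
#include <stdint.h>

/* Hypothetical types: 16-bit complex input, 64-bit complex result.
   The wide result means no intermediate can overflow. Rounding and
   saturation as done by the real instructions are NOT modelled. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int64_t re, im; } cplx64;

/* a * b for 16 x 16 bit complex operands. */
static inline cplx64 cmpy16(cplx16 a, cplx16 b)
{
    cplx64 r;
    r.re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    r.im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    return r;
}

/* a * conj(b): multiply a by the complex conjugate of b. */
static inline cplx64 cmpy16_conj(cplx16 a, cplx16 b)
{
    cplx64 r;
    r.re = (int64_t)a.re * b.re + (int64_t)a.im * b.im;
    r.im = (int64_t)a.im * b.re - (int64_t)a.re * b.im;
    return r;
}

/* out = v * m, where v is a [1 x 2] complex vector and m is a
   [2 x 2] complex matrix stored row-major. */
static inline void cmatmpy_1x2_2x2(const cplx16 v[2], cplx16 m[2][2],
                                   cplx64 out[2])
{
    for (int j = 0; j < 2; ++j) {
        cplx64 p0 = cmpy16(v[0], m[0][j]);
        cplx64 p1 = cmpy16(v[1], m[1][j]);
        out[j].re = p0.re + p1.re;
        out[j].im = p0.im + p1.im;
    }
}
```

For example, with a = 1 + 2j and b = 3 + 4j, cmpy16 gives -5 + 10j and cmpy16_conj gives 11 + 2j. The vector-matrix product needs four complex multiplies and two complex additions, which is the work the text attributes to a single .M unit per cycle.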
Each C66x .M unit also includes IEEE floating-point multiplication operations from the C674x DSP, which
include one single-precision multiply each cycle and one double-precision multiply every 4 cycles. There
is also a mixed-precision multiply that allows multiplication of a single-precision value by a double-
precision value and an operation allowing multiplication of two single-precision numbers resulting in a
double-precision number. The C66x DSP improves on the performance of the C674x double-precision
multiplies by adding an instruction that allows one double-precision multiply per cycle and also reduces
the number of delay slots from 10 to 4. Each C66x .M unit can also perform one of the following
floating-point operations each clock cycle: one, two, or four single-precision multiplies or a complex
single-precision multiply.
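
The two mixed-precision multiplies mentioned above can be illustrated with a small host-side C sketch. The function names (mpy_sp_to_dp, mpy_sp_dp) are hypothetical and are not the C66x mnemonics or intrinsics. The sketch assumes ordinary IEEE single and double formats on the host and only shows the arithmetic, not the instruction timing.

```c
/* Two single-precision inputs, double-precision result.
   Each float has a 24-bit significand, so the 48-bit product
   is exactly representable in a double (53-bit significand). */
static inline double mpy_sp_to_dp(float a, float b)
{
    return (double)a * (double)b;
}

/* Single-precision value multiplied by a double-precision value. */
static inline double mpy_sp_dp(float a, double b)
{
    return (double)a * b;
}
```

As an example of why the wider result matters: for a = 1 + 2^-23, the exact square is 1 + 2^-22 + 2^-46, which needs 47 significand bits. That fits in double precision, but a single-precision result would have to round away the last term.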
The .L and .S units can now support up to 64-bit operands. This enables new versions of many of the
arithmetic, logical, and data packing instructions that perform more parallel operations per cycle. Additional
instructions were added to improve the performance of floating-point addition and subtraction,
including the ability to perform one double-precision addition or subtraction per cycle.
Conversion to/from integer and single-precision values can now be done on both .L and .S units on the
C66x. By taking advantage of the larger operands, instructions were also added to double the
number of these conversions that can be done. The .L unit also has additional instructions for logical AND
and OR, as well as 90-degree or 270-degree rotation of complex numbers (up to two per
cycle). Instructions have also been added that compute the conjugate of a complex
number.
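
Rotation by 90 or 270 degrees and conjugation need no multiplier, only swaps and sign changes of the real and imaginary parts, which is consistent with these operations living on the .L unit. A minimal C sketch of the arithmetic follows. The names (cplx16, cplx32, crot90_ref, crot270_ref, cconj_ref) are hypothetical, positive rotation is assumed to be counterclockwise (multiplication by +j), and results are widened to 32 bits so that negating -32768 cannot overflow. How the real instructions handle that corner case is not modelled here.

```c
#include <stdint.h>

/* Hypothetical types: 16-bit complex input, 32-bit complex result. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int32_t re, im; } cplx32;

/* Rotate by 90 degrees: multiply by +j, (re, im) -> (-im, re). */
static inline cplx32 crot90_ref(cplx16 z)
{
    cplx32 r = { -(int32_t)z.im, (int32_t)z.re };
    return r;
}

/* Rotate by 270 degrees: multiply by -j, (re, im) -> (im, -re). */
static inline cplx32 crot270_ref(cplx16 z)
{
    cplx32 r = { (int32_t)z.im, -(int32_t)z.re };
    return r;
}

/* Complex conjugate: (re, im) -> (re, -im). */
static inline cplx32 cconj_ref(cplx16 z)
{
    cplx32 r = { (int32_t)z.re, -(int32_t)z.im };
    return r;
}
```

For z = 3 + 4j this gives -4 + 3j for the 90-degree rotation, 4 - 3j for the 270-degree rotation, and 3 - 4j for the conjugate.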
The MFENCE instruction is a new instruction introduced on the C66x DSP. This instruction will create a
DSP stall until the completion of all the DSP-triggered memory transactions, including:
• Cache line fills
• Writes from L1D to L2 or from the CorePac to MSMC and/or other system endpoints
• Victim write backs
• Block or global coherence operations
• Cache mode changes
• Outstanding XMC prefetch requests
This is useful as a simple mechanism for programs to wait for these requests to reach their endpoint. It
also ensures ordering for writes arriving at a single endpoint through multiple paths, multiprocessor
algorithms that depend on ordering, and manual coherence operations.
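
The ordering use case can be sketched with a host-side analogy in portable C11. This is not C66x code: the macro WAIT_FOR_MEMORY_TRANSACTIONS and the mailbox type are hypothetical, and a C11 fence only constrains ordering between threads, whereas MFENCE as described above stalls the DSP until the outstanding transactions have completed. On a C66x target the macro would be the place to issue MFENCE; how that is written depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Host stand-in (assumption): a sequentially consistent C11 fence.
   On a C66x target this is where MFENCE would be issued instead. */
#define WAIT_FOR_MEMORY_TRANSACTIONS() \
    atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical structure shared between a producer and a consumer. */
typedef struct {
    uint32_t    payload;
    atomic_bool ready;
} mailbox;

/* Producer: write the payload, wait for it to land, then set the flag.
   Without the fence the flag could become visible before the payload. */
static inline void mailbox_publish(mailbox *m, uint32_t value)
{
    m->payload = value;
    WAIT_FOR_MEMORY_TRANSACTIONS();
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Consumer: read the payload only after the flag has been observed. */
static inline bool mailbox_try_read(mailbox *m, uint32_t *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_seq_cst);
    *out = m->payload;
    return true;
}
```

The pattern is the same in both settings: data write, fence, then flag write. On the device, the payload and the flag may travel to their endpoint through different paths, which is the situation the text describes.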
For more details on the C66x DSP and its enhancements over the C64x+ and C674x architectures, see
the following documents:
• C66x CPU and Instruction Set Reference Guide
• C66x DSP Cache User's Guide
Copyright © 2012–2016, Texas Instruments Incorporated