
TMS320DM647_10 Datasheet, PDF (8/181 Pages) Texas Instruments – Digital Media Processor


TMS320DM647

TMS320DM648

SPRS372F – JANUARY 2010 – REVISED SEPTEMBER 2009

www.ti.com

Table 2-1. Characteristics of the Processor (continued)

HARDWARE FEATURES | | DM647 | DM648
PLL Options | CLKIN1 frequency multiplier | x1 (Bypass), PLLM = 15, 16, …, 31 (x16, x17, …, x32)(1) | x1 (Bypass), PLLM = 15, 16, …, 31 (x16, x17, …, x32)(1)
BGA Package | | 529-Pin Flip Chip Plastic BGA (ZUT) | 529-Pin Flip Chip Plastic BGA (ZUT)
Process Technology | 0.09-μm/6-Level Cu Metal Process (CMOS) | 0.09 μm | 0.09 μm
Product Status(2) | Production Data (PD) | |

(1) The resulting CPU frequency must not exceed the maximum CPU frequency of the device.

(2) See Section 2.7 for a description of each stage of development.
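The multiplier column follows directly from the PLLM values: each setting gives a CLKIN1 multiplier of PLLM + 1 (PLLM = 15 gives x16, PLLM = 31 gives x32). The C sketch below applies that arithmetic together with the check from footnote (1). It is illustrative only: the function names, the use of kHz, and the caller-supplied maximum CPU frequency are assumptions (this section does not state the maximum), and the sketch does not model the PLL controller registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* PLLM range documented in Table 2-1 (PLLM = 15..31, i.e. x16..x32). */
#define PLLM_MIN 15u
#define PLLM_MAX 31u

/* Returns the CLKIN1 multiplier for a PLLM value (PLLM + 1),
   or 0 if PLLM is outside the documented range. */
uint32_t pll_multiplier(uint32_t pllm)
{
    if (pllm < PLLM_MIN || pllm > PLLM_MAX)
        return 0;
    return pllm + 1;
}

/* CPU clock in kHz for a CLKIN1 frequency in kHz.
   bypass = true selects the x1 (Bypass) option and ignores pllm.
   Returns 0 when PLLM is out of range. */
uint32_t pll_output_khz(uint32_t clkin1_khz, bool bypass, uint32_t pllm)
{
    if (bypass)
        return clkin1_khz;
    return clkin1_khz * pll_multiplier(pllm);
}

/* Footnote (1): the resulting CPU frequency must not exceed the device
   maximum. max_cpu_khz is a caller-supplied (assumed) parameter. */
bool pll_setting_ok(uint32_t clkin1_khz, bool bypass, uint32_t pllm,
                    uint32_t max_cpu_khz)
{
    uint32_t f = pll_output_khz(clkin1_khz, bypass, pllm);
    return f != 0 && f <= max_cpu_khz;
}
```

For example, with a hypothetical 50-MHz CLKIN1, PLLM = 17 selects x18 and yields 900 MHz, which `pll_setting_ok` accepts only if the caller's maximum is at least that value.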

2.2 CPU (DSP Core) Description

The C64x+ central processing unit (CPU) consists of eight functional units, two register files, and two data paths, as shown in Figure 2-1. The two general-purpose register files (A and B) each contain 32 32-bit registers, for a total of 64 registers. The general-purpose registers can be used for data or as data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit data, 40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values, are stored in register pairs, with the 32 LSBs of data placed in an even-numbered register and the remaining 8 or 32 MSBs in the next upper register (which is always an odd-numbered register).
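The register-pair convention can be modeled in C by splitting a 64-bit or 40-bit value into two 32-bit words. In this sketch the struct fields stand in for an even/odd pair such as A5:A4 (even register A4 holds the 32 LSBs). The type and function names are hypothetical, values are treated as unsigned, and the contents of the unused upper 24 bits of the odd register for 40-bit data are not modeled.

```c
#include <stdint.h>

/* Illustrative model of an even/odd register pair. */
typedef struct {
    uint32_t even; /* e.g. A4: bits 31..0 */
    uint32_t odd;  /* e.g. A5: bits 63..32 (64-bit) or 39..32 (40-bit) */
} reg_pair;

/* 64-bit value: 32 LSBs in the even register, 32 MSBs in the odd one. */
reg_pair split64(uint64_t v)
{
    reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

uint64_t join64(reg_pair p)
{
    return ((uint64_t)p.odd << 32) | p.even;
}

/* 40-bit value: only the low 8 bits of the odd register are significant. */
reg_pair split40(uint64_t v)
{
    reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) & 0xFFu };
    return p;
}

uint64_t join40(reg_pair p)
{
    return ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
}
```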

The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and store results from the register file into memory.

The C64x+ CPU extends the performance of the C64x core through enhancements and new features.

Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32-bit multiply, one 16 x 32-bit multiply, two 16 x 16-bit multiplies, two 16 x 32-bit multiplies, two 16 x 16-bit multiplies with add/subtract capabilities, four 8 x 8-bit multiplies, four 8 x 8-bit multiplies with add operations, or four 16 x 16-bit multiplies with add/subtract capabilities (including a complex multiply). There is also support for Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms, such as FFTs and modems, require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding capability that produce one 32-bit packed output containing 16-bit real and 16-bit imaginary values. The 32 x 32-bit multiply instructions provide the extended precision necessary for audio and other high-precision algorithms on a variety of signed and unsigned 32-bit data types.
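To make the complex-multiply arithmetic concrete, the following C model computes (ar + j·ai)(br + j·bi) from two 32-bit words that each pack a 16-bit real and a 16-bit imaginary part (four 16-bit inputs in total), and returns 32-bit real and imaginary results. The packing order (real part in the upper half), the function names, and the saturation of the single overflowing input combination are assumptions made for this sketch; it is not a bit-exact description of the CMPY instruction.

```c
#include <stdint.h>

/* Assumed packing: real part in bits 31..16, imaginary part in bits 15..0. */
typedef struct {
    int32_t re;
    int32_t im;
} cplx32;

/* Sign-extend a 16-bit field without implementation-defined casts. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical helper: pack a 16-bit complex value into one word. */
uint32_t pack_cplx16(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)re << 16) | (uint16_t)im;
}

/* Clamp a 64-bit intermediate to the 32-bit signed range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* (ar + j*ai) * (br + j*bi) = (ar*br - ai*bi) + j*(ar*bi + ai*br).
   Products are formed in 64 bits. Only one input combination (all four
   parts equal to -32768) overflows 32 bits; saturating it is an assumption. */
cplx32 cmpy_model(uint32_t a, uint32_t b)
{
    int64_t ar = sext16(a >> 16), ai = sext16(a);
    int64_t br = sext16(b >> 16), bi = sext16(b);
    cplx32 r;
    r.re = sat32(ar * br - ai * bi);
    r.im = sat32(ar * bi + ai * br);
    return r;
}
```

For example, (3 + 4j)(1 - 2j) gives a real part of 11 and an imaginary part of -2.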

The .L unit (arithmetic logic unit) now incorporates the ability to perform parallel add/subtract operations on a pair of common inputs. Versions of this instruction exist that operate on 32-bit data or on pairs of 16-bit data, performing dual 16-bit adds and subtracts in parallel. There are also saturated forms of these instructions.

The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2 comparisons were available only on the .L units. On the C64x+ core they are also available on the .S unit, which increases the performance of searching and sorting algorithms. Finally, to increase data packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack instructions return parallel results to output precision, with saturation support.
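The following C sketch models the lane-wise operations described for the .S unit: MIN2/MAX2-style comparisons on two packed signed 16-bit lanes, an unpack step that widens 8-bit data into 16-bit lanes, and a pack step that saturates 16-bit results back to 8 bits. Function names, lane ordering, and the choice of unsigned 8-bit output are assumptions for illustration; the sketch is not a bit-exact definition of any C64x+ instruction.

```c
#include <stdint.h>

/* Assumed lane layout: lane 1 in bits 31..16, lane 0 in bits 15..0. */
static int32_t lane_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

static uint32_t pack_lanes(int32_t hi, int32_t lo)
{
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Lane-wise signed 16-bit minimum of two packed words. */
uint32_t min2_model(uint32_t a, uint32_t b)
{
    int32_t ah = lane_s16(a >> 16), al = lane_s16(a);
    int32_t bh = lane_s16(b >> 16), bl = lane_s16(b);
    return pack_lanes(ah < bh ? ah : bh, al < bl ? al : bl);
}

/* Lane-wise signed 16-bit maximum of two packed words. */
uint32_t max2_model(uint32_t a, uint32_t b)
{
    int32_t ah = lane_s16(a >> 16), al = lane_s16(a);
    int32_t bh = lane_s16(b >> 16), bl = lane_s16(b);
    return pack_lanes(ah > bh ? ah : bh, al > bl ? al : bl);
}

/* Unpack: widen the two low unsigned bytes of w into two 16-bit lanes. */
uint32_t unpack_lo_u8(uint32_t w)
{
    return ((w & 0xFF00u) << 8) | (w & 0x00FFu);
}

/* Clamp a signed value to the unsigned 8-bit range 0..255. */
static uint32_t sat_u8(int32_t x)
{
    return x < 0 ? 0u : x > 255 ? 255u : (uint32_t)x;
}

/* Pack with saturation: four signed 16-bit lanes (two words) become
   four unsigned bytes, lanes of hi first. */
uint32_t pack_sat_u8(uint32_t hi, uint32_t lo)
{
    return (sat_u8(lane_s16(hi >> 16)) << 24) | (sat_u8(lane_s16(hi)) << 16) |
           (sat_u8(lane_s16(lo >> 16)) << 8) | sat_u8(lane_s16(lo));
}
```

A typical flow is unpack, dual 16-bit arithmetic, then pack with saturation, so that intermediate results have headroom while the stored output stays 8 bits wide.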

Other new features include:

• SPLOOP - A small instruction buffer in the CPU that aids in the creation of software-pipelined loops, in which multiple iterations of a loop execute in parallel. The SPLOOP buffer reduces the code size associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
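The scheduling idea behind software pipelining can be illustrated in portable C. The sketch below hand-pipelines a hypothetical scale-and-offset loop into a prologue, a steady-state kernel in which three iterations overlap, and an epilogue. It shows only the loop transformation; it does not model the SPLOOP buffer itself, and the function names and three-stage split are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference loop: each iteration loads, computes, and stores in sequence. */
void scale_ref(const int32_t *x, int32_t *y, size_t n, int32_t k, int32_t c)
{
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] * k + c;
}

/* Hand-pipelined version with three stages per iteration: load (S1),
   compute (S2), store (S3). In the kernel, iteration i is stored while
   iteration i+1 is computed and iteration i+2 is loaded.
   x and y must not overlap. */
void scale_pipelined(const int32_t *restrict x, int32_t *restrict y,
                     size_t n, int32_t k, int32_t c)
{
    if (n < 2) {                       /* too short to pipeline */
        scale_ref(x, y, n, k, c);
        return;
    }

    /* Prologue: fill the pipeline. */
    int32_t loaded = x[0];             /* S1 of iteration 0 */
    int32_t computed = loaded * k + c; /* S2 of iteration 0 */
    loaded = x[1];                     /* S1 of iteration 1 */

    /* Kernel: three iterations in flight at once. */
    for (size_t i = 0; i + 2 < n; i++) {
        y[i] = computed;               /* S3 of iteration i   */
        computed = loaded * k + c;     /* S2 of iteration i+1 */
        loaded = x[i + 2];             /* S1 of iteration i+2 */
    }

    /* Epilogue: drain the pipeline. */
    y[n - 2] = computed;               /* S3 of iteration n-2 */
    y[n - 1] = loaded * k + c;         /* S2 and S3 of iteration n-1 */
}
```

Both functions produce the same output; only the schedule differs. The hand-written prologue and epilogue are the extra code that software pipelining normally adds, which is the code-size cost the SPLOOP buffer is described as reducing.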

Device Overview


Product Folder Link(s): TMS320DM647 TMS320DM648
