Data Sheet

May 2003

DSP16411 Digital Signal Processor

4 Hardware Architecture (continued)

4.2 DSP16000 Core Architectural Overview

The DSP16411 contains two identical DSP16000 cores. As shown in Figure 2 on page 21, each core consists of four major blocks: system control and cache (SYS), data arithmetic unit (DAU), Y-memory space address arithmetic unit (YAAU), and X-memory space address arithmetic unit (XAAU). Bits within the auc0 and auc1 registers configure the DAU mode-controlled operations. See the DSP16000 Digital Signal Processor Core Information Manual for a complete description of the DSP16000 core.

4.2.1 System Control and Cache (SYS)

SYS consists of the control block and the cache.

The control block provides overall system coordination that is mostly invisible to the user. It includes an instruction decoder and sequencer, a pseudorandom sequence generator (PSG), an interrupt and trap handler, a wait-state generator, and low-power standby mode control logic. The interrupt and trap handler provides a user-locatable vector table and three levels of user-assigned interrupt priority.

SYS contains the alf register, a 16-bit register that holds AWAIT (a power-saving standby mode bit) and peripheral flags. The inc0 and inc1 registers are 20-bit interrupt control registers, and ins is a 20-bit interrupt status register.

Programs use the instruction cache to store and execute repetitive operations such as those found in an FIR or IIR filter section. The cache can contain up to 31 instructions, which may be 16-bit or 32-bit. The code in the cache can repeat up to 2^16 − 1 times with no looping overhead. Operations in the cache that require a coefficient access execute at twice the normal rate, because the XAAU and its associated bus are not needed to fetch instructions. The cache greatly reduces the need to write in-line repetitive code and therefore reduces instruction/coefficient memory size requirements. Using the cache also reduces power consumption, because it eliminates memory accesses for instruction fetches.

The cache provides a convenient, low-overhead looping structure that is interruptible, savable, and restorable. The cache is addressable in both the X and Y memory spaces. An interrupt or trap handling routine can save and restore cloop, cstate, csave, and the contents of the cache. The cloop register controls the cache loop count. The cstate register contains the current state of the cache. The 32-bit csave register holds the opcode of the instruction following the loop instruction in program memory.
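
The two cache limits stated above are that a cached loop body holds at most 31 instructions and that one cache loop repeats at most 2^16 − 1 times. Together they decide whether a kernel can run from the cache, and how many back-to-back cache loops a larger repeat count would need. The C sketch below is a hypothetical planning helper, not part of any Agere tool chain. The function and macro names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the text; the names are hypothetical. */
#define CACHE_MAX_INSTR  31u    /* instructions per cached loop body */
#define CACHE_MAX_REPEAT 65535u /* 2^16 - 1 repetitions per cache loop */

/* True if a body of n_instr instructions repeated `count` times fits in a
   single cache loop (a count of 0 is accepted here; an assumption). */
bool fits_one_cache_loop(uint32_t n_instr, uint32_t count)
{
    return n_instr >= 1u && n_instr <= CACHE_MAX_INSTR &&
           count <= CACHE_MAX_REPEAT;
}

/* Minimum number of back-to-back cache loops needed to repeat a body
   `total` times: ceiling division by the per-loop limit. The 64-bit
   intermediate avoids overflow for totals near UINT32_MAX. */
uint32_t cache_loops_needed(uint32_t total)
{
    return (uint32_t)(((uint64_t)total + CACHE_MAX_REPEAT - 1u) / CACHE_MAX_REPEAT);
}
```

For example, a 4-instruction body that must run 100,000 times exceeds the per-loop limit, so `cache_loops_needed(100000)` returns 2.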

4.2.2 Data Arithmetic Unit (DAU)

The DAU is a power-efficient, dual-MAC (multiply/accumulate), parallel-pipelined structure tailored to communications applications. It can perform two double-word (32-bit) fetches, two multiplications, and two accumulations in a single instruction cycle. The dual-MAC parallel pipeline begins with two 32-bit registers, x and y. When these registers are used as inputs to the two signed 16-bit × 16-bit multipliers, the pipeline treats them as four 16-bit signed registers. The two multipliers each produce a full 32-bit result, stored in product registers p0 and p1. The DAU can direct the output of each multiplier to a 40-bit ALU or a 40-bit 3-input ADDER. The ALU and ADDER results are each stored in one of eight 40-bit accumulators, a0 through a7. Both the ALU and ADDER include an ACS (add/compare/select) function for Viterbi decoding. The DAU can direct the output of each accumulator to the ALU/ACS, the ADDER/ACS, or a 40-bit BMU (bit manipulation unit).
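
The dual-MAC dataflow described above can be modeled in portable C for reference or testing. The sketch below is a model built on assumptions, not the DSP16000 instruction set. Each step forms two 16-bit × 16-bit products, standing in for p0 and p1. It then adds both products to one accumulator in a single 3-input addition, standing in for the ADDER. Finally, it wraps the accumulator to 40 bits, as would happen with overflow saturation disabled. The pairing of operands with p0 and p1 and the names `wrap40` and `fir_dual_mac` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce v to the 40-bit two's-complement range, modeling an accumulator
   that wraps when overflow saturation is disabled (assumption). */
int64_t wrap40(int64_t v)
{
    const int64_t two40 = (int64_t)1 << 40;
    int64_t low = v & (two40 - 1);             /* low 40 bits, 0 <= low < 2^40 */
    return (low >= (two40 >> 1)) ? low - two40 /* bit 39 set: negative */
                                 : low;
}

/* Dot product of coefficients c[] and samples s[], two taps per step.
   Each step forms two 16 x 16 -> 32-bit products (stand-ins for p0 and
   p1) and adds both to the accumulator in one 3-input addition. */
int64_t fir_dual_mac(const int16_t *c, const int16_t *s, size_t n)
{
    int64_t a0 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        int32_t p0 = (int32_t)c[i] * s[i];         /* multiplier 0 */
        int32_t p1 = (int32_t)c[i + 1] * s[i + 1]; /* multiplier 1 */
        a0 = wrap40(a0 + p0 + p1);                 /* 3-input add */
    }
    if (i < n) {                                   /* odd tap count */
        a0 = wrap40(a0 + (int32_t)c[i] * s[i]);
    }
    return a0;
}
```

Processing two taps per step halves the number of accumulation steps compared with a single-MAC loop. This is the point of the dual-MAC structure.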

The ALU implements 2-input addition, subtraction, and various logical operations. The ADDER implements 2-input or 3-input addition and subtraction. To support Viterbi decoding, the ALU and ADDER have a split mode in which two simultaneous 16-bit additions or subtractions are performed. This mode, available in specialized dual-MAC instructions, is used to compute the distance between a received symbol and its estimate.
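
To make the split mode concrete, the sketch below packs two 16-bit lanes into one 32-bit word and adds or subtracts them lane by lane, so no carry or borrow crosses from the low half into the high half. It assumes the lanes wrap modulo 2^16, with no saturation. The helper names are hypothetical. One possible use, in line with the text, is subtracting an estimated symbol from a received one. Here each half is assumed to hold one symbol component, for example I and Q.

```c
#include <stdint.h>

/* Pack two signed 16-bit lanes into one 32-bit word (hi in bits 31..16). */
uint32_t pack16x2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Read a 16-bit lane back as a signed value (portable sign extension). */
static int16_t to_s16(uint32_t lane)
{
    int32_t v = (int32_t)(lane & 0xFFFFu);
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

int16_t lane_hi(uint32_t w) { return to_s16(w >> 16); }
int16_t lane_lo(uint32_t w) { return to_s16(w); }

/* Split-mode add/subtract: each lane is computed modulo 2^16 on its own,
   so no carry or borrow crosses from the low lane into the high lane. */
uint32_t split_add(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

uint32_t split_sub(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

Compare this with a plain 32-bit subtraction. `pack16x2(5, 3) - pack16x2(2, 7)` leaves 2 in the high lane instead of 3, because the low lane's borrow leaks upward. The split form keeps the lanes independent.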

The ACS provides the add/compare/select function required for Viterbi decoding. This unit provides flags to the traceback encoder for implementing mode-controlled side effects for ACS operations. The source operands for the ACS are any two accumulators, and results are written back to one of the source accumulators.
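
The add/compare/select step can be sketched in C as follows. This is a generic Viterbi ACS under stated assumptions, not the exact DSP16411 semantics:

- The larger candidate survives. This is a max-metric convention; decoders that use distances keep the smaller candidate instead.
- A tie keeps the first operand.
- The decision bit is shifted into a word called `trace`, a stand-in for the flag passed to the traceback logic.

As in the text, the survivor overwrites one of the two source operands. All names are hypothetical.

```c
#include <stdint.h>

/* Compare *acc_a with acc_b and write the survivor back into *acc_a,
   as the text describes (result overwrites one source accumulator).
   Larger value wins (max-metric convention, assumed); a tie keeps
   *acc_a. The decision bit (1 = took acc_b) is shifted into *trace. */
void acs_select(int64_t *acc_a, int64_t acc_b, uint32_t *trace)
{
    uint32_t took_b = (acc_b > *acc_a) ? 1u : 0u; /* compare */
    *trace = (*trace << 1) | took_b;              /* record decision */
    if (took_b) {
        *acc_a = acc_b;                           /* select */
    }
}

/* One ACS step for a trellis state reachable from two predecessors:
   add each predecessor's path metric to its branch metric, then select. */
int64_t acs_step(int64_t pm0, int64_t bm0, int64_t pm1, int64_t bm1,
                 uint32_t *trace)
{
    int64_t acc = pm0 + bm0;            /* add: candidate 0 */
    acs_select(&acc, pm1 + bm1, trace); /* compare + select */
    return acc;
}
```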

The BMU implements barrel-shift, bit-field insertion, bit-field extraction, exponent extraction, normalization, and accumulator shuffling operations. The auxiliary registers ar0 through ar3 mainly control BMU operations.
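
Exponent extraction and normalization can be modeled in two steps. First, count the redundant sign bits in a 40-bit accumulator value; then shift the value left by that count. The sketch below uses that textbook definition. The exact value returned by the DSP16000 exponent instruction is not stated here and should be taken from the core manual. This includes whether the count is referenced to bit 39 or bit 31, and what is returned for zero. Function names are hypothetical, and zero is assigned the maximum count, 39, by convention.

```c
#include <stdint.h>

/* Count redundant sign bits in a 40-bit two's-complement value v (v must
   already lie in the 40-bit range). Bit 39 is the sign; each lower bit
   that matches it, scanning down from bit 38, is redundant. Zero and -1
   both yield 39 under this convention (an assumption). */
int redundant_sign_bits40(int64_t v)
{
    uint64_t u = (uint64_t)v; /* well-defined two's-complement bit pattern */
    unsigned sign = (unsigned)((u >> 39) & 1u);
    int n = 0;
    for (int b = 38; b >= 0; --b) {
        if ((unsigned)((u >> b) & 1u) != sign) {
            break;
        }
        ++n;
    }
    return n;
}

/* Normalize: shift left by the redundant-sign-bit count so that, for
   nonzero v, bit 38 differs from the sign bit. The shift count (the
   "exponent") is returned via *shift. Multiplying by 2^s avoids
   left-shifting a negative value. */
int64_t normalize40(int64_t v, int *shift)
{
    int s = redundant_sign_bits40(v);
    *shift = s;
    return v * ((int64_t)1 << s);
}
```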

The user can enable overflow saturation for the multiplier output and for the results of the three arithmetic units (ALU, ADDER, and BMU). Overflow saturation can also apply to an accumulator value as it is transferred to memory or to another register. These features accommodate speech coding standards such as GSM-FR, GSM-HR, and GSM-EFR. Shifting occurs at several stages of the arithmetic pipeline to accommodate various standards for mixed-precision and double-precision multiplications.
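
The sketch below illustrates overflow saturation in the style of the fixed-point reference code used with GSM speech coders. Results are formed exactly in a wide accumulator, then clamped to the signed 32-bit range when they leave it. The one-bit left shift in `mult_q15_sat` is an assumption chosen to match Q15 fractional arithmetic; the text says only that the pipeline shifts at several stages. The helper names are hypothetical. They are neither DSP16411 instructions nor functions from a specific vendor library.

```c
#include <stdint.h>

/* Clamp a wide (e.g., 40-bit) accumulator value to the signed 32-bit
   range, as a transfer with overflow saturation enabled would. */
int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}

/* Saturating 32-bit add: form the exact sum in a wider accumulator,
   then clamp it. */
int32_t add_sat32(int32_t a, int32_t b)
{
    return sat32((int64_t)a + b);
}

/* Q15 x Q15 -> Q31 fractional multiply with a one-bit left shift
   (assumed). The only overflow case, -1.0 * -1.0, saturates. */
int32_t mult_q15_sat(int16_t a, int16_t b)
{
    return sat32((int64_t)a * b * 2);
}
```

Without saturation, `mult_q15_sat(-32768, -32768)` would wrap to the most negative value. With it, the result becomes the largest positive value, which is the behavior bit-exact speech codecs expect.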

Agere Systems Inc.

Agere Systems Proprietary

19

Use pursuant to Company instructions
