80960KA Datasheet, PDF(8/43 Page) Intel Corporation


80960KA Datasheet, PDF (8/43 Pages) Intel Corporation – EMBEDDED 32-BIT MICROPROCESSOR



purpose registers provided in other popular microprocessors. The term global refers to the fact that these registers retain their contents across procedure calls.

The local registers, on the other hand, are procedure specific. For each procedure call, the 80960KA allocates 16 local registers (R0 through R15). Each local register is 32 bits wide.

1.1.4. Multiple Register Sets

To further increase the efficiency of the register set, multiple sets of local registers are stored on-chip (see Figure 4). This cache holds up to four local register frames, which means that up to three procedure calls can be made without having to access the procedure stack resident in memory.

Although programs may have procedure calls nested many calls deep, a program typically oscillates back and forth between only two to three levels. As a result, with four stack frames in the cache, the probability of having a free frame available on the cache when a call is made is very high. In fact, runs of representative C-language programs show that 80% of the calls are handled without needing to access memory.

If four or more procedures are active and a new procedure is called, the 80960KA moves the oldest local register set in the stack-frame cache to a procedure stack in memory to make room for a new set of registers. Global register G15 is the frame pointer (FP) to the procedure stack.
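
The spill behavior described above can be sketched as a toy model. This is a minimal illustration, not the 80960KA's actual mechanism: the class name `FrameCache`, its methods, and the reload-on-return policy are assumptions made for the example. Only the capacity of four frames and the "move the oldest frame to memory" rule come from the text.

```python
from collections import deque


class FrameCache:
    """Toy model of an on-chip cache of local register frames (hypothetical)."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # four frames, per the datasheet text
        self.cached = deque()      # frames held on-chip, oldest at the left
        self.spilled = []          # frames moved to the procedure stack in memory
        self.memory_accesses = 0   # calls/returns that had to touch memory

    def call(self, name):
        # A call needs a free frame; if none is free, spill the oldest one.
        if len(self.cached) == self.capacity:
            self.spilled.append(self.cached.popleft())
            self.memory_accesses += 1
        self.cached.append(name)

    def ret(self):
        # Discard the returning procedure's frame.
        self.cached.pop()
        # Assumed policy: reload the caller's frame only when it is needed.
        if not self.cached and self.spilled:
            self.cached.append(self.spilled.pop())
            self.memory_accesses += 1


cache = FrameCache()
for name in ("main", "a", "b", "c"):   # initial frame plus three nested calls
    cache.call(name)
print(cache.memory_accesses)           # 0: all four frames fit on-chip
cache.call("d")                        # a fifth active procedure
print(cache.memory_accesses)           # 1: the oldest frame ("main") was spilled
```

Under this model a program that oscillates between two or three call levels never triggers a spill, which matches the datasheet's argument for why most calls avoid memory.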

Global registers are not exchanged on a procedure call, but retain their contents, making them available to all procedures for fast parameter passing.

1.1.5. Instruction Cache

To further reduce memory accesses, the 80960KA includes a 512-byte on-chip instruction cache. The instruction cache is based on the concept of locality of reference; most programs are not usually executed in a steady stream but consist of many branches, loops and procedure calls that lead to jumping back and forth in the same small section of code. Thus, by maintaining a block of instructions in cache, the number of memory references required to read instructions into the processor is greatly reduced.

To load the instruction cache, instructions are fetched in 16-byte blocks; up to four instructions can be fetched at one time. An efficient prefetch algorithm increases the probability that an instruction will already be in the cache when it is needed.

Code for small loops often fits entirely within the cache, leading to a great increase in processing speed since further memory references might not be necessary until the program exits the loop. Similarly, when calling short procedures, the code for the calling procedure is likely to remain in the cache so it will be there on the procedure's return.
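
The effect of loop size on cache behavior can be illustrated with a small simulation. This is a sketch under stated assumptions: the 512-byte size and 16-byte block come from the text, but the direct-mapped placement, the 4-byte instruction width, the names `InstructionCache` and `run_loop`, and the example addresses are assumptions for illustration. Prefetching is not modeled.

```python
CACHE_BYTES = 512                        # cache size from the text
BLOCK_BYTES = 16                         # fetch block size from the text
NUM_LINES = CACHE_BYTES // BLOCK_BYTES   # 32 blocks


class InstructionCache:
    """Toy instruction cache; direct-mapped placement is an assumption."""

    def __init__(self):
        self.tags = [None] * NUM_LINES   # block number held in each line
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        block = address // BLOCK_BYTES
        index = block % NUM_LINES        # assumed direct-mapped index
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.tags[index] = block     # bring in the whole 16-byte block
            self.misses += 1


def run_loop(cache, start, length_bytes, iterations, word=4):
    """Fetch every instruction of a loop body, repeated `iterations` times."""
    for _ in range(iterations):
        for address in range(start, start + length_bytes, word):
            cache.fetch(address)


small = InstructionCache()
run_loop(small, start=0x1000, length_bytes=64, iterations=10)
print(small.misses, small.hits)          # 4 156

large = InstructionCache()
run_loop(large, start=0x1000, length_bytes=1024, iterations=2)
print(large.misses)                      # 128
```

The 64-byte loop misses only on its first pass (one miss per 16-byte block), while the 1024-byte loop is twice the cache size and, in this model, evicts its own blocks before they are reused.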

1.1.6. Register Scoreboarding

The instruction decoder is optimized in several ways. One optimization method is the ability to overlap instructions by using register scoreboarding.

Register scoreboarding occurs when a LOAD moves a variable from memory into a register. When the instruction initiates, a scoreboard bit on the target register is set. Once the register is loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure that the load has completed before processing continues. Since the processor does not need to wait for the LOAD to complete, it can execute additional instructions placed between the LOAD and the instruction that uses the register contents, as shown in the following example:

    ld data_2, r4
    ld data_2, r5
    Unrelated instruction
    Unrelated instruction
    add r4, r5, r6

In essence, the two unrelated instructions between LOAD and ADD are executed "for free" (i.e., take no apparent time to execute) because they are executed while the register is being loaded. Up to three load instructions can be pending at one time with three corresponding scoreboard bits set. By exploiting this feature, system programmers and compiler writers have a useful tool for optimizing execution speed.
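
The "for free" claim can be checked with a simple cycle-count model. This is only a sketch: the four-cycle load latency, the one-cycle issue cost per instruction, and the function name `run` are assumptions for illustration and are not taken from the datasheet. The limit of three pending loads is from the text.

```python
LOAD_LATENCY = 4   # assumed cycles until loaded data arrives (not from the datasheet)
MAX_PENDING = 3    # up to three loads pending at once, per the text


def run(program):
    """Return total cycles for a list of (op, dest, sources) tuples."""
    ready_at = {}  # register -> cycle at which its scoreboard bit is reset
    cycle = 0
    for op, dest, sources in program:
        # Stall until the scoreboard bit of every source register is clear.
        for reg in sources:
            cycle = max(cycle, ready_at.get(reg, 0))
        if op == "ld":
            # Stall if three loads are already outstanding.
            pending = sorted(t for t in ready_at.values() if t > cycle)
            if len(pending) >= MAX_PENDING:
                cycle = pending[0]
            ready_at[dest] = cycle + LOAD_LATENCY   # scoreboard bit set until then
        cycle += 1   # assumed: each instruction takes one cycle to issue
    return cycle


loads = [("ld", "r4", ()), ("ld", "r5", ())]
add = [("add", "r6", ("r4", "r5"))]
unrelated = [("nop", None, ()), ("nop", None, ())]

print(run(loads + add))               # 6
print(run(loads + unrelated + add))   # 6: the two extra instructions cost nothing
```

In this model the ADD cannot issue until both loads have completed, so the two unrelated instructions simply fill cycles that would otherwise be spent stalled.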
