AMD-K6-2E Datasheet, PDF(39/332 Page) Advanced Micro Devices – AMD-K6™-2E Embedded Processor

22529B/0 – January 2000

Preliminary Information

AMD-K6™-2E Processor Data Sheet

Branch History Table

Branch Target Cache

Return Address Stack

Typical applications contain up to 10% unconditional branches and
another 10% to 20% conditional branches. The AMD-K6-2E processor
branch logic is designed to handle this type of program behavior and
its negative effects on instruction execution, such as stalls due to
delayed instruction fetching and the draining of the processor
pipeline. The branch logic contains an 8192-entry branch history
table, a 16-entry by 16-byte branch target cache, a 16-entry return
address stack, and a branch execution unit.

The AMD-K6-2E processor handles unconditional branches without any
penalty by redirecting instruction fetching to the target address of
the unconditional branch. However, conditional branches require the
use of the dynamic branch-prediction mechanism built into the
AMD-K6-2E processor.

A two-level adaptive history algorithm is implemented in an
8192-entry branch history table. This table stores executed branch
information, predicts individual branches, and predicts the behavior
of groups of branches.
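
As a rough illustration of how a two-level adaptive scheme over an 8192-entry table can work, the sketch below uses a register of recent branch outcomes (first level) to help select a saturating counter (second level). The data sheet does not give the history length, the index function, or the counter width, so the 13-bit global history, the XOR index, and the 2-bit counters here are assumptions for illustration only and may differ from the actual AMD-K6-2E design. All class and function names are hypothetical.

```python
BHT_ENTRIES = 8192            # table size stated in the data sheet
INDEX_MASK = BHT_ENTRIES - 1
HISTORY_BITS = 13             # assumption: history length is not given


class TwoLevelPredictor:
    """Illustrative two-level predictor: a global history register (level 1)
    and the branch address together select a 2-bit counter (level 2)."""

    def __init__(self):
        self.history = 0                     # recent outcomes, newest in bit 0
        self.counters = [1] * BHT_ENTRIES    # start at "weakly not taken"

    def _index(self, branch_addr):
        # Assumption: XOR of address and history picks the table entry.
        return (branch_addr ^ self.history) & INDEX_MASK

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        """Train the selected counter, then shift the outcome into history."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, branch_addr, outcomes):
    """Fraction of outcomes predicted correctly for a single branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) == taken:
            correct += 1
        predictor.update(branch_addr, taken)
    return correct / len(outcomes)
```

Because the history participates in the index, a branch with a repeating pattern (for example taken, taken, not taken) trains a separate counter for each point in the pattern, which a single per-branch counter cannot do.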

To accommodate the large branch history table, the AMD-K6-2E
processor does not store predicted target addresses. Instead, the
branch target addresses are calculated on the fly using ALUs during
the decode stage. The adders calculate all possible target addresses
before the instructions are fully decoded, and the processor chooses
which addresses are valid.
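
The arithmetic behind an on-the-fly target calculation can be sketched as follows. For an x86 relative branch, the target is the address of the next instruction plus the sign-extended displacement. The `candidate_targets` helper is a hypothetical simplification: it assumes every byte position in a fetch window could start a 2-byte short branch (opcode plus 8-bit displacement) and computes a target for each, leaving the choice of valid targets to later decode. The data sheet does not describe the adder arrangement at this level of detail.

```python
def sign_extend(value, bits):
    """Interpret an unsigned field of the given width as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def relative_target(instr_addr, instr_len, displacement, disp_bits):
    """Target of a relative branch: next-instruction address plus the
    sign-extended displacement, wrapped to 32 bits."""
    next_addr = instr_addr + instr_len
    return (next_addr + sign_extend(displacement, disp_bits)) & 0xFFFFFFFF


def candidate_targets(window_addr, window):
    """Assumption for illustration: treat each byte position as a possible
    2-byte short branch and compute its target speculatively."""
    return {
        window_addr + i: relative_target(window_addr + i, 2, window[i + 1], 8)
        for i in range(len(window) - 1)
    }
```

For example, a 2-byte branch at 0x1000 with displacement byte 0xFE (that is, -2) targets 0x1000 itself, and the same branch with displacement 0x10 targets 0x1012.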

To avoid a one-clock cache-fetch penalty when a branch is predicted
taken, a built-in branch target cache supplies the first 16 bytes of
instructions directly to the instruction buffer (assuming the target
address hits this cache). (See Figure 3 on page 14.) The branch
target cache is organized as 16 entries of 16 bytes. In total, the
branch prediction logic achieves prediction rates greater than 95%.
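
A minimal model of a 16-entry by 16-byte branch target cache might look like the sketch below. The entry count and line size come from the data sheet; the tagging by full target address and the least-recently-used replacement are assumptions made only so the example is complete, and the class and method names are hypothetical.

```python
from collections import OrderedDict

BTC_ENTRIES = 16       # entries, per the data sheet
BTC_LINE_BYTES = 16    # instruction bytes per entry, per the data sheet


class BranchTargetCache:
    """Illustrative target cache. Assumptions: entries are tagged by the
    full target address and replaced least-recently-used."""

    def __init__(self):
        self.entries = OrderedDict()

    def lookup(self, target_addr):
        """Return the cached first 16 bytes at target_addr, or None on a miss."""
        data = self.entries.get(target_addr)
        if data is not None:
            self.entries.move_to_end(target_addr)   # mark as most recently used
        return data

    def fill(self, target_addr, instr_bytes):
        """Store the first 16 instruction bytes found at a branch target."""
        if len(instr_bytes) < BTC_LINE_BYTES:
            raise ValueError("need at least 16 instruction bytes")
        self.entries[target_addr] = bytes(instr_bytes[:BTC_LINE_BYTES])
        self.entries.move_to_end(target_addr)
        if len(self.entries) > BTC_ENTRIES:
            self.entries.popitem(last=False)        # evict least recently used
```

On a hit, the 16 bytes returned by `lookup` stand in for the instructions that would otherwise arrive one clock later from the instruction cache.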

The return address stack is a special device designed to optimize
CALL and RET pairs. Software is typically compiled with subroutines
that are frequently called from various places in a program. This is
usually done to save space.
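
The general idea of pairing CALL and RET through a 16-entry stack can be sketched as below. The depth comes from the data sheet; the overflow behavior (discarding the oldest entry once more than 16 calls are nested) is an assumption, and the class and method names are hypothetical.

```python
from collections import deque

RAS_ENTRIES = 16    # depth stated in the data sheet


class ReturnAddressStack:
    """Illustrative return address stack. Assumption: when more than 16
    calls are nested, the oldest return address is discarded."""

    def __init__(self):
        self.stack = deque(maxlen=RAS_ENTRIES)

    def on_call(self, call_addr, call_len):
        # A CALL returns to the instruction that follows it.
        self.stack.append(call_addr + call_len)

    def predict_return(self):
        """Predicted target for a RET, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None
```

Because the same subroutine can be called from many places, the return target of a RET varies from call to call; a stack that records the most recent call sites supplies the matching return address in last-in, first-out order.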

Chapter 2

Internal Architecture

21
