AMD-K6 Datasheet, PDF(37/346 Page) Advanced Micro Devices – AMD-K6 Processor


20695H/0 – March 1998

Preliminary Information

AMD-K6® Processor Data Sheet

2.7 Branch-Prediction Logic

Sophisticated branch logic that can minimize or hide the impact
of changes in program flow is designed into the AMD-K6
processor. Branches in x86 code fit into two categories:
unconditional branches, which always change program flow (that
is, the branches are always taken) and conditional branches,
which may or may not divert program flow (that is, the branches
are taken or not-taken). When a conditional branch is not taken,
the processor simply continues decoding and executing the next
instructions in memory.

Branch History Table

Branch Target Cache

Typical applications have up to 10% unconditional branches
and another 10% to 20% conditional branches. The AMD-K6
branch logic has been designed to handle this type of program
behavior and its negative effects on instruction execution, such
as stalls due to delayed instruction fetching and the draining of
the processor pipeline. The branch logic contains an 8192-entry
branch history table, a 16-entry by 16-byte branch target cache,
a 16-entry return address stack, and a branch execution unit.

The AMD-K6 processor handles unconditional branches
without any penalty by redirecting instruction fetching to the
target address of the unconditional branch. However,
conditional branches require the use of the dynamic
branch-prediction mechanism built into the AMD-K6. A
two-level adaptive history algorithm is implemented in an
8192-entry branch history table. This table stores executed
branch information, predicts individual branches, and predicts
the behavior of groups of branches. To accommodate the large
branch history table, the AMD-K6 processor does not store
predicted target addresses. Instead, the branch target
addresses are calculated on the fly using ALUs during the
decode stage. The adders calculate all possible target addresses
before the instructions are fully decoded, and the processor
chooses which addresses are valid.
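
The two-level adaptive history scheme described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the actual AMD-K6 circuit: the data sheet gives the table size (8192 entries) but not the indexing scheme or the counter width, so the 9-bit global history register, the 4 low branch-address bits, the 2-bit saturating counters, and the names TwoLevelPredictor and accuracy are all hypothetical choices made for illustration.

```python
class TwoLevelPredictor:
    """Illustrative two-level adaptive predictor (not the real AMD-K6 logic)."""

    def __init__(self, history_bits=9, address_bits=4):
        # Assumed split: 9 + 4 = 13 index bits, giving 2**13 = 8192 entries.
        self.history_bits = history_bits
        self.address_bits = address_bits
        # First level: outcomes of the most recent branches (1 = taken).
        self.history = 0
        # Second level: 2-bit saturating counters, starting weakly not-taken.
        self.counters = [1] * (1 << (history_bits + address_bits))

    def _index(self, branch_address):
        # Combine recent history with a few low bits of the branch address.
        low_bits = branch_address & ((1 << self.address_bits) - 1)
        return (self.history << self.address_bits) | low_bits

    def predict(self, branch_address):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Train the counter that made the prediction, then shift the
        # actual outcome into the history register.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, branch_address, outcomes):
    """Replay a list of outcomes for one branch and return the hit rate."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct / len(outcomes)
```

For example, accuracy(TwoLevelPredictor(), 0x4010, [True, True, False] * 200) replays a repeating taken, taken, not-taken pattern. After a short warm-up the history register identifies each position in the pattern, so the sketch predicts it correctly. A single counter per branch could not do this, which is the motivation for keeping history at two levels.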

To avoid a one-clock cache-fetch penalty when a branch is
predicted taken, a built-in branch target cache supplies the first
16 bytes of instructions directly to the instruction buffer
(assuming the target address hits this cache). (See Figure 3 on
page 13.) The branch target cache is organized as 16 entries of
16 bytes. Overall, the branch-prediction logic achieves
prediction rates greater than 95%.
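
The role of the branch target cache can be sketched in the same way. The model below is a minimal illustration, not the documented hardware: the data sheet states only the organization (16 entries of 16 bytes) and the one-clock penalty that a hit avoids, so the lookup by exact target address, the least-recently-used replacement, and the names BranchTargetCache and fetch_after_taken_branch are assumptions.

```python
from collections import OrderedDict


class BranchTargetCache:
    """Illustrative 16-entry by 16-byte cache of branch-target instruction bytes."""

    def __init__(self, entries=16, line_bytes=16):
        self.entries = entries
        self.line_bytes = line_bytes
        # Maps a target address to the first bytes found at that address.
        # Insertion order doubles as recency order (assumed LRU policy).
        self.lines = OrderedDict()

    def lookup(self, target):
        data = self.lines.get(target)
        if data is not None:
            self.lines.move_to_end(target)  # mark as most recently used
        return data

    def fill(self, target, memory):
        self.lines[target] = bytes(memory[target:target + self.line_bytes])
        self.lines.move_to_end(target)
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict the least recently used entry


def fetch_after_taken_branch(cache, target, memory):
    """Return (instruction_bytes, extra_clocks) for a predicted-taken branch."""
    data = cache.lookup(target)
    if data is not None:
        return data, 0  # hit: bytes go straight to the instruction buffer
    cache.fill(target, memory)
    return cache.lookup(target), 1  # miss: pay the one-clock cache-fetch penalty
```

In this sketch the first taken branch to a given target pays the extra clock and later branches to the same target do not, until 16 other targets have displaced the entry.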

Chapter 2, Internal Architecture, page 19
