Advanced Micro Devices – AMD-K6™-2E Embedded Processor
22529B/0—January 2000
Preliminary Information
AMD-K6™-2E Processor Data Sheet
Branch History Table
Branch Target Cache
Return Address Stack
In typical applications, up to 10% of instructions are
unconditional branches and another 10% to 20% are conditional
branches. The AMD-K6-2E processor branch logic is designed to
handle this program behavior and its negative effects on
instruction execution, such as stalls caused by delayed
instruction fetching and draining of the processor pipeline.
The branch logic contains an 8192-entry branch history table,
a 16-entry by 16-byte branch target cache, a 16-entry return
address stack, and a branch execution unit.
The AMD-K6-2E processor handles unconditional branches
without any penalty by redirecting instruction fetching to the
target address of the unconditional branch. However,
conditional branches require the use of the dynamic
branch-prediction mechanism built into the AMD-K6-2E
processor.
A two-level adaptive history algorithm is implemented in an
8192-entry branch history table. This table stores executed
branch information, predicts individual branches, and predicts
the behavior of groups of branches.
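The idea of a two-level adaptive predictor can be illustrated with a small software model. The sketch below is not the AMD-K6-2E implementation: only the 8192-entry table size comes from this data sheet. The split of the index into 9 bits of global branch history and 4 bits of branch address, and the use of 2-bit saturating counters, are assumptions chosen for illustration so that the index spans 2^13 = 8192 entries.

```python
class TwoLevelPredictor:
    """Toy two-level adaptive predictor with an 8192-entry table."""

    TABLE_ENTRIES = 8192   # from the data sheet
    HISTORY_BITS = 9       # assumption: width of the global history register
    ADDRESS_BITS = 4       # assumption: branch-address bits used in the index

    def __init__(self):
        # Assumption: 2-bit saturating counters, starting at "weakly not taken".
        self.counters = [1] * self.TABLE_ENTRIES
        self.history = 0   # first level: outcomes of the most recent branches

    def _index(self, branch_address):
        # Second level: history and low address bits select one counter.
        address_part = branch_address & ((1 << self.ADDRESS_BITS) - 1)
        return (address_part << self.HISTORY_BITS) | self.history

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Record the resolved outcome of a branch."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        history_mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & history_mask
```

Because the counter is selected by recent history as well as by the branch address, a model like this can learn a strictly alternating taken/not-taken pattern that a single per-branch counter would mispredict about half the time.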
To accommodate the large branch history table, the AMD-K6-2E
processor does not store predicted target addresses. Instead,
the branch target addresses are calculated on-the-fly using
ALUs during the decode stage. The adders calculate all
possible target addresses before the instructions are fully
decoded, and the processor chooses which addresses are valid.
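For a relative x86 branch, the arithmetic an adder has to perform is the address of the next instruction plus the sign-extended displacement. The sketch below shows only that arithmetic for a single branch in a flat 32-bit address space; the function names are hypothetical, and it does not model how the processor evaluates several candidate positions in parallel or selects the valid one.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def relative_branch_target(instruction_address, instruction_length,
                           displacement, displacement_bits):
    """Target of a relative branch: next-instruction address + displacement.

    Hypothetical helper; assumes a flat 32-bit address space.
    """
    next_address = instruction_address + instruction_length
    target = next_address + sign_extend(displacement, displacement_bits)
    return target & 0xFFFFFFFF


# A 2-byte short jump at 0x1000 with displacement 0xFE (-2) jumps to itself.
print(hex(relative_branch_target(0x1000, 2, 0xFE, 8)))  # 0x1000
```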
To avoid a one-clock cache-fetch penalty when a branch is
predicted taken, a built-in branch target cache supplies the first
16 bytes of target instructions directly to the instruction
buffer, provided the target address hits this cache. (See
Figure 3 on page 14.)
The branch target cache is organized as 16 entries of 16 bytes.
In total, the branch prediction logic achieves branch prediction
rates greater than 95%.
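The branch target cache described above can be pictured as a small lookup structure keyed by target address. In the sketch below, only the 16-entry by 16-byte organization comes from this data sheet. The fully associative lookup, the use of the full target address as the tag, and the round-robin replacement policy are assumptions made for illustration.

```python
class BranchTargetCache:
    """Toy model of a 16-entry branch target cache with 16 bytes per entry."""

    ENTRIES = 16      # from the data sheet
    LINE_BYTES = 16   # from the data sheet

    def __init__(self):
        self.tags = [None] * self.ENTRIES
        self.data = [b""] * self.ENTRIES
        self.next_victim = 0   # assumption: round-robin replacement

    def lookup(self, target_address):
        """Return the 16 cached instruction bytes on a hit, else None."""
        for tag, data in zip(self.tags, self.data):
            if tag == target_address:
                return data
        return None

    def fill(self, target_address, instruction_bytes):
        """Store the first 16 instruction bytes found at a branch target."""
        if len(instruction_bytes) != self.LINE_BYTES:
            raise ValueError("an entry holds exactly 16 bytes")
        if target_address in self.tags:
            slot = self.tags.index(target_address)   # refresh existing entry
        else:
            slot = self.next_victim
            self.next_victim = (self.next_victim + 1) % self.ENTRIES
        self.tags[slot] = target_address
        self.data[slot] = bytes(instruction_bytes)
```

On a hit, `lookup` returns the bytes that would go straight to the instruction buffer; on a miss (`None`), the model corresponds to fetching from the instruction cache and paying the extra clock.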
The return address stack is a special device designed to
optimize CALL and RET pairs. Software is typically compiled
with subroutines that are frequently called from various places
in a program. This is usually done to save space.
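A return address stack of this kind is conventionally modeled as a small last-in, first-out buffer: a CALL pushes the address of the instruction that follows it, and a RET pops the most recent entry as the predicted return target. The sketch below follows that conventional model with the 16-entry depth given in this data sheet. The overflow behavior (the oldest entry is overwritten) and the empty-stack behavior (no prediction) are assumptions, not documented AMD-K6-2E behavior.

```python
class ReturnAddressStack:
    """Toy 16-entry return address stack for predicting RET targets."""

    DEPTH = 16   # from the data sheet

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.top = 0     # index of the next free slot
        self.count = 0   # number of valid entries

    def on_call(self, return_address):
        """Push the address of the instruction following a CALL."""
        self.entries[self.top] = return_address
        self.top = (self.top + 1) % self.DEPTH
        # Assumption: on overflow the oldest entry is silently overwritten.
        self.count = min(self.count + 1, self.DEPTH)

    def on_ret(self):
        """Pop the predicted return target, or None if the stack is empty."""
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.DEPTH
        self.count -= 1
        return self.entries[self.top]
```

With this model, a subroutine called from many different places still gets a correct return prediction each time, because the prediction comes from the matching CALL rather than from the history of the RET instruction itself.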
Chapter 2
Internal Architecture
21