AMD-K6 Datasheet, PDF (37/346 Pages) Advanced Micro Devices – AMD-K6 Processor
20695H/0—March 1998
Preliminary Information
AMD-K6® Processor Data Sheet
2.7 Branch-Prediction Logic
The AMD-K6 processor includes sophisticated branch logic that
can minimize or hide the impact of changes in program flow.
Branches in x86 code fall into two categories: unconditional
branches, which always change program flow (that is, the
branches are always taken), and conditional branches, which may
or may not divert program flow (that is, the branches are taken
or not taken). When a conditional branch is not taken, the
processor simply continues decoding and executing the next
instructions in memory.
In typical applications, up to 10% of the instructions are
unconditional branches and another 10% to 20% are conditional
branches. The AMD-K6 branch logic is designed to handle this
type of program behavior and its negative effects on instruction
execution, such as stalls due to delayed instruction fetching and
the draining of the processor pipeline. The branch logic contains
an 8192-entry branch history table, a 16-entry by 16-byte branch
target cache, a 16-entry return address stack, and a branch
execution unit.
Branch History Table
The AMD-K6 processor handles unconditional branches
without any penalty by redirecting instruction fetching to the
target address of the unconditional branch. However,
conditional branches require the use of the dynamic
branch-prediction mechanism built into the AMD-K6. A
two-level adaptive history algorithm is implemented in an
8192-entry branch history table. This table stores executed
branch information, predicts individual branches, and predicts
the behavior of groups of branches. To accommodate the large
branch history table, the AMD-K6 processor does not store
predicted target addresses. Instead, the branch target
addresses are calculated on the fly using ALUs during the
decode stage. These adders calculate all possible target
addresses before the instructions are fully decoded, and the
processor then chooses which addresses are valid.
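The general idea of a two-level adaptive predictor can be illustrated with a minimal Python sketch. Only the 8192-entry table size comes from this data sheet. The 9-bit global history, the 4 branch-address bits in the index, the 2-bit saturating counters, and all names (`TwoLevelPredictor`, `count_correct`) are assumptions chosen for illustration and do not describe the actual AMD-K6 circuit.

```python
class TwoLevelPredictor:
    """Illustrative two-level adaptive branch predictor (not the AMD-K6 circuit)."""

    TABLE_ENTRIES = 8192  # table size stated in the data sheet
    HISTORY_BITS = 9      # assumption: length of the global outcome history
    ADDRESS_BITS = 4      # assumption: low branch-address bits used in the index

    def __init__(self):
        self.history = 0  # first level: most recent branch outcomes, 1 = taken
        # Second level: 2-bit saturating counters, started at "weakly not taken".
        self.counters = [1] * self.TABLE_ENTRIES

    def _index(self, branch_address):
        # Concatenate the history with a few address bits: 9 + 4 = 13 bits,
        # which addresses exactly 8192 counters.
        address_part = branch_address & ((1 << self.ADDRESS_BITS) - 1)
        return (self.history << self.ADDRESS_BITS) | address_part

    def predict(self, branch_address):
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Train on the resolved outcome, then shift it into the history."""
        index = self._index(branch_address)
        if taken:
            self.counters[index] = min(3, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)
        history_mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & history_mask


def count_correct(predictor, branch_address, outcomes):
    """Predict, then train, on each outcome; return the number of correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct
```

In this sketch, a loop branch that is taken seven times and then falls through is predicted correctly on every iteration after a short warm-up, including the loop exit, because each position in the repeating pattern produces a distinct history value and therefore trains its own counter.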
Branch Target Cache
To avoid a one-clock cache-fetch penalty when a branch is
predicted taken, a built-in branch target cache supplies the first
16 bytes of target instructions directly to the instruction buffer,
provided the target address hits this cache (see Figure 3 on
page 13). The branch target cache is organized as 16 entries of
16 bytes. In total, the branch prediction logic achieves branch
prediction rates greater than 95%.
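The lookup behavior of a small target-instruction cache can be sketched in Python as follows. The 16-entry by 16-byte organization comes from this data sheet. The full-address tag, the fully associative lookup, the least-recently-used replacement, and the names `BranchTargetCache`, `lookup`, and `fill` are assumptions made for illustration, since the data sheet does not specify them.

```python
from collections import OrderedDict


class BranchTargetCache:
    """Illustrative 16-entry cache of the first 16 instruction bytes at a branch target."""

    ENTRIES = 16     # from the data sheet
    LINE_BYTES = 16  # from the data sheet

    def __init__(self):
        # Maps a target address to its 16 instruction bytes.
        # Ordering tracks recency for the assumed LRU replacement.
        self._lines = OrderedDict()

    def lookup(self, target_address):
        """Return the cached bytes on a hit, or None on a miss."""
        line = self._lines.get(target_address)
        if line is not None:
            self._lines.move_to_end(target_address)  # mark as most recently used
        return line

    def fill(self, target_address, instruction_bytes):
        """Store the first 16 bytes found at a branch target."""
        if len(instruction_bytes) != self.LINE_BYTES:
            raise ValueError("an entry holds exactly 16 bytes")
        self._lines[target_address] = bytes(instruction_bytes)
        self._lines.move_to_end(target_address)
        if len(self._lines) > self.ENTRIES:
            self._lines.popitem(last=False)  # evict the least recently used entry
```

A hit corresponds to the case in which the target instructions go directly to the instruction buffer. A miss (`None`) corresponds to the case in which they must come from the instruction cache and the one-clock penalty applies.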
Chapter 2
Internal Architecture
19