UPSD3422_06 Datasheet, PDF (24/293 Pages) STMicroelectronics – Turbo Plus Series Fast Turbo 8032 MCU with USB and Programmable Logic
8032 MCU core performance enhancements
uPSD34xx
Figure 8. Instruction Pre-Fetch Queue and Branch Cache
[Block diagram AI10431. Labeled blocks: 16-bit Program Memory on PSD Module; Instruction Pre-Fetch Queue (PFQ), 4 Bytes of Instruction; Branch Cache (BC) with entries Branch 1 to Branch 4, each holding Code; Compare; 8032 MCU. Labeled signals: Instruction Byte (8), Address (16), Wait, Current Branch Address (16), Load on Branch Address Match.]
5.1 Pre-fetch queue (PFQ) and branch cache (BC)
The PFQ is always working to minimize the idle bus time inherent to the 8032 MCU architecture,
to eliminate wasted memory fetches, and to maximize memory bandwidth to the MCU. The
PFQ does this by running asynchronously in relation to the MCU, looking ahead to pre-fetch
two bytes (one word) of code from program memory during any idle bus periods. Only the necessary
words are fetched (no dummy fetches, unlike a standard 8032). The PFQ will queue up to four
code bytes in advance of execution, which significantly optimizes sequential program
performance. However, when program execution becomes non-sequential (program
branch), a typical pre-fetch queue will empty itself and reload new code, causing the MCU to
stall. The Turbo uPSD34xx diminishes this problem by using a Branch Cache with the PFQ.
The BC is a four-way, fully associative cache, meaning that when a program branch occurs,
its branch destination address is compared simultaneously with four recent previous branch
destinations stored in the BC. Each of the four cache entries contains up to four bytes of code
related to a branch. If there is a hit (a match), then all four code bytes of the matching
program branch are transferred immediately and simultaneously from the BC to the PFQ,
and execution on that branch continues with minimal delay. This greatly reduces the chance
that the MCU will stall from an empty PFQ, and improves performance in embedded control
systems where it is quite common to branch and loop in relatively small code localities.
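The lookup described above can be illustrated with a small software model. This is only a sketch for reasoning about hit and miss behavior, not a description of the actual hardware: the type and function names (branch_cache_t, bc_lookup, bc_fill) are hypothetical, the loop stands in for a comparison that the hardware performs on all four entries simultaneously, and the round-robin replacement of entries is an assumption, since the text does not state how an entry is chosen for replacement.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BC_WAYS       4   /* four entries, per the text */
#define BC_LINE_BYTES 4   /* up to four code bytes per entry, per the text */

typedef struct {
    bool     valid;
    uint16_t target;               /* branch destination address (the tag) */
    uint8_t  code[BC_LINE_BYTES];  /* code bytes found at that destination */
} bc_entry_t;

typedef struct {
    bc_entry_t way[BC_WAYS];
    unsigned   next_victim;        /* ASSUMPTION: round-robin replacement */
} branch_cache_t;

/* Mark every entry invalid. */
void bc_init(branch_cache_t *bc)
{
    memset(bc, 0, sizeof *bc);
}

/* Compare the branch destination against every stored destination.
 * On a hit, copy the cached code bytes into the pre-fetch queue buffer
 * and return true. On a miss, leave the buffer untouched and return false. */
bool bc_lookup(const branch_cache_t *bc, uint16_t target,
               uint8_t pfq[BC_LINE_BYTES])
{
    for (int i = 0; i < BC_WAYS; i++) {
        if (bc->way[i].valid && bc->way[i].target == target) {
            memcpy(pfq, bc->way[i].code, BC_LINE_BYTES);
            return true;
        }
    }
    return false;
}

/* After a miss, record the destination and the code fetched from it,
 * overwriting the entry selected by the assumed round-robin pointer. */
void bc_fill(branch_cache_t *bc, uint16_t target,
             const uint8_t code[BC_LINE_BYTES])
{
    bc_entry_t *e = &bc->way[bc->next_victim];
    e->valid  = true;
    e->target = target;
    memcpy(e->code, code, BC_LINE_BYTES);
    bc->next_victim = (bc->next_victim + 1u) % BC_WAYS;
}
```

In this model, a loop whose branch destinations number four or fewer hits on every iteration after the first, which corresponds to the small code localities mentioned above.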
By default, the PFQ and BC are enabled after power-up or reset. The 8032 can disable the
PFQ and BC at runtime if desired by writing to a specific SFR (BUSCON).
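As a minimal sketch of such a runtime write, the helpers below compute a new BUSCON value with the PFQ and BC enable bits cleared or set, leaving all other bits (such as the wait-state fields) unchanged. The mask names and bit positions (BUSCON_EPFQ as bit 7, BUSCON_EBC as bit 6) are assumptions made for illustration and must be checked against the BUSCON register description before use. The helper function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: placeholder bit positions for the two enable bits.
 * Verify both against the BUSCON register table for the device. */
#define BUSCON_EPFQ 0x80u   /* assumed: pre-fetch queue enable */
#define BUSCON_EBC  0x40u   /* assumed: branch cache enable */

/* Return the BUSCON value with PFQ and BC disabled; other bits are preserved. */
static inline uint8_t buscon_prefetch_off(uint8_t buscon)
{
    return (uint8_t)(buscon & (uint8_t)~(BUSCON_EPFQ | BUSCON_EBC));
}

/* Return the BUSCON value with PFQ and BC enabled; other bits are preserved. */
static inline uint8_t buscon_prefetch_on(uint8_t buscon)
{
    return (uint8_t)(buscon | BUSCON_EPFQ | BUSCON_EBC);
}
```

On the target, the result would be written back with a read-modify-write such as BUSCON = buscon_prefetch_off(BUSCON); where BUSCON is declared as an SFR using the syntax of the compiler in use.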
The memory in the PSD module operates with variable wait states depending on the value
specified in the SFR named BUSCON. For example, a 5V uPSD34xx device operating at a
40MHz crystal frequency requires four memory wait states (equal to four MCU clocks). In
this example, once the PFQ has one word of code, the wait states become transparent and
a full 10 MIPS is achieved when the program stream consists of sequential one- or two-byte,
one machine-cycle instructions as shown in Figure 7 on page 23 (transparent because a
machine-cycle is four MCU clocks which equals the memory pre-fetch wait time that is also