UPSD33XX Datasheet, PDF(19/231 Page) STMicroelectronics – Fast 8032 MCU with Programmable Logic

uPSD33xx

Pre-Fetch Queue (PFQ) and Branch Cache (BC)

The PFQ is always working to minimize the idle bus time inherent to the 8032 MCU architecture, to eliminate wasted memory fetches, and to maximize memory bandwidth to the MCU. The PFQ does this by running asynchronously in relation to the MCU, looking ahead to pre-fetch code from program memory during any idle bus periods. Only necessary bytes will be fetched (no dummy fetches, unlike a standard 8032). The PFQ will queue up to six code bytes in advance of execution, which significantly improves sequential program performance. However, when program execution becomes non-sequential (program branch), a typical pre-fetch queue will empty itself and reload new code, causing the MCU to stall. The Turbo uPSD33xx diminishes this problem by using a Branch Cache with the PFQ. The BC is a four-way, fully associative cache, meaning that when a program branch occurs, its branch destination address is compared simultaneously with four recent previous branch destinations stored in the BC. Each of the four cache entries contains up to six bytes of code related to a branch. If there is a hit (a match), then all six code bytes of the matching program branch are transferred immediately and simultaneously from the BC to the PFQ, and execution on that branch continues with minimal delay. This greatly reduces the chance that the MCU will stall from an empty PFQ, and improves performance in embedded control systems where it is quite common to branch and loop in relatively small code localities.
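The hit/miss behavior described above can be illustrated with a small software model. This is a minimal sketch, not a description of the silicon: the class name `BranchCache`, the helper `take_branch`, and the oldest-entry-out replacement policy are assumptions made for illustration, since the text states only that four recent branch destinations are held, each with up to six code bytes.

```python
from collections import OrderedDict

BC_ENTRIES = 4        # four-way, fully associative (from the text)
BYTES_PER_ENTRY = 6   # up to six code bytes per entry (from the text)


class BranchCache:
    """Toy model: maps a branch destination address to the code bytes stored there."""

    def __init__(self):
        self._entries = OrderedDict()  # destination address -> code bytes

    def lookup(self, dest):
        """Return the cached code bytes on a hit, or None on a miss."""
        return self._entries.get(dest)

    def fill(self, dest, program):
        """Store a destination after a miss.

        Assumption: when all four entries are in use, the oldest one is evicted.
        The actual replacement policy is not given in the text.
        """
        if dest not in self._entries and len(self._entries) >= BC_ENTRIES:
            self._entries.popitem(last=False)
        self._entries[dest] = bytes(program[dest:dest + BYTES_PER_ENTRY])


def take_branch(bc, dest, program):
    """Return ("hit", code_bytes) or ("miss", None) for a branch to dest."""
    cached = bc.lookup(dest)
    if cached is not None:
        return "hit", cached      # bytes would be handed to the PFQ at once
    bc.fill(dest, program)        # PFQ must reload from program memory
    return "miss", None
```

In this model, a loop that branches repeatedly to the same destination misses once and then hits on every later iteration, which is the small-locality case the text refers to.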

By default, the PFQ and BC are enabled after power-up or reset. The 8032 can disable the PFQ and BC at runtime if desired by writing to a specific SFR (BUSCON).

The memory in the PSD module operates with variable wait states depending on the value specified in the SFR named BUSCON. For example, a 5V uPSD33xx device operating at a 40MHz crystal frequency requires four memory wait states (equal to four MCU clocks). In this example, once the PFQ has one or more bytes of code, the wait states become transparent and a full 10 MIPS is achieved when the program stream consists of sequential one-byte, one machine-cycle instructions, as shown in Figure 7, page 18. The wait states are transparent because a machine-cycle is four MCU clocks, which equals the memory pre-fetch wait time of four MCU clocks. It is also important to understand PFQ operation on multi-cycle instructions.

PFQ Example, Multi-cycle Instructions

Let us look at a string of two-byte, two-cycle instructions in Figure 9, page 20. Three instructions are executed sequentially in this example: Instructions A, B, and C. Each of the time divisions in the figure is one machine-cycle of four clocks, and there are six phases to reference in this discussion. Each instruction is pre-fetched into the PFQ in advance of execution by the MCU. Prior to Phase 1, the PFQ has pre-fetched the two instruction bytes (A1 and A2) of Instruction A. During Phase 1, both bytes are loaded into the MCU execution unit. Also in Phase 1, the PFQ is pre-fetching the first byte (B1) of Instruction B from program memory. In Phase 2, the MCU is processing Instruction A internally while the PFQ is pre-fetching the second byte (B2) of Instruction B. In Phase 3, both bytes of Instruction B are loaded into the MCU execution unit and the PFQ begins to pre-fetch bytes for the third instruction, C. In Phase 4, Instruction B is processed and the pre-fetching continues, eliminating idle bus cycles and feeding a continuous flow of operands and opcodes to the MCU execution unit.
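The phase-by-phase overlap of execution and pre-fetch can be tabulated with a short script. This is a sketch under stated assumptions: the function name `pfq_schedule` is hypothetical, the byte names C1 and C2 follow the A1/A2 and B1/B2 naming by analogy, and Phases 5 and 6 are extrapolated from the pattern of Phases 1 to 4 rather than taken from the text.

```python
def pfq_schedule(instructions):
    """List (phase, mcu_activity, pfq_activity) for two-byte, two-cycle instructions.

    Each instruction occupies two phases (machine-cycles): one in which both
    bytes are loaded into the execution unit, and one in which the instruction
    is processed. During both, the PFQ pre-fetches the next instruction.
    """
    rows = []
    for i, name in enumerate(instructions):
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        load_phase = 2 * i + 1
        # Past the last listed instruction the real PFQ keeps fetching;
        # this sketch simply has nothing further to name.
        fetch_1 = f"pre-fetch {nxt}1" if nxt else "(beyond this example)"
        fetch_2 = f"pre-fetch {nxt}2" if nxt else "(beyond this example)"
        rows.append((load_phase, f"load {name}1, {name}2", fetch_1))
        rows.append((load_phase + 1, f"process {name}", fetch_2))
    return rows


for phase, mcu, pfq in pfq_schedule(["A", "B", "C"]):
    print(f"Phase {phase}: MCU {mcu:<14} PFQ {pfq}")
```

The first four rows correspond to Phases 1 to 4 as described above; in every phase the bus is busy with a pre-fetch while the MCU loads or processes an instruction.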

The uPSD33xx MCU instructions are an exact 1/3 scale of all standard 8032 instructions with regard to number of cycles per instruction. Figure 10, page 20 shows the equivalent instruction sequence from the example above on a standard 8032 for comparison.
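The 1/3 scaling can be checked with simple arithmetic. The sketch below assumes that an instruction takes the same number of machine-cycles on both cores and that only the machine-cycle length differs: four clocks on the uPSD33xx (from the text) and twelve clocks on a standard 8032. The function name `instruction_clocks` is hypothetical.

```python
UPSD33XX_CLOCKS_PER_CYCLE = 4    # machine-cycle length given in the text
STD_8032_CLOCKS_PER_CYCLE = 12   # machine-cycle length of a standard 8032


def instruction_clocks(machine_cycles, clocks_per_cycle):
    """Total MCU clocks consumed by one instruction."""
    return machine_cycles * clocks_per_cycle


# One two-cycle instruction, as in the example above.
turbo = instruction_clocks(2, UPSD33XX_CLOCKS_PER_CYCLE)
standard = instruction_clocks(2, STD_8032_CLOCKS_PER_CYCLE)
print(turbo, standard, standard / turbo)   # prints: 8 24 3.0
```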

Aggregate Performance

The stream of two-byte, two-cycle instructions in Figure 9, page 20, running on a 40MHz, 5V uPSD33xx will yield 5 MIPS. The stream of one-byte, one-cycle instructions in Figure 7, page 18, on the same MCU yields 10 MIPS. Effective performance will depend on a number of things: the MCU clock frequency; the mixture of instruction types (bytes and cycles) in the application; the amount of time an empty PFQ stalls the MCU (mix of instruction types and misses on Branch Cache); and the operating voltage. A 5V uPSD33xx device operates with four memory wait states, but a 3.3V device operates with five memory wait states, yielding 8 MIPS peak compared to 10 MIPS peak for a 5V device. The same number of wait states will apply to both program fetches and to data READ/WRITEs unless otherwise specified in the SFR named BUSCON.
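The MIPS figures quoted above can be reproduced with a simple throughput estimate. This is a rough model, not a timing specification: it assumes each instruction costs the longer of its execution time (machine-cycles times four clocks) and the time to pre-fetch its bytes (bytes times wait states), and it ignores branches and Branch Cache misses entirely. The function name `stream_mips` is hypothetical.

```python
CLOCKS_PER_MACHINE_CYCLE = 4  # uPSD33xx machine-cycle, from the text


def stream_mips(clock_mhz, wait_states, instructions):
    """Estimate MIPS for a sequential stream of (bytes, machine_cycles) pairs.

    Assumed model: execution and pre-fetch overlap, so each instruction
    costs whichever of the two takes more clocks.
    """
    total_clocks = 0
    for n_bytes, n_cycles in instructions:
        execute = n_cycles * CLOCKS_PER_MACHINE_CYCLE
        fetch = n_bytes * wait_states
        total_clocks += max(execute, fetch)
    return clock_mhz * len(instructions) / total_clocks


print(stream_mips(40, 4, [(1, 1)] * 100))  # 5V, one-byte one-cycle:  10.0
print(stream_mips(40, 4, [(2, 2)] * 100))  # 5V, two-byte two-cycle:   5.0
print(stream_mips(40, 5, [(1, 1)] * 100))  # 3.3V, five wait states:   8.0
```

A mixed list of `(bytes, cycles)` pairs gives a first-order estimate for a real instruction mix, although stalls from an empty PFQ after a branch will lower the actual figure.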

In general, a 3X aggregate performance increase is expected over any standard 8032 application running at the same clock frequency.
