SA1110 Datasheet, PDF(60/406 Page) Intel Corporation – Intel StrongARM SA-1110 Microprocessor


Caches, Write Buffer, and Read Buffer

6.3.2.2 Writes to a Bufferable and Noncacheable Location (B=1,C=0)

If the write buffer is enabled and the processor performs a write to a bufferable but noncacheable location and misses in the Dcaches, the data is placed in the write buffer and the CPU continues execution. As with the cacheable case, merging is allowed only on store multiples. The write buffer performs the external write sometime later.

6.3.2.3 Unbufferable Writes (B=0)

If the write buffer is disabled or the CPU performs a write to an unbufferable area, the processor is stalled until the write buffer empties and the write completes externally. This requires several external clock cycles.
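The two cases above can be summarized as a small decision function. This is only an illustrative model, not code from the manual: the names `write_outcome` and `classify_write_miss` are hypothetical, and the function covers only a write that misses in the Dcaches.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes for a write that misses in the data caches.
 * The names are illustrative and do not come from the manual. */
typedef enum {
    WRITE_BUFFERED,  /* data is placed in the write buffer; CPU continues */
    WRITE_STALLS     /* CPU stalls until the buffer empties and the write completes */
} write_outcome;

/* wb_enabled: whether the write buffer is enabled.
 * b:          the bufferable bit (B) for the location, 0 or 1. */
write_outcome classify_write_miss(bool wb_enabled, unsigned b)
{
    assert(b <= 1u);  /* B is a single bit */

    if (!wb_enabled || b == 0u)
        return WRITE_STALLS;
    return WRITE_BUFFERED;
}
```

For example, `classify_write_miss(true, 1)` models the bufferable, noncacheable case, while any call with `b == 0` models an unbufferable write.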

6.3.3 Enabling the Write Buffer

To enable the write buffer, ensure that the MMU is enabled by setting bit 0 in the control register, then enable the write buffer by setting bit 3 in the control register. The MMU and write buffer can be enabled simultaneously with a single write to the control register.
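As a minimal sketch of the bit manipulation involved, the helpers below operate on a copy of the control register value. The macro and function names are hypothetical, and the coprocessor access that actually reads and writes the register on the target is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_MMU_ENABLE (1u << 0)  /* bit 0: MMU enable */
#define CTRL_WB_ENABLE  (1u << 3)  /* bit 3: write buffer enable */

/* Enable the MMU and the write buffer together, so the result can be
 * stored with a single write to the control register. */
uint32_t ctrl_enable_mmu_and_wb(uint32_t ctrl)
{
    return ctrl | CTRL_MMU_ENABLE | CTRL_WB_ENABLE;
}

/* Enable only the write buffer; the MMU must already be enabled. */
uint32_t ctrl_enable_wb(uint32_t ctrl)
{
    assert(ctrl & CTRL_MMU_ENABLE);
    return ctrl | CTRL_WB_ENABLE;
}
```

Both helpers leave all other bits of the value unchanged, which matters because the same register carries other enables.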

6.3.3.1 Disabling the Write Buffer

To disable the write buffer, clear bit 3 in the control register. Any writes already in the write buffer will complete normally, but a drain write buffer operation is needed to force all writes out to memory.

Note: The write buffer is used to hold dirty copy-back cached lines from the data cache. It must be enabled along with the data cache.
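The constraint in the note can be checked in software before a new control register value is written. The sketch below assumes that bit 2 of the control register is the data cache enable, which this section does not state; all names are hypothetical, and the drain write buffer operation itself is a separate step that is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTRL_DCACHE_ENABLE (1u << 2)  /* ASSUMPTION: bit 2 enables the data cache */
#define CTRL_WB_ENABLE     (1u << 3)  /* bit 3: write buffer enable */

/* Returns true when the value respects the rule that the write buffer
 * must be enabled whenever the data cache is enabled. */
bool ctrl_wb_dcache_consistent(uint32_t ctrl)
{
    bool dcache = (ctrl & CTRL_DCACHE_ENABLE) != 0;
    bool wb     = (ctrl & CTRL_WB_ENABLE) != 0;
    return !dcache || wb;
}

/* Clear bit 3, refusing (via assert) to produce a value that would leave
 * the data cache enabled without the write buffer. */
uint32_t ctrl_disable_wb(uint32_t ctrl)
{
    uint32_t next = ctrl & ~CTRL_WB_ENABLE;
    assert(ctrl_wb_dcache_consistent(next));
    return next;
}
```

Under these assumptions, a caller would disable the data cache first, then call `ctrl_disable_wb`, write the result to the control register, and finally drain the write buffer.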

6.4 Read Buffer (RB)

The SA-1110 contains a software-programmable read buffer that can increase the performance of critical loop code by prefetching data. The RB enables the preallocation of read-only data into one of four 32-byte buffers without stalling the pipe. For subsequent loads that hit in the RB, data is sourced from the buffer instead of the Dcaches at a rate of 1 word per core clock. Also, because the programmer specifies which entry of the RB is used, critical data can be "locked" in to eliminate bus latency.

The RB is controlled using coprocessor 15, register 9, and provides the capability to allocate 1 word, a half-line (4 words), or a full line (8 words) into one of four entries of the RB. (See Chapter 5, "Coprocessors" for a detailed RB coprocessor description.) Half-line loads are automatically aligned onto half-block boundaries (the lower four address bits are ignored). Full-line loads are automatically aligned onto line boundaries (the lower five address bits are ignored). For partial cache line RB loads, only the words actually fetched are marked valid and can be sourced from the buffer. A small queue is used to ensure that subsequent RB load instructions go out in order.
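The alignment rules can be illustrated with a short software model. It is a sketch under stated assumptions, not a description of the hardware: the type and function names are hypothetical, single-word loads are assumed to be word aligned, and validity is assumed to be tracked as one bit per word at the word's natural position within the 32-byte entry.

```c
#include <assert.h>
#include <stdint.h>

#define RB_LINE_WORDS 8u  /* one 32-byte entry holds 8 words */

/* Allocation sizes, expressed in words. */
typedef enum { RB_WORD = 1, RB_HALF_LINE = 4, RB_FULL_LINE = 8 } rb_size;

typedef struct {
    uint32_t base;        /* aligned address of the first word fetched */
    uint8_t  valid_mask;  /* bit i set => word i of the entry is valid (assumed layout) */
} rb_fill;

rb_fill rb_fill_for(uint32_t addr, rb_size size)
{
    rb_fill  f;
    uint32_t bytes;
    uint32_t first_word;

    assert(size == RB_WORD || size == RB_HALF_LINE || size == RB_FULL_LINE);

    /* Half-line: lower 4 bits ignored. Full line: lower 5 bits ignored.
     * Single word: lower 2 bits ignored (assumption). */
    bytes  = (uint32_t)size * 4u;
    f.base = addr & ~(bytes - 1u);

    /* Only the words actually fetched are marked valid. */
    first_word   = (f.base >> 2) & (RB_LINE_WORDS - 1u);
    f.valid_mask = (uint8_t)(((1u << size) - 1u) << first_word);
    return f;
}
```

With this model, a half-line allocate at address 0x1234567F fetches from 0x12345670 and marks only the upper four words of the entry valid.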

When an RB allocate instruction is executed, the virtual address is looked up in the TB to check for a translation hit and possible access violations. If the access misses in the TB, the pipe is stalled until the page is fetched through the normal hardware tablewalk mechanism. If an access violation occurs, the RB load is NOP'd. That is, an RB allocate instruction cannot generate a data abort.
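The handling of an RB allocate can be sketched as a simple model of the two checks described above. The names are hypothetical, and the model only records whether the pipe stalled and whether the load went ahead; it does not simulate the tablewalk or the fetch.

```c
#include <assert.h>
#include <stdbool.h>

#define RB_ENTRIES 4u  /* the RB has four entries */

typedef struct {
    bool pipe_stalled;  /* TB miss: pipe waited for the hardware tablewalk */
    bool load_issued;   /* false when the RB load was turned into a NOP */
} rb_alloc_result;

/* entry:            RB entry chosen by the programmer (0..3)
 * tb_hit:           whether the virtual address hit in the TB
 * access_violation: whether the permission check failed */
rb_alloc_result rb_allocate_model(unsigned entry, bool tb_hit, bool access_violation)
{
    rb_alloc_result r;

    assert(entry < RB_ENTRIES);

    r.pipe_stalled = !tb_hit;            /* a miss stalls until the page is fetched */
    r.load_issued  = !access_violation;  /* a violation turns the load into a NOP */
    return r;
}
```

Software that relies on the RB being filled should therefore not treat the allocate as a permission probe: in this model a violating allocate simply leaves the entry unfilled.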


SA-1110 Developer's Manual
