PNX1300 Datasheet, PDF(80/548 Page) NXP Semiconductors – Media Processors


PNX1300/01/02/11 Data Book

Philips Semiconductors

void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
    int i;
    int *i_back    = (int *) back;
    int *i_forward = (int *) forward;
    int *i_idct    = (int *) idct;
    int *i_dest    = (int *) destination;

    for (i = 0; i < 16; i += 1)
        i_dest[i] = DSPUQUADADDUI(QUADAVG(i_back[i], i_forward[i]), i_idct[i]);
}

Figure 4-8. Final version of the frame-reconstruction code.

unsigned char A[16][16];
unsigned char B[16][16];
.
.
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 1)
        cost += abs(A[row][col] - B[row][col]);
}

Figure 4-9. Match-cost loop for MPEG motion estimation.

unsigned char A[16][16];
unsigned char B[16][16];
.
.
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 4)
    {
        cost += abs(A[row][col+0] - B[row][col+0]);
        cost += abs(A[row][col+1] - B[row][col+1]);
        cost += abs(A[row][col+2] - B[row][col+2]);
        cost += abs(A[row][col+3] - B[row][col+3]);
    }
}

Figure 4-10. Unrolled, but not parallel, version of the loop from Figure 4-9.

Figure 4-9 shows the original source code for the match-cost loop. Unlike the previous example, the code is not a self-contained function. Somewhere early in the code, the arrays A[][] and B[][] are declared; somewhere between those declarations and the loop of interest, the arrays are filled with data.

4.4.1 A Simple Transformation

First, we will look at the simplest way to use a PNX1300 custom operation.

We start by noticing that the computation in the loop of Figure 4-9 involves the absolute value of the difference of two unsigned characters (bytes). By now, we are familiar with the fact that PNX1300 includes a number of operations that process all four bytes in a 32-bit word simultaneously. Since the match-cost calculation is fundamental to the MPEG algorithm, it is not surprising to find a custom operation, ume8uu, that implements this operation exactly.
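As a rough illustration of what such an operation computes, the sketch below models it in portable C. It assumes that ume8uu treats each 32-bit operand as four packed unsigned bytes and returns the sum of the four byte-wise absolute differences. The function name ume8uu_ref is a hypothetical helper introduced here for illustration; it is not a PNX1300 operation or library call.

```c
#include <stdint.h>

/* Portable reference model (hypothetical helper, not a PNX1300 API).
   Assumption: the custom operation sums the absolute differences of
   the four unsigned bytes packed in a and b. */
static uint32_t ume8uu_ref(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    int lane;

    for (lane = 0; lane < 4; lane += 1) {
        /* Extract one unsigned byte from each operand. */
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;

        /* Absolute difference without going through signed arithmetic. */
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

Under that assumption, one call replaces four of the abs(A[row][col] - B[row][col]) terms in the loop of Figure 4-9, provided the four bytes are loaded together as one 32-bit word.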

To understand how ume8uu can be used in this case, we need to transform the code as in the previous example. Though the steps are presented here in detail, a programmer with even a little experience can often perform these transformations by visual inspection.

To use a custom operation that processes 4 pixel values simultaneously, we first need to create 4 parallel pixel computations. Figure 4-10 shows the loop of Figure 4-9 unrolled by a factor of 4. Unfortunately, the code in the unrolled loop is not parallel because each line depends on the one above it: every statement reads and updates the same cost variable. Figure 4-11 shows a more parallel version of the code from Figure 4-10. Once each computation has its own cost variable and the costs are summed all at once, each cost computation is completely independent.
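The transformation described above can be sketched as a pair of self-contained functions whose results can be compared. This is an illustrative sketch, not a reproduction of the data book's figures: the function names match_cost_serial and match_cost_split and the variables cost0 through cost3 are assumptions introduced here, and the loops are wrapped in functions only so that they can be compiled and checked.

```c
#include <stdlib.h>

/* One accumulator, one pixel per iteration (the form of Figure 4-9). */
int match_cost_serial(unsigned char A[16][16], unsigned char B[16][16])
{
    int row, col;
    int cost = 0;

    for (row = 0; row < 16; row += 1)
        for (col = 0; col < 16; col += 1)
            cost += abs(A[row][col] - B[row][col]);
    return cost;
}

/* Unrolled by 4, with one cost variable per pixel so that the four
   computations do not depend on each other (illustrative names). */
int match_cost_split(unsigned char A[16][16], unsigned char B[16][16])
{
    int row, col;
    int cost = 0;

    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 4) {
            int cost0 = abs(A[row][col + 0] - B[row][col + 0]);
            int cost1 = abs(A[row][col + 1] - B[row][col + 1]);
            int cost2 = abs(A[row][col + 2] - B[row][col + 2]);
            int cost3 = abs(A[row][col + 3] - B[row][col + 3]);

            /* The four partial costs are combined in a single step. */
            cost += cost0 + cost1 + cost2 + cost3;
        }
    }
    return cost;
}
```

Both functions return the same total for any pair of 16x16 blocks. The second form exposes four independent absolute-difference computations per iteration, which is the shape a four-byte operation needs.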


PRELIMINARY SPECIFICATION
