PNX1300/01/02/11 Data Book
Philips Semiconductors
void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
    int i;
    int *i_back    = (int *) back;
    int *i_forward = (int *) forward;
    int *i_idct    = (int *) idct;
    int *i_dest    = (int *) destination;

    for (i = 0; i < 16; i += 1)
        i_dest[i] = DSPUQUADADDUI(QUADAVG(i_back[i], i_forward[i]), i_idct[i]);
}
Figure 4-8. Final version of the frame-reconstruction code.
unsigned char A[16][16];
unsigned char B[16][16];
.
.
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 1)
        cost += abs(A[row][col] - B[row][col]);
}
Figure 4-9. Match-cost loop for MPEG motion estimation.
unsigned char A[16][16];
unsigned char B[16][16];
.
.
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 4)
    {
        cost += abs(A[row][col+0] - B[row][col+0]);
        cost += abs(A[row][col+1] - B[row][col+1]);
        cost += abs(A[row][col+2] - B[row][col+2]);
        cost += abs(A[row][col+3] - B[row][col+3]);
    }
}
Figure 4-10. Unrolled, but not parallel, version of the loop from Figure 4-9.
Figure 4-9 shows the original source code for the
match-cost loop. Unlike the previous example, the code is
not a self-contained function. Somewhere early in the
code, the arrays A[][] and B[][] are declared; somewhere
between those declarations and the loop of interest, the
arrays are filled with data.
4.4.1 A Simple Transformation
First, we will look at the simplest way to use a PNX1300
custom operation.
We start by noticing that the computation in the loop of
Figure 4-9 involves the absolute value of the difference
of two unsigned characters (bytes). As we have seen,
PNX1300 includes a number of operations that process all
four bytes in a 32-bit word simultaneously. Since the
match-cost calculation is fundamental to the MPEG
algorithm, it is not surprising to find a custom
operation, ume8uu, that implements exactly this
computation.
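To make the computation concrete, the following is a minimal portable C sketch of what ume8uu is understood to compute: the sum of the absolute differences of the four unsigned bytes packed in two 32-bit words. The name ume8uu_ref is hypothetical and is not part of the PNX1300 tools; it is a reference model for reasoning about the code, not the custom operation itself.

```c
#include <stdint.h>

/* Hypothetical reference model (not the PNX1300 intrinsic):
 * treats a and b as four packed unsigned bytes each and returns
 * the sum of the four absolute byte differences. */
static uint32_t ume8uu_ref(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    int lane;

    for (lane = 0; lane < 4; lane += 1) {
        /* Extract one byte lane from each word. */
        int x = (int)((a >> (8 * lane)) & 0xFFu);
        int y = (int)((b >> (8 * lane)) & 0xFFu);

        /* Accumulate |x - y| for this lane. */
        sum += (uint32_t)(x > y ? x - y : y - x);
    }
    return sum;
}
```

Because all four differences are summed, the result does not depend on which byte of the word holds which pixel, only on the two words holding corresponding pixels in corresponding bytes.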
To understand how ume8uu can be used in this case, we
need to transform the code as in the previous example.
Though the steps are presented here in detail, a
programmer with even a little experience can often
perform these transformations by visual inspection.
To use a custom operation that processes 4 pixel values
simultaneously, we first need to create 4 parallel pixel
computations. Figure 4-10 shows the loop of Figure 4-9
unrolled by a factor of 4. Unfortunately, the code in the
unrolled loop is not parallel: every statement
accumulates into the same cost variable, so each line
depends on the one above it. Figure 4-11 shows a more
parallel version of the code from Figure 4-10. By simply
giving each computation its own cost variable and then
summing the costs all at once, we make each cost
computation completely independent.
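As an illustration of this restructuring, the loop might be written along the lines of the sketch below. The function wrapper match_cost_parallel and the variable names cost0 through cost3 are assumptions made for this example; the data book's own Figure 4-11 remains the reference version.

```c
#include <stdlib.h>

/* Hypothetical wrapper: match cost of two 16x16 blocks, with the
 * four unrolled computations made independent of one another. */
static int match_cost_parallel(unsigned char A[16][16],
                               unsigned char B[16][16])
{
    int row, col;
    int cost = 0;

    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 4) {
            /* Each partial cost reads only its own pixel pair. */
            int cost0 = abs(A[row][col+0] - B[row][col+0]);
            int cost1 = abs(A[row][col+1] - B[row][col+1]);
            int cost2 = abs(A[row][col+2] - B[row][col+2]);
            int cost3 = abs(A[row][col+3] - B[row][col+3]);

            /* The running total is touched once per group of four. */
            cost += cost0 + cost1 + cost2 + cost3;
        }
    }
    return cost;
}
```

The four abs() computations no longer read or write a shared variable, so they can be evaluated in any order or at the same time; only the final sum depends on all of them.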
PRELIMINARY SPECIFICATION