DS693 Datasheet, PDF(10/13 Page) Xilinx, Inc – Integrated into Xilinx Embedded Development Kit


LogiCORE IP Virtex-5 APU Floating-Point Unit (v1.01a)

Making the loop code independent of the integer loop counter allows all of the code inside the loop to be carried out by the FPU. The compiler is not free to perform this optimization in general, because the two code fragments above can give different results in some cases (for example, for very large values of t).
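The two loop fragments referred to above are not part of this section, so the following sketch uses hypothetical functions to illustrate the same idea. sum_int_counter() converts the integer counter to float on every pass, while sum_float_counter() keeps a separate float copy of the counter so that the loop body is purely floating-point arithmetic. float_counter_after() shows one way the two approaches can diverge for very large t: a float counter incremented by 1.0F stops advancing at 2^24 (16777216), because single precision cannot represent every integer above that value.

```c
/* Hypothetical example; these are not the datasheet's original fragments. */

/* Converts the integer loop counter to float on every iteration. */
float sum_int_counter(int t)
{
    float sum = 0.0F;
    for (int i = 0; i < t; i++)
        sum += (float)i;            /* int-to-float conversion on each pass */
    return sum;
}

/* Keeps a float copy of the counter, so the body is floating point only. */
float sum_float_counter(int t)
{
    float sum = 0.0F;
    float fi = 0.0F;                /* float shadow of i */
    for (int i = 0; i < t; i++) {
        sum += fi;
        fi += 1.0F;                 /* exact only while fi <= 2^24 */
    }
    return sum;
}

/* Value reached by a float counter after n increments of 1.0F. */
float float_counter_after(long n)
{
    float fi = 0.0F;
    for (long i = 0; i < n; i++)
        fi += 1.0F;                 /* 16777216.0F + 1.0F rounds back to 16777216.0F */
    return fi;
}
```

For t = 10 both sum functions return 45.0F. After 16777218 increments, float_counter_after() still returns 16777216.0F, whereas (float)16777218 is exactly 16777218.0F, so the integer-counter and float-counter versions no longer agree.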

Runtime Library Functions

The standard C runtime math library functions operate using double-precision arithmetic. When using a single-precision FPU, calls to certain functions (such as fabs() and sqrt()) result in inefficient emulation routines being used instead of FPU instructions:

#include <math.h>

float x = -1.0F;
…
x = fabs(x);  /* uses double precision */
x = sqrt(x);  /* uses double precision */

When these functions are applied to single-precision data, each call involves a conversion to double, a runtime library call (which cannot use the FPU), and then a narrowing conversion back to float.

The solution is to use the functions fabsf() and sqrtf() instead. They are not part of ANSI C (C89), although they were later standardized in C99; they operate in single precision and can be carried out using the FPU. For example:

#include <math.h>

float x = -1.0F;
…
x = fabsf(x);  /* uses single precision */
x = sqrtf(x);  /* uses single precision */

Array Accesses and Pointer Ambiguity

It is difficult for the compiler to detect whether two memory references (such as array element accesses) refer to the same location. As a result, the compiler should be expected to treat almost all array and pointer accesses as if they might conflict. For example, the following code forms the inner loop of a simple Cooley-Tukey FFT algorithm implementation:

tr = ar0*Real[k] - ai0*Imag[k];
ti = ar0*Imag[k] + ai0*Real[k];
Real[k] = Real[j] - tr;  /* A */
Imag[k] = Imag[j] - ti;
Real[j] += tr;           /* B */
Imag[j] += ti;

Because the compiler does not know that Real[k] and Real[j] are never the same element, the addition in statement B cannot start until the subtraction in statement A has finished and its result has been stored. This spurious dependency limits the amount of parallelism and slows down the computation. One possible solution is to introduce temporary variables and separate the memory accesses from the arithmetic, like this:

r_k = Real[k]; i_k = Imag[k];
r_j = Real[j]; i_j = Imag[j];
tr = ar0*r_k - ai0*i_k;
ti = ar0*i_k + ai0*r_k;
r_k = r_j - tr;
i_k = i_j - ti;
r_j += tr;
i_j += ti;
Real[j] = r_j; Real[k] = r_k;
Imag[j] = i_j; Imag[k] = i_k;

While this code is less concise, it gives much better performance.
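The compiler cannot make this rewrite on its own because the two forms agree only when j and k name different elements. The hypothetical functions below wrap the original and the rewritten butterfly so they can be compared; the function names and parameter lists are assumptions for illustration. With j != k they produce identical values. With j == k (a case that never occurs in the FFT, but that the compiler cannot rule out), the original form rereads the value it has just stored and the results differ.

```c
/* Hypothetical wrappers around the two butterfly forms shown above.
   ar0 and ai0 are the real and imaginary parts of the twiddle factor. */

/* Original form: every statement reads and writes the arrays directly. */
void butterfly_direct(float *Real, float *Imag, int j, int k,
                      float ar0, float ai0)
{
    float tr = ar0*Real[k] - ai0*Imag[k];
    float ti = ar0*Imag[k] + ai0*Real[k];
    Real[k] = Real[j] - tr;   /* A */
    Imag[k] = Imag[j] - ti;
    Real[j] += tr;            /* B: rereads Real[j], which A changes if j == k */
    Imag[j] += ti;
}

/* Rewritten form: all loads first, arithmetic on locals, all stores last. */
void butterfly_temps(float *Real, float *Imag, int j, int k,
                     float ar0, float ai0)
{
    float r_k = Real[k], i_k = Imag[k];
    float r_j = Real[j], i_j = Imag[j];
    float tr = ar0*r_k - ai0*i_k;
    float ti = ar0*i_k + ai0*r_k;
    r_k = r_j - tr;
    i_k = i_j - ti;
    r_j += tr;
    i_j += ti;
    Real[j] = r_j; Real[k] = r_k;
    Imag[j] = i_j; Imag[k] = i_k;
}
```

For example, with Real = {1, 2}, Imag = {3, 4}, j = 0, k = 1 and ar0 = ai0 = 0.5F, both versions leave Real = {0, 2} and Imag = {6, 0}. With j = k = 0, ar0 = 1.0F and ai0 = 0.0F on Real = {1}, Imag = {3}, the original form leaves {1} and {3}, while the rewritten form leaves {0} and {0}.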

DS693 March 1, 2011

www.xilinx.com

Product Specification
