DS693 Datasheet, PDF (10/13 Pages) Xilinx, Inc – Integrated into Xilinx Embedded Development Kit
LogiCORE IP Virtex-5 APU Floating-Point Unit (v1.01a)
By making the loop code independent of the integer loop counter, all code inside the loop is carried out using the
FPU. The compiler is not at liberty to perform this optimization in general, as the two code fragments above may
give different results in some cases (for example, very large t).
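As a rough illustration of the idea (not the fragments referred to above), the following sketch contrasts a loop whose floating-point work depends on the integer counter with one that carries the value in a float. The function names, the array out, and the step dt are hypothetical and chosen only for this example.

```c
/* Counter-dependent form: i is converted from int to float on every pass. */
void fill_from_index(float *out, int n, float dt)
{
    int i;
    for (i = 0; i < n; i++) {
        out[i] = (float)i * dt;   /* int-to-float conversion each iteration */
    }
}

/* Counter-independent form: t is carried in a float, so the arithmetic
 * in the loop body no longer needs the integer counter. */
void fill_accumulated(float *out, int n, float dt)
{
    int i;
    float t = 0.0F;
    for (i = 0; i < n; i++) {
        out[i] = t;
        t += dt;                  /* rounding error can accumulate as t grows */
    }
}
```

For small n and an exactly representable step such as 0.25F, both forms produce identical values. Once t becomes large enough that adding dt is no longer exact, the accumulated form drifts away from the multiplied form, which is why a compiler cannot make this substitution on its own.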
Runtime Library Functions
The standard C runtime math library functions operate using double-precision arithmetic. When using a
single-precision FPU, calls to certain functions (such as fabs() and sqrt()) result in inefficient emulation routines
being used instead of FPU instructions:
float x=-1.0F;
…
x = fabs(x); /* uses double precision */
x = sqrt(x); /* uses double precision */
When these functions are called with single-precision arguments, each call involves a cast to double, a runtime
library call (which cannot use the FPU), and then a truncation back to float.
The solution is to use the functions fabsf() and sqrtf() instead. They are not part of the original ANSI C (C89)
library, although C99 standardizes them. They operate using single precision and can be carried out using the
FPU. For example:
float x=-1.0F;
…
x = fabsf(x); /* uses single precision */
x = sqrtf(x); /* uses single precision */
Array Accesses and Pointer Ambiguity
It is difficult for the compiler to determine whether two memory references (such as array element accesses) refer
to the same location. The compiler therefore treats almost all array and pointer accesses as if they conflict. For
example, the following code forms the inner loop of a simple Cooley-Tukey FFT algorithm implementation:
tr = ar0*Real[k] - ai0*Imag[k];
ti = ar0*Imag[k] + ai0*Real[k];
Real[k] = Real[j] - tr; /* A */
Imag[k] = Imag[j] - ti;
Real[j] += tr; /* B */
Imag[j] += ti;
Because the compiler does not know that Real[k] and Real[j] are never the same element, it must assume that the
store to Real[k] in statement A may change the value that statement B loads from Real[j]. The addition in
statement B therefore cannot start until the subtraction in statement A is finished. This spurious dependency
limits the amount of parallelism and slows down the computation. One possible solution is to introduce some
temporary variables, and separate the memory accesses from the mathematics, like this:
r_k = Real[k]; i_k = Imag[k];
r_j = Real[j]; i_j = Imag[j];
tr = ar0*r_k - ai0*i_k;
ti = ar0*i_k + ai0*r_k;
r_k = r_j - tr;
i_k = i_j - ti;
r_j += tr;
i_j += ti;
Real[j] = r_j; Real[k] = r_k;
Imag[j] = i_j; Imag[k] = i_k;
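While this code is less concise, it gives much better results: every load completes before the first store, so
no floating-point operation has to wait on a store that might alias one of its operands.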
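For reference, the rewritten loop body can be wrapped in a self-contained function as sketched here. The function name butterfly and its parameter list are assumptions made for this example. The arithmetic follows the fragment above and, as in the text, j and k are assumed never to be equal.

```c
/* Hypothetical wrapper around the butterfly step described above.
 * All loads happen first, then the arithmetic on local temporaries,
 * then all stores. Assumes j != k. */
void butterfly(float *Real, float *Imag, int j, int k, float ar0, float ai0)
{
    float r_k, i_k, r_j, i_j, tr, ti;

    /* Load phase: read every array element exactly once. */
    r_k = Real[k]; i_k = Imag[k];
    r_j = Real[j]; i_j = Imag[j];

    /* Compute phase: only local variables, so no aliasing is possible. */
    tr = ar0 * r_k - ai0 * i_k;
    ti = ar0 * i_k + ai0 * r_k;
    r_k = r_j - tr;
    i_k = i_j - ti;
    r_j += tr;
    i_j += ti;

    /* Store phase: write the results back. */
    Real[j] = r_j; Real[k] = r_k;
    Imag[j] = i_j; Imag[k] = i_k;
}
```

Because the temporaries are local variables whose addresses are never taken, the compiler can see that they do not overlap with Real or Imag and is free to schedule the multiplications, additions, and subtractions in any order that respects the data flow.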
DS693 March 1, 2011
www.xilinx.com
Product Specification