Rounding Errors

Although the rounding error for a single real number may be acceptably small in your calculations, at least two problems arise from it. If you test for exact equality between two numbers that you consider exact, the rounding error in either or both floating-point representations may prevent a successful comparison and produce spurious results. Also, when you calculate with floating-point numbers, rounding errors can accumulate into a meaningful loss of numerical significance.
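As a minimal sketch of the first problem (this fragment is illustrative and is not one of the Visual Fortran Samples), summing 0.2 ten times rarely compares exactly equal to 2.0, because 0.2 has no exact binary representation:

```fortran
      PROGRAM EQTEST
      REAL x
      INTEGER i
      x = 0.
      DO i = 1, 10
        x = x + 0.2       ! each addition carries a small rounding error
      END DO
      IF (x .EQ. 2.0) THEN
        WRITE(*,*) 'Exact match'
      ELSE
        WRITE(*,*) 'No exact match; x = ', x
      END IF
      ! A realistic comparison uses a tolerance instead:
      IF (ABS(x - 2.0) .LT. 1.0E-5) WRITE(*,*) 'Equal within tolerance'
      END
```

On most systems the exact test fails while the tolerance test succeeds.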

Carefully consider the numerics of your solution to minimize rounding errors or their effects. You might benefit from using double-precision arithmetic or restructuring your algorithm, or both. For instance, if your calculations involve arrays of linear data items, you might reduce the loss of numerical significance by subtracting the mean value of each array from each array element and by normalizing each element of such an array to the standard deviation of the array elements.
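The centering and scaling remedy might be sketched as follows; the routine name NORMLZ and its interface are hypothetical, not part of any Sample:

```fortran
      SUBROUTINE NORMLZ(a, n)
      ! Hypothetical sketch: center array A on its mean and scale it by
      ! its standard deviation, so that subsequent calculations combine
      ! values of comparable magnitude.
      INTEGER n, i
      REAL a(n), mean, sd
      mean = 0.
      DO i = 1, n
        mean = mean + a(i)
      END DO
      mean = mean / n
      sd = 0.
      DO i = 1, n
        sd = sd + (a(i) - mean)**2
      END DO
      sd = SQRT(sd / (n - 1))
      DO i = 1, n
        a(i) = (a(i) - mean) / sd
      END DO
      END
```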

The following code segment can execute differently on different systems and produce different results for n, x, and s. It can also produce different results depending on whether you compile with the /fltconsistency or /nofltconsistency option (ia32 systems only). Rounding error accumulates in x because the floating-point representation of 0.2 is inexact; it then accumulates in s and affects the final value of n:

      INTEGER n
      REAL s, x
      n = 0
      s = 0.
      x = 0.
    1 n = n + 1
      x = x + 0.2
      s = s + x
      IF ( x .LE. 10. ) GOTO 1 ! Will you get 51 cycles?
      WRITE(*,*) 'n = ', n, '; x = ', x, '; s = ', s

This example illustrates a common coding problem: carrying a floating-point variable through many successive cycles and then using it in an IF test. This process is common in numerical integration. There are several remedies. You can compute x and s as multiples of an integer index, for example, replacing the statement that increments x with x = n * 0.2 to avoid round-off accumulation. You can test for completion on the integer index, such as IF (n <= 50) GOTO 1, or use a DO loop, such as DO n = 1, 51. If you must test on the real variable being cycled, use a realistic tolerance, such as IF (x <= 10.001).
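The remedies above might be combined as in this sketch: a DO loop counts the 51 cycles exactly, and x is recomputed from the integer index on each cycle, so round-off does not accumulate in it:

```fortran
      INTEGER n
      REAL s, x
      s = 0.
      DO n = 1, 51
        x = n * 0.2       ! recomputed from n each cycle
        s = s + x
      END DO
      WRITE(*,*) 'x = ', x, '; s = ', s
      END
```

Only the single rounding error of each n * 0.2 product remains in x, rather than n accumulated errors.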

Floating-point arithmetic does not always obey the standard rules of algebra exactly. In the presence of round-off error, addition is not precisely associative. You can use parentheses to express the exact order of evaluation you require to compute a correct, accurate answer. This is especially important when you specify optimization for your generated code, since the order of evaluation may otherwise be unpredictable.

The expressions (x + y) + z and x + (y + z) can give unexpected results in some cases, as the Visual Fortran Sample ASSOCN.F90 in the ...\DF98\SAMPLES\TUTORIAL folder shows. This example demonstrates the danger of combining two numbers whose magnitudes differ by more than the number of significant digits in the floating-point representation.
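A hypothetical fragment (independent of ASSOCN.F90) makes the effect concrete: single precision carries about 7 significant decimal digits, so adding 1.0 to 1.0E20 is lost entirely, and the grouping determines whether that contribution survives:

```fortran
      REAL x, y, z
      x = 1.0
      y = 1.0E20
      z = -1.0E20
      ! y absorbs x completely, so the left-hand grouping collapses to
      ! 0.0, while grouping y with z first preserves x.
      WRITE(*,*) '(x + y) + z = ', (x + y) + z   ! typically 0.0
      WRITE(*,*) 'x + (y + z) = ', x + (y + z)   ! typically 1.0
      END
```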

The Sample INTERVAL.F90 in the ...\DF98\SAMPLES\TUTORIAL folder shows how changing the rounding precision and rounding mode in the floating-point control word between calculations affects the calculated result of the following simple expression:

  (q*r + s*t) / (u + v)

The Visual Fortran Sample EPSILON.F90 in the ...\DF98\SAMPLES\TUTORIAL folder illustrates difficulties that rounding errors can cause in expressions of the form 1.0 + eps, where eps is barely significant compared to 1.0.
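The standard EPSILON intrinsic returns the smallest eps of the given kind such that 1.0 + eps is representably greater than 1.0. This hypothetical fragment (separate from the EPSILON.F90 Sample) shows how a value only slightly smaller vanishes entirely:

```fortran
      REAL eps
      eps = EPSILON(1.0)
      WRITE(*,*) 'eps = ', eps
      WRITE(*,*) '1.0 + eps     > 1.0 ?  ', (1.0 + eps) .GT. 1.0
      ! Under round-to-nearest, eps/2.0 is typically rounded away and
      ! the following test prints F:
      WRITE(*,*) '1.0 + eps/2.0 > 1.0 ?  ', (1.0 + eps/2.0) .GT. 1.0
      END
```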

The compiler uses the default rounding mode (round-to-nearest) during compilation. As the optimization level increases, the compiler performs more operations at compile time, eliminating the corresponding run-time operations. If you set the rounding mode to a setting other than round-to-nearest, that mode is used only for computations actually performed at run time. For this reason, the Sample INTERVAL.F90 is compiled at /optimize:0, which disables certain compile-time optimizations, including constant propagation and inlining.

For more information: