Loss of Precision Errors: Rounding, Special Values, Underflow, and Overflow

If a real number is not exactly one of the representable floating-point numbers, then the nearest floating-point number must represent it. The rounding error is the difference between the exact real number and its nearest floating-point representation. The floating-point number representing a rounded real number is called inexact.
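To make the rounding error concrete, the following minimal sketch measures it for a decimal literal. It assumes IEEE 754 double precision, which is what Python's float uses on common platforms, and Python 3.9 or later for math.ulp. The helper name rounding_error is hypothetical and is not part of any library.

```python
from fractions import Fraction
import math

def rounding_error(decimal_literal: str) -> Fraction:
    """Exact real number minus the nearest double that represents it."""
    exact = Fraction(decimal_literal)          # the real number, held exactly
    stored = Fraction(float(decimal_literal))  # the double actually stored
    return exact - stored

err = rounding_error("0.1")
half_ulp = Fraction(math.ulp(0.1)) / 2   # half the spacing of doubles near 0.1

print("rounding error of 0.1:", float(err))
print("within half a unit in the last place:", abs(err) <= half_ulp)
print("0.5 is exactly representable:", rounding_error("0.5") == 0)
```

With round-to-nearest, the error never exceeds half the spacing between adjacent floating-point numbers, which is the bound the second line checks.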

Almost any floating-point operation can produce an inexact result, and calculations normally proceed when one occurs. The rounding mode (round up, round down, round nearest, truncate) is determined by the floating-point control word.
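Python does not expose the hardware control word, so the sketch below uses the standard decimal module as a software analogue of the four rounding modes. The MODES mapping and the divide helper are hypothetical names chosen for illustration, and the four-digit precision is an arbitrary assumption that makes the rounding visible.

```python
from decimal import (Decimal, localcontext,
                     ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_DOWN)

# Assumed mapping from the mode names in the text to decimal's constants.
MODES = {
    "round up": ROUND_CEILING,         # toward +infinity
    "round down": ROUND_FLOOR,         # toward -infinity
    "round nearest": ROUND_HALF_EVEN,  # to nearest, ties to even
    "truncate": ROUND_DOWN,            # toward zero
}

def divide(a: int, b: int, mode: str, digits: int = 4) -> Decimal:
    """Divide a by b, keeping `digits` significant digits under `mode`."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = MODES[mode]
        return Decimal(a) / Decimal(b)

for name in MODES:
    print(f"{name:14s} 2/3 = {divide(2, 3, name)}   -2/3 = {divide(-2, 3, name)}")
```

Both signs are shown because round down and truncate agree for positive results and differ only for negative ones.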

If an arithmetic operation does not result in a valid floating-point number (valid numbers include results that have been rounded to an exactly representable floating-point number), it results in a special value: signed zero, signed infinity, NaN, or a denormal. Special-value results are a limiting case of the arithmetic operation involved. Special values can propagate through your arithmetic operations without causing your program to fail, and they often provide usable results.
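The sketch below shows each special value arising and propagating without stopping the program. It assumes IEEE 754 double precision and Python's default behavior for float arithmetic; the variable names are illustrative only.

```python
import math
import sys

inf = float("inf")
nan = inf - inf          # no meaningful limit exists, so the result is NaN
neg_zero = -1.0 / inf    # limiting case of a vanishing negative quotient
denormal = 5e-324        # below sys.float_info.min, so stored as a denormal

special_demo = {
    "1e308 * 10": 1e308 * 10.0,                 # too large: signed infinity
    "1 / inf": 1.0 / inf,                       # infinity yields a usable zero
    "nan + 1": nan + 1.0,                       # NaN propagates
    "sign of -0.0": math.copysign(1.0, neg_zero),
    "denormal is nonzero": denormal > 0.0,
    "denormal below normal range": denormal < sys.float_info.min,
}

for label, value in special_demo.items():
    print(f"{label:28s} -> {value}")
```

Note that a NaN compares unequal to everything, including itself, so test for it with math.isnan rather than with ==.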

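If an arithmetic operation results in an exact value, but the value is invalid because its magnitude lies outside the representable range, the operation causes underflow or overflow. Underflow occurs when the result is too small in magnitude to be represented as a normalized number. Overflow occurs when the result is too large in magnitude to be represented as a finite number.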
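As a rough illustration of the two cases, the following sketch pushes a value past each end of the double-precision range. It assumes IEEE 754 doubles and Python's behavior, in which plain float arithmetic returns a special value while math.exp reports overflow by raising OverflowError. The helper exp_or_none is a hypothetical name.

```python
import math
import sys

largest = sys.float_info.max          # largest finite double
smallest_normal = sys.float_info.min  # smallest positive normalized double

overflowed = largest * 2.0            # too large to represent: becomes +inf
gradual = smallest_normal / 4.0       # below the normal range: a denormal
flushed = 5e-324 / 2.0                # below the smallest denormal: 0.0

def exp_or_none(x: float):
    """Return math.exp(x), or None when the library reports overflow."""
    try:
        return math.exp(x)
    except OverflowError:
        return None

print("overflow:", overflowed)
print("gradual underflow:", gradual)
print("underflow to zero:", flushed)
print("exp(1000):", exp_or_none(1000.0))
```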

Inexact numbers, special values, underflows, and overflows are floating-point exceptions. You can select how rounding is done and how exceptions are handled by setting the floating-point control word. Setting the control word is described in Setting and Retrieving Floating-Point Status and Control Words (ia32 only), and exception handling is described in Handling Floating-Point Exceptions (ia32 only).
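The difference between recording an exception and trapping on it can be sketched with Python's decimal module, whose context flags and traps play roles loosely analogous to a status word and a control word. This is an analogy only and does not touch the hardware control word; the function names are hypothetical.

```python
from decimal import Decimal, Inexact, localcontext

def inexact_flag_after(a: int, b: int) -> bool:
    """Divide with the exception masked, then read the sticky status flag."""
    with localcontext() as ctx:
        ctx.clear_flags()
        Decimal(a) / Decimal(b)          # proceeds even if the result is rounded
        return bool(ctx.flags[Inexact])

def checked_divide(a: int, b: int):
    """Divide with the inexact exception unmasked, so rounding raises."""
    with localcontext() as ctx:
        ctx.traps[Inexact] = True
        try:
            return Decimal(a) / Decimal(b)
        except Inexact:
            return "trapped"

print(inexact_flag_after(1, 3))   # rounding occurred and was recorded
print(inexact_flag_after(1, 4))   # 0.25 is exact, so no flag is raised
print(checked_divide(1, 3))
print(checked_divide(1, 4))
```

Masked exceptions let the calculation continue and leave a flag you can inspect later, while unmasked exceptions interrupt the calculation at the point where they occur.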

For a further discussion of rounding errors see: