Software Pipelining (ia64 only)

Software pipelining and additional software dependence analysis are enabled by the /pipeline option or by the /optimize:5 option. In certain cases, software pipelining improves run-time performance.

The software pipelining optimization applies instruction scheduling to certain innermost loops, allowing instructions within a loop to "wrap around" and execute in a different iteration of the loop. This can reduce the impact of long-latency operations, resulting in faster loop execution.
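The "wrap around" idea can be illustrated by hand in source form. The following is only a sketch: the compiler performs this transformation on machine instructions, not on Fortran source, and the program name, array names, and loop body are hypothetical. The second loop starts the multiply for iteration i while it finishes the add and store for iteration i-1, which is the kind of overlap software pipelining aims for.

```fortran
program pipeline_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n), c_ref(n)
  real :: t          ! product belonging to the iteration still in flight
  integer :: i

  do i = 1, n
     a(i) = real(i)
     b(i) = 0.5 * real(i)
  end do

  ! Original loop: multiply, add, and store all belong to iteration i.
  do i = 1, n
     c_ref(i) = a(i) * b(i) + 1.0
  end do

  ! Hand-pipelined form (illustration only).
  ! Prologue: start iteration 1.
  t = a(1) * b(1)
  ! Kernel: finish iteration i-1 while starting iteration i.
  do i = 2, n
     c(i-1) = t + 1.0
     t = a(i) * b(i)
  end do
  ! Epilogue: finish iteration n.
  c(n) = t + 1.0

  if (all(c == c_ref)) then
     print *, 'pipelined and original loops agree'
  else
     print *, 'mismatch'
  end if
end program pipeline_sketch
```

In the kernel, the multiply for iteration i does not depend on the add for iteration i-1, so the two can overlap and the latency of the multiply is hidden behind useful work. The prologue and epilogue correspond to the instructions that are inserted before and after the loop.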

Loop unrolling (enabled at /optimize:3 or above) cannot schedule across iterations of a loop. Because software pipelining can schedule across loop iterations, it can perform more efficient scheduling to eliminate instruction stalls within loops.

For instance, if software dependence analysis of data flow reveals that certain calculations in a given loop iteration can be done before or after that iteration, software pipelining reschedules those instructions ahead of or behind that iteration, at places where their execution can prevent instruction stalls or otherwise improve performance.
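As a rough illustration of what dependence analysis looks for, compare the two loops below. This is a sketch with hypothetical names, and it does not show how this compiler actually treats either loop. In the first loop no iteration uses a value produced by another iteration, so work can be moved freely across iterations. In the second loop each iteration needs r(i-1) from the previous one, so only the parts that do not depend on that value (such as fetching x(i) and y(i)) are candidates for being moved ahead.

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: n = 6
  real :: x(n), y(n), z(n), r(n)
  integer :: i

  x = 2.0
  y = 1.0

  ! Independent iterations: z(i) depends only on x(i) and y(i).
  do i = 1, n
     z(i) = x(i) * y(i)
  end do

  ! Loop-carried dependence: r(i) needs r(i-1) from the previous iteration.
  r(1) = y(1)
  do i = 2, n
     r(i) = r(i-1) * x(i) + y(i)
  end do

  print *, z
  print *, r
end program dependence_sketch
```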

Software pipelining also enables the prefetching of data to reduce the impact of cache misses.

Software pipelining can be more effective when you combine /pipeline (or /optimize:5) with the appropriate /tune keyword for the target processor generation (see Requesting Optimized Code for a Specific Processor Generation).

To specify software pipelining without loop transformation optimizations, do one of the following:

For this version of Visual Fortran, loops chosen for software pipelining:

By modifying the unrolled loop and inserting instructions as needed before and/or after the unrolled loop, software pipelining generally improves run-time performance, except where the loops contain a large number of instructions with many existing overlapped operations. In this case, software pipelining may not have enough registers available to effectively improve execution performance, and using /optimize:5 (or /pipeline) may not improve run-time performance compared to using /optimize:4.

For programs that contain loops that exhaust the available registers, /optimize:5 or /pipeline may result in longer execution times. Where performance does not improve, consider compiling with the /unroll:1 option along with /optimize:5 or /pipeline, which may improve the effects of software pipelining.
