Intel ARCHITECTURE IA-32 User Manual download pdf (Page 319)


Optimizing Cache Usage 6


Minimize Number of Software Prefetches

Prefetch instructions are not completely free in terms of bus cycles, machine cycles, and resources, even though they require minimal clocks and memory bandwidth.

Excessive prefetching may lead to performance penalties because of issue penalties in the front end of the machine and/or resource contention in the memory subsystem. This effect may be severe when the target loops are small and/or issue-bound.

One approach to solving the excessive-prefetching problem is to unroll and/or software-pipeline the loops to reduce the number of prefetches required. Figure 6-4 presents a code example that implements prefetch and unrolls the loop to remove the redundant prefetch instructions, that is, those whose addresses fall in cache lines already requested by previously issued prefetches. In this particular example, unrolling the original loop once saves six prefetch instructions and nine instructions for conditional jumps in every other iteration.

Figure 6-4 Prefetch and Loop Unrolling

Original loop:

top_loop:
    prefetchnta [edx+esi+32]
    prefetchnta [edx*4+esi+32]
    . . . . .
    movaps xmm1, [edx+esi]
    movaps xmm2, [edx*4+esi]
    . . . . .
    add esi, 16
    cmp esi, ecx
    jl top_loop

Loop with unrolled iterations:

top_loop:
    prefetchnta [edx+esi+128]
    prefetchnta [edx*4+esi+128]
    . . . . .
    movaps xmm1, [edx+esi]
    movaps xmm2, [edx*4+esi]
    . . . . .
    movaps xmm1, [edx+esi+16]
    movaps xmm2, [edx*4+esi+16]
    . . . . .
    movaps xmm1, [edx+esi+96]
    movaps xmm2, [edx*4+esi+96]
    . . . . .
    add esi, 128
    cmp esi, ecx
    jl top_loop
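The same idea can be sketched in C to make the saving countable. This is an illustration under stated assumptions, not code from the manual: the names sum_per_iteration, sum_unrolled, prefetch_nta, and prefetches_issued are hypothetical, the 128-byte step simply mirrors the "add esi, 128" in the figure, and __builtin_prefetch is the GCC/Clang builtin standing in for prefetchnta. The counter exists only so the two loops can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 bytes per unrolled step, mirroring "add esi, 128" in the
   figure. The right value depends on the processor's prefetch granularity. */
#define STEP_BYTES      128
#define FLOATS_PER_STEP (STEP_BYTES / sizeof(float))   /* 32 floats */
#define FLOATS_PER_VEC  4                              /* 16 bytes, one movaps */

/* Hypothetical counter, present only to compare the two loops. */
static size_t prefetches_issued;

/* Issue a non-temporal read prefetch for base + byte_offset.
   The address is formed with integer arithmetic so that no out-of-range
   pointer arithmetic occurs when the target lies past the array's end. */
static void prefetch_nta(const float *base, size_t byte_offset)
{
    uintptr_t addr = (uintptr_t)base + byte_offset;
#if defined(__GNUC__)
    __builtin_prefetch((const void *)addr, 0, 0);  /* read, no temporal locality */
#else
    (void)addr;                                    /* no-op on other compilers */
#endif
    prefetches_issued++;
}

/* Original shape: one prefetch for every 16 bytes consumed. Most of these
   target a region that an earlier prefetch has already requested. */
float sum_per_iteration(const float *a, size_t n)
{
    float s = 0.0f;
    assert(n % FLOATS_PER_VEC == 0);
    for (size_t i = 0; i < n; i += FLOATS_PER_VEC) {
        prefetch_nta(a, i * sizeof(float) + STEP_BYTES);
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    return s;
}

/* Unrolled shape: one prefetch per STEP_BYTES consumed. The inner loop has a
   constant trip count, so a compiler can unroll it fully. */
float sum_unrolled(const float *a, size_t n)
{
    float s = 0.0f;
    assert(n % FLOATS_PER_STEP == 0);
    for (size_t i = 0; i < n; i += FLOATS_PER_STEP) {
        prefetch_nta(a, i * sizeof(float) + STEP_BYTES);
        for (size_t j = 0; j < FLOATS_PER_STEP; j += FLOATS_PER_VEC)
            s += a[i + j] + a[i + j + 1] + a[i + j + 2] + a[i + j + 3];
    }
    return s;
}
```

For a 128-element array, sum_per_iteration issues 32 prefetches and sum_unrolled issues 4, while both return the same sum. Whether the unrolled form is actually faster depends on the target processor and should be measured.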
