Intel ARCHITECTURE IA-32 User Manual download pdf (Page 327)


Optimizing Cache Usage 6


In the scenario on the right of Figure 6-7, keeping the data in one way of the second-level cache does not improve cache locality. Therefore, use prefetcht0 to prefetch the data. This amortizes the latency of the memory references in passes 1 and 2, and it keeps a copy of the data in the second-level cache, which reduces memory traffic and latencies for passes 3 and 4. To reduce latency further, it may be worth issuing extra prefetchnta instructions before the memory references in passes 3 and 4.
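The following is a minimal C sketch of the access pattern described above. It assumes GCC or Clang, which provide `__builtin_prefetch`. It also assumes, as one reading of Figure 6-7, that passes 1 and 3 touch dataset A and passes 2 and 4 touch dataset B. A locality hint of 3 is typically lowered to prefetcht0 on x86. The x86-specific equivalent is `_mm_prefetch(p, _MM_HINT_T0)` from `<xmmintrin.h>`. `PREFETCH_DIST` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value must be tuned. */
#define PREFETCH_DIST 64

/* First visit to a dataset: prefetch ahead with locality hint 3 (typically
   prefetcht0 on x86) so the lines are kept in all cache levels for reuse. */
static double first_pass(const float *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&data[i + PREFETCH_DIST], 0, 3);
        sum += data[i];
    }
    return sum;
}

/* Later visit: no prefetch, relying on the copy left in second-level cache. */
static double revisit_pass(const float *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Temporally non-adjacent passes in the assumed order A, B, A, B. */
static double run_four_passes(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double total = 0.0;
    total += first_pass(a, n);   /* pass 1: dataset A */
    total += first_pass(b, n);   /* pass 2: dataset B */
    total += revisit_pass(a, n); /* pass 3: dataset A again */
    total += revisit_pass(b, n); /* pass 4: dataset B again */
    return total;
}
```

The optional extra prefetchnta before passes 3 and 4 would correspond to a `__builtin_prefetch(p, 0, 0)` call inside `revisit_pass`. It is left out here to keep the two cases distinct.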

In Example 6-7, consider the data access patterns of a 3D geometry engine, first without strip-mining and then with strip-mining. Note that the 4-wide SIMD instructions of the Pentium III processor can process four vertices per iteration.
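The arithmetic behind four prefetches per loop iteration can be checked with a small C sketch. It assumes the vertex layout given in the comment of Example 6-7 and a 32-byte cache line, which is the Pentium III line size. The names `Vertex`, `iterations_for`, and `bytes_per_iteration` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Vertex layout from the comment in Example 6-7: v = [x,y,z,nx,ny,nz,tu,tv]. */
typedef struct {
    float x, y, z;    /* position */
    float nx, ny, nz; /* normal */
    float tu, tv;     /* texture coordinates */
} Vertex;

/* Assumption: 32-byte cache lines, as on the Pentium III processor. */
#define CACHE_LINE_BYTES 32
/* Four single-precision lanes per 128-bit SIMD register. */
#define SIMD_WIDTH 4

/* Eight 4-byte floats fill exactly one 32-byte line: one prefetch per vertex. */
static_assert(sizeof(Vertex) == CACHE_LINE_BYTES, "one vertex per cache line");

/* Loop iterations needed when each iteration handles SIMD_WIDTH vertices. */
static size_t iterations_for(size_t num_vertices)
{
    return (num_vertices + SIMD_WIDTH - 1) / SIMD_WIDTH;
}

/* Bytes (and hence cache lines to prefetch) touched per iteration. */
static size_t bytes_per_iteration(void)
{
    return SIMD_WIDTH * sizeof(Vertex);
}
```

Under these assumptions, each group of four vertices spans 128 bytes, or four cache lines. This matches the four prefetchnta instructions per loop iteration in Example 6-7.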

Example 6-7 Data Access of a 3D Geometry Engine without Strip-mining

while (nvtx < MAX_NUM_VTX) {
    prefetchnta vertex[i] data        // v = [x,y,z,nx,ny,nz,tu,tv]
    prefetchnta vertex[i+1] data
    prefetchnta vertex[i+2] data
    prefetchnta vertex[i+3] data
    TRANSFORMATION code               // use only x,y,z,tu,tv of a vertex
    nvtx += 4
}

while (nvtx < MAX_NUM_VTX) {
    prefetchnta vertex[i] data        // v = [x,y,z,nx,ny,nz,tu,tv]
                                      // x,y,z fetched again
    prefetchnta vertex[i+1] data
    prefetchnta vertex[i+2] data
    prefetchnta vertex[i+3] data
    compute the light vectors         // use only x,y,z
    LOCAL LIGHTING code               // use only nx,ny,nz
    nvtx += 4
}
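The following is a minimal C sketch of the two loops in Example 6-7, without strip-mining. It assumes GCC or Clang for `__builtin_prefetch`. A locality hint of 0 is typically lowered to prefetchnta on x86. The transformation and lighting bodies are hypothetical stand-ins: a uniform scale, and a clamped, unnormalized dot product of the normal with the vector toward a light. The one-group look-ahead is also an assumption, because the pseudocode only shows which vertices are prefetched. Each loop sweeps the whole vertex array before the next one begins, so the lighting pass brings x,y,z in a second time, as the example's comment notes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float x, y, z, nx, ny, nz, tu, tv; /* v = [x,y,z,nx,ny,nz,tu,tv] */
} Vertex;

/* Four vertices per iteration, matching nvtx += 4 in Example 6-7. */
#define GROUP 4

/* Hypothetical look-ahead: prefetch the next group of four vertices with
   locality 0 (typically prefetchnta on x86), staying inside the array. */
static void prefetch_next_group_nta(const Vertex *v, size_t i, size_t n)
{
    for (size_t k = i + GROUP; k < i + 2 * GROUP && k < n; k++)
        __builtin_prefetch(&v[k], 0, 0);
}

/* Loop 1: hypothetical transformation that reads only x,y,z,tu,tv. */
static void transform_pass(const Vertex *v, float *out_xyz, float *out_uv,
                           size_t n, float scale)
{
    for (size_t i = 0; i < n; i += GROUP) {
        prefetch_next_group_nta(v, i, n);
        for (size_t k = i; k < i + GROUP && k < n; k++) {
            out_xyz[3 * k + 0] = v[k].x * scale;
            out_xyz[3 * k + 1] = v[k].y * scale;
            out_xyz[3 * k + 2] = v[k].z * scale;
            out_uv[2 * k + 0] = v[k].tu;
            out_uv[2 * k + 1] = v[k].tv;
        }
    }
}

/* Loop 2: light vector from x,y,z (re-read from memory), then a
   hypothetical lighting term from nx,ny,nz, clamped at zero. */
static void lighting_pass(const Vertex *v, float *out_intensity, size_t n,
                          float lx, float ly, float lz)
{
    for (size_t i = 0; i < n; i += GROUP) {
        prefetch_next_group_nta(v, i, n);
        for (size_t k = i; k < i + GROUP && k < n; k++) {
            float dx = lx - v[k].x, dy = ly - v[k].y, dz = lz - v[k].z;
            float d = dx * v[k].nx + dy * v[k].ny + dz * v[k].nz;
            out_intensity[k] = d > 0.0f ? d : 0.0f;
        }
    }
}
```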
