Intel ARCHITECTURE IA-32 User Manual download pdf (Page 215)

100

101

Coding for SIMD Architectures 3

3-35

For the first iteration of the inner loop, each access to array B will generate a

cache miss.

If the size of one row of array A, that is, A[2, 0:MAX-1], is

large enough, by the time the second iteration starts, each access to array

B will always generate a cache miss. For instance, on the first iteration,

the cache line containing

B[0, 0:7] will be brought in when B[0,0] is

referenced because the

float type variable is four bytes and each cache

line is 32 bytes. Due to the limitation of cache capacity, this line will be

evicted due to conflict misses before the inner loop reaches the end. For

the next iteration of the outer loop, another cache miss will be generated

while referencing

B[0,1]. In this manner, a cache miss occurs when

each element of array

B is referenced, that is, there is no data reuse in the

cache at all for array

Example 3-20 Loop Blocking

A. Original Loop

float A[MAX, MAX], B[MAX, MAX]

for (i=0; i< MAX; i++) {

for (j=0; j< MAX; j++) {

A[i,j] = A[i,j] + B[j, i];

}

B. Transformed Loop after Blocking

float A[MAX, MAX], B[MAX, MAX];

for (i=0; i< MAX; i+=block_size) {

for (j=0; j< MAX; j+=block_size) {

for (ii=i; ii<i+block_size; ii++) {

for (jj=j; jj<j+block_size; jj++) {

A[ii,jj] = A[ii,jj] + B[jj, ii];

}

1 2 ... 210 211 212 213 214 215 216 217 218 219 220 ... 567 568

Comments to this Manuals

No comments

Intel ARCHITECTURE IA-32 User Manual Page 215

Comments to this Manuals

Related products and manuals for Computer Accessories Intel ARCHITECTURE IA-32