IA-32 Intel® Architecture Optimization
In Example 3-19, the computation has been strip-mined to a size strip_size. The value strip_size is chosen so that strip_size elements of array v[Num] fit into the cache hierarchy. As a result, a given element v[i] brought into the cache by Transform(v[i]) will still be in the cache when Lighting(v[i]) is performed, which improves performance over the non-strip-mined code.
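The idea can be illustrated with a minimal C sketch. It assumes a hypothetical Vertex type and placeholder bodies for Transform() and Lighting(), since only the structure of the loops matters here. The strip-mined version runs both passes over one strip of strip_size elements before moving to the next strip, so each strip is still cached when Lighting() revisits it.

```c
#include <stddef.h>

/* Hypothetical vertex type; the real example's data layout may differ. */
typedef struct {
    float x, y, z;
    float color;
} Vertex;

/* Placeholder passes (assumed bodies) that read and write each element. */
static void Transform(Vertex *v) {
    v->x *= 2.0f;
    v->y *= 2.0f;
    v->z *= 2.0f;
}

static void Lighting(Vertex *v) {
    v->color = v->x + v->y + v->z;
}

/* Non-strip-mined: every element is transformed before any is lit, so for
   a large array v[i] may be evicted before Lighting() reads it again. */
void process_plain(Vertex *v, size_t num) {
    for (size_t i = 0; i < num; i++) Transform(&v[i]);
    for (size_t i = 0; i < num; i++) Lighting(&v[i]);
}

/* Strip-mined: both passes cover one strip of strip_size elements
   (strip_size must be > 0) before advancing to the next strip. */
void process_strip_mined(Vertex *v, size_t num, size_t strip_size) {
    for (size_t start = 0; start < num; start += strip_size) {
        size_t end = start + strip_size;
        if (end > num) end = num;          /* last strip may be shorter */
        for (size_t i = start; i < end; i++) Transform(&v[i]);
        for (size_t i = start; i < end; i++) Lighting(&v[i]);
    }
}
```

Both functions compute identical results; only the order of memory accesses changes. In practice strip_size would be tuned so that strip_size * sizeof(Vertex) fits comfortably in the target cache level.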
Loop Blocking
Loop blocking is another useful technique for memory performance optimization. Like strip mining, its main purpose is to eliminate as many cache misses as possible. Rather than traversing the entire memory domain of a problem sequentially, loop blocking partitions that domain into smaller chunks. Each chunk should be small enough that all the data for a given computation fits into the cache, thereby maximizing data reuse. In fact, loop blocking can be viewed as strip mining in two or more dimensions. Consider the code in Example 3-20 and the access pattern in Figure 3-3. The two-dimensional array A is referenced in the j (column) direction and then in the i (row) direction (column-major order), whereas array B is referenced in the opposite manner (row-major order). Assuming the memory layout is column-major, the access strides of arrays A and B for the code in Example 3-20 are 1 and MAX, respectively.
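The following C sketch illustrates blocking for a loop of the kind described, assuming a body of the form A[i][j] += B[j][i] and a small hypothetical MAX. C stores arrays in row-major order, but with the inner j loop the sketch produces the same strides the text describes: 1 for A and MAX for B. The blocked version tiles the i-j iteration space into block_size x block_size chunks (block_size is a hypothetical tuning parameter) so that the cache lines of B loaded for one tile are reused across the ii loop before they are evicted.

```c
#include <stddef.h>

#define MAX 8   /* hypothetical matrix dimension, kept small for illustration */

/* Unblocked: A[i][j] is read with stride 1, but B[j][i] with stride MAX,
   so consecutive B accesses may each touch a different cache line. */
void add_transpose_plain(float A[MAX][MAX], float B[MAX][MAX]) {
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            A[i][j] = A[i][j] + B[j][i];
}

/* Blocked: each block_size x block_size tile is finished before the next
   one starts; block_size (> 0) is chosen so a tile of A and B fits in cache. */
void add_transpose_blocked(float A[MAX][MAX], float B[MAX][MAX],
                           size_t block_size) {
    for (size_t i = 0; i < MAX; i += block_size)
        for (size_t j = 0; j < MAX; j += block_size) {
            /* Clamp the tile at the matrix edge when MAX is not a multiple
               of block_size. */
            size_t i_end = (i + block_size < MAX) ? i + block_size : MAX;
            size_t j_end = (j + block_size < MAX) ? j + block_size : MAX;
            for (size_t ii = i; ii < i_end; ii++)
                for (size_t jj = j; jj < j_end; jj++)
                    A[ii][jj] = A[ii][jj] + B[jj][ii];
        }
}
```

Both versions perform the same additions in a different order, so the results are identical; only the reuse distance of B's cache lines shrinks in the blocked version.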