Intel ARCHITECTURE IA-32 User Manual Page 217

Coding for SIMD Architectures 3
3-37
As one can see, all the redundant cache misses can be eliminated by
applying this loop blocking technique. If MAX is huge, loop blocking can
also help reduce the penalty from DTLB (data translation look-aside
buffer) misses. In addition to improving the cache/memory
performance, this optimization technique also saves external bus
bandwidth.
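As a rough illustration of the loop blocking idea discussed above, the following C sketch compares an unblocked and a blocked traversal. The array shapes, the value of MAX, the function names, and the block parameter are all assumptions made for this sketch; they are not taken from the manual's own example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical problem size, chosen only for this sketch. */
#define MAX 64

/* Unblocked version: B is read column-wise, so consecutive inner
 * iterations touch different cache lines of B. */
void add_transposed_naive(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            A[i][j] += B[j][i];
}

/* Blocked version: the same work, done one block-by-block tile at a
 * time, so the lines of B fetched for a tile can be reused before
 * they are evicted. A and B must not overlap. */
void add_transposed_blocked(float A[MAX][MAX], float B[MAX][MAX],
                            size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < MAX; ii += block) {
        for (size_t jj = 0; jj < MAX; jj += block) {
            /* Clamp the tile so a block size that does not divide MAX
             * still covers every element exactly once. */
            size_t i_end = (ii + block < MAX) ? ii + block : MAX;
            size_t j_end = (jj + block < MAX) ? jj + block : MAX;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    A[i][j] += B[j][i];
        }
    }
}
```

Both functions compute the same result; only the order of memory accesses differs. A suitable block size depends on the cache being targeted and would normally be chosen by measurement.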
Instruction Selection
The following section gives some guidelines for choosing instructions
to complete a task.
One barrier to SIMD computation can be the existence of
data-dependent branches. Conditional moves can be used to eliminate
data-dependent branches. Conditional moves can be emulated in SIMD
computation by using masked compares and logicals, as shown in
Example 3-21.
Example 3-21 Emulation of Conditional Moves
High-level code:
short A[MAX_ELEMENT], B[MAX_ELEMENT], C[MAX_ELEMENT],
      D[MAX_ELEMENT], E[MAX_ELEMENT];
for (i=0; i<MAX_ELEMENT; i++) {
    if (A[i] > B[i]) {
        C[i] = D[i];
    } else {
        C[i] = E[i];
    }
}
Assembly code:
xor eax, eax
continued
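The assembly listing is cut off on this page. Independently of that listing, the masked compare-and-logical idea can be sketched in portable C, one element at a time. This is not the manual's code: the function name select_gt and its parameters are hypothetical. Each step mirrors what a packed instruction would do across all 16-bit elements of a register at once (a signed compare such as pcmpgtw yields an all-ones or all-zeros mask per element, and pand, pandn, and por combine the two sources).

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: C[i] = (A[i] > B[i]) ? D[i] : E[i].
 * Hypothetical helper written for this sketch. */
void select_gt(const int16_t *A, const int16_t *B, int16_t *C,
               const int16_t *D, const int16_t *E, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Compare step: the comparison yields 1 or 0; negating it
         * gives -1 (all bits set) or 0 (all bits clear). */
        int mask = -(A[i] > B[i]);

        /* Logical steps: keep D where the mask is set (AND), keep E
         * where it is clear (AND-NOT), then merge the halves (OR). */
        int from_d = D[i] & mask;
        int from_e = E[i] & ~mask;
        C[i] = (int16_t)(from_d | from_e);
    }
}
```

For example, with A = {5, -3, 7, 0}, B = {1, 2, 7, -1}, D = {10, 20, 30, 40}, and E = {-10, -20, -30, -40}, the result in C is {10, -20, -30, 40}. Because no branch depends on the data, the same sequence applies unchanged to every element, which is what makes it suitable for SIMD execution.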