IA-32 Intel® Architecture Optimization
6-4
Optimize software prefetch scheduling distance:
Far enough ahead to allow interim computation to overlap
memory access time.
Near enough that the prefetched data is not evicted from the
data cache before it is used.
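A minimal C sketch of choosing and applying a prefetch distance follows. It assumes GCC or Clang: `__builtin_prefetch` is a compiler builtin used here in place of the PREFETCHh instructions (or the `_mm_prefetch` intrinsic). The ceiling-division rule in the hypothetical `prefetch_distance` helper is an illustrative rule of thumb, not a formula from this manual, and the latency and per-iteration cycle counts are inputs you would have to measure on the target processor.

```c
#include <stddef.h>

/* Illustrative heuristic (an assumption, not from the manual): prefetch
   enough iterations ahead that the memory latency is covered by the work
   done in the intervening iterations, i.e. ceil(latency / cycles_per_iter). */
size_t prefetch_distance(size_t latency_cycles, size_t cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 1;                          /* avoid division by zero */
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

/* Sum an array, prefetching the element `dist` iterations ahead.
   One prefetch per element keeps the sketch simple; issuing one per
   cache line is cheaper. */
float sum_with_prefetch(const float *a, size_t n, size_t dist)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)                  /* never form an out-of-bounds address */
            __builtin_prefetch(&a[i + dist], 0 /* read */, 3 /* high locality */);
        sum += a[i];
    }
    return sum;
}
```

With an assumed 200-cycle latency and 30 cycles of work per iteration, the helper suggests prefetching 7 iterations ahead. The "near enough" condition still has to be checked separately: a larger distance keeps more prefetched lines live at once, which raises the chance that they are evicted before use.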
Use software prefetch concatenation:
Arrange prefetches to avoid unnecessary prefetches at the end
of an inner loop, and instead prefetch the data for the first few
iterations of the inner loop in the next outer-loop iteration.
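One way to express prefetch concatenation in C is sketched below. It assumes GCC or Clang for `__builtin_prefetch` and a matrix stored as separately allocated rows reached through an array of row pointers; the function name `rows_sum_concat` and the element-granularity distance `dist` are illustrative. When the look-ahead index runs past the end of the current row, the prefetch is redirected to the first elements of the next row instead of to memory the loop will not use, so the next outer-loop iteration starts with its data already requested.

```c
#include <stddef.h>

/* Sum nrows rows of `cols` floats each. Near the end of row i, the
   prefetch targets the start of row i + 1 (concatenation); in the tail
   of the last row, no prefetch is issued at all. */
float rows_sum_concat(const float *const *rows, size_t nrows,
                      size_t cols, size_t dist)
{
    float sum = 0.0f;
    for (size_t i = 0; i < nrows; i++) {
        for (size_t j = 0; j < cols; j++) {
            size_t ahead = j + dist;
            if (ahead < cols)
                /* target is still inside the current row */
                __builtin_prefetch(&rows[i][ahead], 0, 3);
            else if (i + 1 < nrows && ahead - cols < cols)
                /* target spills over: prefetch the next row's first elements */
                __builtin_prefetch(&rows[i + 1][ahead - cols], 0, 3);
            /* otherwise: end of the last row, nothing useful to prefetch */
            sum += rows[i][j];
        }
    }
    return sum;
}
```

For a contiguous row-major array the flattened index gives the same effect automatically; the explicit redirect matters when consecutive rows are not adjacent in memory.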
Minimize the number of software prefetches:
Prefetch instructions are not completely free in terms of bus
cycles, machine cycles, and resources; excessive use of
prefetches can adversely impact application performance.
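A simple way to cut the prefetch count is to issue one prefetch per cache line rather than one per element. The C sketch below assumes GCC or Clang for `__builtin_prefetch`, 64-byte cache lines, 4-byte floats, and a line-aligned array (on an unaligned array each 16-element chunk can straddle two lines); `sum_one_prefetch_per_line` and `lines_ahead` are hypothetical names.

```c
#include <stddef.h>

#define LINE_BYTES 64                               /* assumed cache-line size */
#define FLOATS_PER_LINE (LINE_BYTES / sizeof(float))

/* Each outer step consumes one cache line's worth of floats and issues a
   single prefetch for the line `lines_ahead` lines further on. */
float sum_one_prefetch_per_line(const float *a, size_t n, size_t lines_ahead)
{
    const size_t step = FLOATS_PER_LINE;
    const size_t ahead = lines_ahead * step;        /* distance in elements */
    float sum = 0.0f;
    size_t i = 0;
    for (; i + step <= n; i += step) {
        if (i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0, 3); /* one prefetch per line */
        for (size_t k = 0; k < step; k++)
            sum += a[i + k];
    }
    for (; i < n; i++)                               /* leftover partial line */
        sum += a[i];
    return sum;
}
```

Compared with a per-element prefetch, this issues roughly one sixteenth as many prefetch instructions for the same coverage under the stated line-size assumption.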
Interleave prefetch with computation instructions:
For best performance, software prefetch instructions must be
interspersed with other computational instructions in the
instruction sequence rather than clustered together.
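The following C sketch places the two prefetches of a dot-product step between blocks of arithmetic instead of issuing them back to back at the top of the loop body. It assumes GCC or Clang for `__builtin_prefetch`, 64-byte lines holding 16 floats, and a look-ahead `ahead` measured in elements; `dot_interleaved` is a hypothetical name. Because an optimizing compiler may reschedule these statements, C source order only approximates the final instruction order; precise placement requires inspecting or hand-writing the assembly.

```c
#include <stddef.h>

#define LINE_FLOATS 16   /* assumes 64-byte lines and 4-byte floats */

/* Dot product processing one line of each input per step, with the
   prefetch for `a` and the prefetch for `b` separated by computation. */
float dot_interleaved(const float *a, const float *b, size_t n, size_t ahead)
{
    float sum = 0.0f;
    size_t i = 0;
    for (; i + LINE_FLOATS <= n; i += LINE_FLOATS) {
        const size_t half = LINE_FLOATS / 2;
        if (i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0, 3);  /* prefetch for a */
        for (size_t k = 0; k < half; k++)             /* first half of the work */
            sum += a[i + k] * b[i + k];
        if (i + ahead < n)
            __builtin_prefetch(&b[i + ahead], 0, 3);  /* prefetch for b */
        for (size_t k = half; k < LINE_FLOATS; k++)   /* second half of the work */
            sum += a[i + k] * b[i + k];
    }
    for (; i < n; i++)                                /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}
```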
Hardware Prefetching of Data
The Pentium 4, Intel Xeon, Pentium M, Intel Core Solo and Intel Core
Duo processors implement a hardware data prefetcher that monitors
application data access patterns and prefetches data automatically.
This behavior requires no direct intervention from the programmer.
Characteristics of the hardware data prefetcher for the Pentium 4 and
Intel Xeon processors are:
1. Requires two successive cache misses in the last-level cache to
trigger the mechanism, and the stride between these two cache misses
must be less than the trigger distance of the hardware prefetch
mechanism (see Table 1-2).
2. Attempts to stay 256 bytes ahead of current data access locations