Intel ARCHITECTURE IA-32 User Manual Page 336

IA-32 Intel® Architecture Optimization
Later, the processor re-reads the data using prefetchnta, the
non-temporal (NTA) version of prefetch, which ensures maximum
bandwidth yet minimizes disturbance of other cached temporal data.
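The re-read pattern described above can be illustrated with a minimal sketch, assuming a compiler that provides the SSE intrinsic _mm_prefetch. The function name sum_with_prefetchnta, the cache-line size, and the prefetch distance are illustrative assumptions and do not come from the manual; suitable values depend on the processor.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)0)   /* no-op on targets without SSE */
#endif

/* Assumed values, not taken from the manual; tune per processor. */
#define CACHE_LINE_DOUBLES 8    /* 64-byte line / sizeof(double) */
#define PREFETCH_AHEAD     64   /* elements to run ahead of the reads */

/* Hypothetical helper: sums data[0..n-1] while prefetching with the NTA hint. */
double sum_with_prefetchnta(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Issue one prefetch per cache line, PREFETCH_AHEAD elements ahead. */
        if (i % CACHE_LINE_DOUBLES == 0 && i + PREFETCH_AHEAD < n) {
            PREFETCH_NTA(&data[i + PREFETCH_AHEAD]);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, so the sketch falls back to a plain loop where SSE is unavailable.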
Conclusions from Video Encoder and Decoder Implementation
These two examples indicate that by using an appropriate combination
of non-temporal prefetches and non-temporal stores, an application can
be designed to lessen the overhead of memory transactions by
preventing second-level cache pollution, keeping useful data in the
second-level cache and reducing costly write-back transactions. Even if
an application does not gain significant performance from having data
ready from prefetches, it can still benefit from more efficient use of the
second-level cache and memory. Such a design reduces the encoder's
demand for critical resources such as the memory bus. This makes the
system more balanced, resulting in higher performance.
Optimizing Memory Copy Routines
Creating memory copy routines for large amounts of data is a common
task in software optimization.
Example 6-10 presents a basic algorithm for a simple memory copy.
This task can be optimized using various coding techniques. One
technique uses software prefetch and streaming store instructions. It is
discussed in the following paragraph and a code example is shown in
Example 6-11.
Example 6-10 Basic Algorithm of a Simple Memory Copy
#define N 512000
double a[N], b[N];
int i;
for (i = 0; i < N; i++) {
    b[i] = a[i];  /* copy one element per iteration */
}
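As a rough illustration of the prefetch and streaming-store technique mentioned above, the following sketch copies an array using SSE2 intrinsics. It is not the manual's Example 6-11. The function name copy_streaming and the prefetch distance are assumptions made for illustration, and the routine falls back to memcpy when SSE2 is unavailable or the buffers are not 16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2: _mm_load_pd, _mm_stream_pd */
#define HAVE_SSE2_COPY 1
#else
#define HAVE_SSE2_COPY 0
#endif

#define PREFETCH_AHEAD 64   /* assumed distance in doubles (512 bytes) */

/* Hypothetical helper: copies n doubles from src to dst. */
void copy_streaming(double *dst, const double *src, size_t n)
{
#if HAVE_SSE2_COPY
    /* Aligned loads and streaming stores need 16-byte alignment. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 15u) == 0) {
        size_t i = 0;
        for (; i + 8 <= n; i += 8) {   /* 8 doubles = 64 bytes per pass */
            if (i + PREFETCH_AHEAD < n)
                _mm_prefetch((const char *)&src[i + PREFETCH_AHEAD],
                             _MM_HINT_NTA);
            /* Streaming stores write to memory without filling the cache. */
            _mm_stream_pd(&dst[i],     _mm_load_pd(&src[i]));
            _mm_stream_pd(&dst[i + 2], _mm_load_pd(&src[i + 2]));
            _mm_stream_pd(&dst[i + 4], _mm_load_pd(&src[i + 4]));
            _mm_stream_pd(&dst[i + 6], _mm_load_pd(&src[i + 6]));
        }
        _mm_sfence();   /* order the streaming stores before later stores */
        for (; i < n; i++)
            dst[i] = src[i];           /* remaining tail elements */
        return;
    }
#endif
    memcpy(dst, src, n * sizeof(double));
}
```

With the arrays from Example 6-10, a call such as copy_streaming(b, a, N) would replace the loop. Whether it is faster depends on the buffer size relative to the cache and on the processor, so it should be measured rather than assumed.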