Intel ARCHITECTURE IA-32 User Manual download pdf (Page 336)

IA-32 Intel® Architecture Optimization

Later, the processor re-reads the data using prefetchnta, which ensures maximum bandwidth, yet minimizes disturbance of other cached temporal data by using the non-temporal (NTA) version of prefetch.

Conclusions from Video Encoder and Decoder Implementation

These two examples indicate that by using an appropriate combination of non-temporal prefetches and non-temporal stores, an application can be designed to lessen the overhead of memory transactions by preventing second-level cache pollution, keeping useful data in the second-level cache, and reducing costly write-back transactions. Even if an application does not gain significant performance from having data ready from prefetches, it can still benefit from more efficient use of the second-level cache and memory. Such a design reduces the encoder's demand for a critical resource such as the memory bus. This makes the system more balanced, resulting in higher performance.

Optimizing Memory Copy Routines

Creating memory copy routines for large amounts of data is a common task in software optimization. Example 6-10 presents a basic algorithm for a simple memory copy.

This task can be optimized using various coding techniques. One technique uses software prefetch and streaming store instructions. It is discussed in the following paragraph, and a code example is shown in Example 6-11.

Example 6-10 Basic Algorithm of a Simple Memory Copy

#define N 512000

double a[N], b[N];
int i;

/* copy a to b one element at a time */
for (i = 0; i < N; i++) {
    b[i] = a[i];
}
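The prefetch-plus-streaming-store technique mentioned above can be sketched with SSE2 intrinsics. The block below is not the manual's Example 6-11; it is a minimal illustration under stated assumptions: an x86 target with SSE2, 64-byte cache lines, and a prefetch distance chosen arbitrarily. The names copy_streaming and PREFETCH_AHEAD are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _mm_sfence */
#include <emmintrin.h>  /* _mm_loadu_pd, _mm_stream_pd (SSE2) */

/* Assumption: prefetch distance in doubles (64 doubles = 512 bytes).
   This is an illustrative value, not a tuned one. */
#define PREFETCH_AHEAD 64

/* Hypothetical helper: copy n doubles from src to dst using
   non-temporal prefetches for the source and streaming stores
   for the destination. The buffers must not overlap. */
void copy_streaming(double *dst, const double *src, size_t n)
{
    size_t i = 0;

    /* The head fix-up below relies on dst being 8-byte aligned. */
    assert(((uintptr_t)dst % sizeof(double)) == 0);

    /* _mm_stream_pd needs a 16-byte aligned destination, so copy
       at most one leading element with an ordinary store. */
    if (n > 0 && ((uintptr_t)dst & 15) != 0) {
        dst[0] = src[0];
        i = 1;
    }

    /* Main loop: 8 doubles (64 bytes) per iteration. */
    for (; i + 8 <= n; i += 8) {
        /* Hint the hardware to fetch source data ahead of use with
           minimal cache pollution; skipped near the end of the array. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD),
                         _MM_HINT_NTA);

        /* Unaligned loads from src, non-temporal stores to dst. */
        _mm_stream_pd(dst + i,     _mm_loadu_pd(src + i));
        _mm_stream_pd(dst + i + 2, _mm_loadu_pd(src + i + 2));
        _mm_stream_pd(dst + i + 4, _mm_loadu_pd(src + i + 4));
        _mm_stream_pd(dst + i + 6, _mm_loadu_pd(src + i + 6));
    }

    /* Remaining elements that do not fill a whole 64-byte chunk. */
    for (; i < n; i++)
        dst[i] = src[i];

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

With the arrays from Example 6-10, the call would be copy_streaming(b, a, N). Whether this beats the plain loop depends on the processor, the buffer size relative to the caches, and the prefetch distance, so it is worth measuring before adopting it.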
