duration of read traffic compared to the duration of the workload is
significantly less than unity, it indicates that the dominant data locality
of the workload is cache access traffic.
Average Bus Queue Depth: Using the default configuration of the
processor event “Bus Reads Underway from the Processor”², one can
measure the weighted cycles of bus read traffic, where each cycle is
weighted by the depth of the bus read queue. Thus, one can derive the
average queue depth from the ratio of weighted cycles to the effective
duration of bus read traffic. Similarly, one can use other bus events to
measure the average bus queue depth for other types of bus transactions.
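
As an illustration (not part of the original manual text), the derivation above reduces to a simple ratio. In the sketch below, weighted_cycles stands for the count from the “Bus Reads Underway from the Processor” event in its default (weighted) configuration, and busy_cycles stands for the effective duration of bus read traffic, i.e. the number of cycles with at least one bus read outstanding; both names are placeholders, and collecting the counts is left to whatever sampling tool is in use.

/*
 * Sketch: deriving the average bus read queue depth from two event counts.
 * weighted_cycles : cycles weighted by the number of bus reads in flight
 * busy_cycles     : cycles with at least one bus read in flight
 */
#include <stdio.h>

static double average_bus_queue_depth(unsigned long long weighted_cycles,
                                      unsigned long long busy_cycles)
{
    if (busy_cycles == 0)
        return 0.0;                       /* no bus read traffic observed */
    return (double)weighted_cycles / (double)busy_cycles;
}

int main(void)
{
    /* Illustrative numbers only. */
    unsigned long long weighted_cycles = 1200000ULL;
    unsigned long long busy_cycles     =  500000ULL;

    printf("average bus read queue depth = %.2f\n",
           average_bus_queue_depth(weighted_cycles, busy_cycles));
    return 0;
}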
Using the average queue depth of read traffic, one can characterize the
degree to which bus read traffic originates from the cache miss pattern of
the workload and is sensitive to memory latency. This can be done by
examining whether the average bus queue depth of read traffic, measured
with hardware prefetch disabled³, is close to unity. When this ratio is
very close to unity, it implies that the workload has a data access pattern
with very poor data parallelism and will fully expose memory latency
whenever cache misses occur.
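
For example, a measurement tool might classify the workload as sketched below; this is illustrative only, and the 1.1 threshold used for “close to unity” is an assumption rather than a value taken from this manual.

#include <stdio.h>

int main(void)
{
    double depth_no_prefetch = 1.05;   /* illustrative measurement taken
                                          with hardware prefetch disabled */

    if (depth_no_prefetch < 1.1) {
        /* At most one bus read tends to be in flight: poor data
           parallelism; each cache miss exposes the full memory latency. */
        printf("latency-bound: depth %.2f is close to unity\n",
               depth_no_prefetch);
    } else {
        /* Several independent reads overlap on the bus, so part of the
           memory latency is hidden by memory-level parallelism. */
        printf("overlapped reads: depth %.2f\n", depth_no_prefetch);
    }
    return 0;
}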
Large Stride Inefficiency: Large-stride data accesses are much less
efficient than smaller-stride data accesses, because large-stride accesses
incur more frequent DTLB misses during address translation. The
penalty of large-stride accesses applies to cache traffic as well as memory
traffic. In terms of the quantitative impact on data access latency, large
2. Note that, by default, Pentium 4 processor events dealing with bus traffic, such as Bus Reads
Underway from the Processor, implicitly combine two aspects: (a) the cache miss pattern of the
last-level cache resulting from the data reference pattern of the workload, where each cache
read miss is expected to require a bus read request to fetch data (a cache line) from the
memory sub-system; (b) when hardware prefetch is enabled, a cache read miss may trigger the
hardware prefetcher to queue up additional bus read requests to fetch additional cache lines
from the memory sub-system.
3. Hardware prefetch mechanisms can be controlled on demand using the model-specific
register IA32_MISC_ENABLES. Appendix B of the IA-32 Intel® Architecture
Software Developer's Manual, Volume 3B, describes the specific bit locations of the
IA32_MISC_ENABLES MSR.
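
As a sketch of the mechanism described in this footnote (not part of the original manual text), the example below toggles a prefetch-control bit in IA32_MISC_ENABLES through the Linux msr driver (/dev/cpu/0/msr). The MSR address (0x1A0) and the bit position used here are assumptions for illustration only; the authoritative bit layout is the one given in Appendix B, and the write requires root privileges.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed values for illustration only; consult Appendix B of the
   IA-32 Intel® Architecture Software Developer's Manual, Volume 3B,
   for the actual MSR layout of the target processor. */
#define IA32_MISC_ENABLES        0x1A0   /* assumed MSR address  */
#define HW_PREFETCH_DISABLE_BIT  9       /* assumed bit position */

int main(void)
{
    uint64_t value;
    int fd = open("/dev/cpu/0/msr", O_RDWR);  /* msr driver, root required */

    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }
    if (pread(fd, &value, sizeof(value), IA32_MISC_ENABLES) != sizeof(value)) {
        perror("read IA32_MISC_ENABLES");
        close(fd);
        return 1;
    }

    value |= (uint64_t)1 << HW_PREFETCH_DISABLE_BIT;  /* disable hardware prefetch */

    if (pwrite(fd, &value, sizeof(value), IA32_MISC_ENABLES) != sizeof(value)) {
        perror("write IA32_MISC_ENABLES");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}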