IA-32 Intel® Architecture Optimization
7-60
throughput of a physical processor package. The non-halted CPI metric
can be interpreted as the inverse of the throughput of a logical
processor⁹.
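A minimal C sketch of this relationship, assuming the usual definition of non-halted CPI as non-halted clockticks divided by instructions retired. The helper names are hypothetical, and the counter values are assumed to have been collected elsewhere, for example by a performance-monitoring tool sampling one logical processor over a fixed interval.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: non-halted CPI = non-halted clockticks / instructions
 * retired, both sampled on the same logical processor over the same interval. */
double nonhalted_cpi(uint64_t nonhalted_clockticks, uint64_t instructions_retired)
{
    if (instructions_retired == 0)
        return NAN; /* undefined: nothing retired during the sample */
    return (double)nonhalted_clockticks / (double)instructions_retired;
}

/* Throughput of the logical processor in instructions retired per
 * non-halted cycle, i.e. the inverse of non-halted CPI. */
double logical_processor_throughput(uint64_t nonhalted_clockticks,
                                    uint64_t instructions_retired)
{
    if (nonhalted_clockticks == 0)
        return NAN; /* undefined: processor halted for the whole sample */
    return (double)instructions_retired / (double)nonhalted_clockticks;
}
```

For instance, 3,000 non-halted clockticks over 1,000 retired instructions gives a non-halted CPI of 3.0, or a throughput of about 0.33 instructions per non-halted cycle on that logical processor.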
When a single thread is executing and all on-chip execution resources
are available to it, non-halted CPI can indicate the unused execution
bandwidth available in the physical processor package. If non-halted
CPI is significantly higher than unity and overall utilization of
on-chip execution resources is low, tuning efforts for a multithreaded
application can be directed toward the factors discussed earlier.
An optimized single thread with exclusive use of on-chip execution
resources may exhibit a non-halted CPI in the neighborhood of unity¹⁰.
Because the most frequently used instructions typically decode into a
single micro-op and have a throughput of no more than two cycles, an
optimized thread that retires one micro-op per cycle consumes only about
one third of the peak retirement bandwidth of three micro-ops per cycle.
Significant portions of the issue port bandwidth are left unused. Thus,
optimizing single-thread performance is usually complementary to
optimizing a multithreaded application to take advantage of
Hyper-Threading Technology.
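The one-third arithmetic can be sketched in C, assuming a peak retirement rate of three micro-ops per cycle (consistent with the 1/3 CPI lower bound stated in footnote 10). The helper names are hypothetical; the sketch computes the fraction of peak retirement bandwidth a thread consumes and the fraction left idle.

```c
/* Assumption: peak retirement of three micro-ops per cycle, consistent with
 * the theoretical non-halted CPI lower bound of 1/3 stated in footnote 10. */
#define PEAK_UOPS_RETIRED_PER_CYCLE 3.0

/* Hypothetical helper: fraction of peak retirement bandwidth consumed by a
 * thread that retired `uops_retired` micro-ops in `cycles` non-halted cycles. */
double retirement_utilization(double uops_retired, double cycles)
{
    if (cycles <= 0.0)
        return 0.0; /* no non-halted cycles sampled */
    return (uops_retired / cycles) / PEAK_UOPS_RETIRED_PER_CYCLE;
}

/* Fraction of peak retirement bandwidth left idle, potentially available
 * to the other logical processor in the same physical package. */
double idle_retirement_fraction(double uops_retired, double cycles)
{
    return 1.0 - retirement_utilization(uops_retired, cycles);
}
```

A thread retiring 1,000 micro-ops in 1,000 non-halted cycles uses about 1/3 of peak, leaving roughly two thirds of retirement bandwidth idle.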
On a processor supporting Hyper-Threading Technology, an execution unit
whose throughput is lower than one issue every two cycles may become
contended by two threads implemented using a data decomposition
threading model. In one scenario, this can happen when the inner loops
of both threads rely on a low-throughput instruction, such as fdiv, and
the execution time of each inner loop is bound by the throughput of
fdiv.
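A minimal sketch of this scenario, assuming POSIX threads: two threads run the same division-dominated inner loop over separate halves of an array (a data decomposition). The names are hypothetical. Whether the compiler emits x87 fdiv or an SSE2 divide depends on the target and compiler flags; in either case, if the two threads are scheduled on the two logical processors of one physical package, they compete for the same divider.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread work item: in a data decomposition model every
 * thread runs the same inner loop over its own contiguous slice. */
struct slice {
    const double *num;
    const double *den;
    size_t begin;
    size_t end;
    double sum;
};

/* Inner loop dominated by floating-point division. The divides are
 * independent of one another, so the loop is likely bound by divider
 * throughput rather than by the loads or the running sum. */
static void *divide_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->num[i] / s->den[i];
    s->sum = acc;
    return NULL;
}

/* Run the loop on two threads, one per half of [0, n). Returns 0 on
 * success, -1 if a thread could not be created. */
int divide_two_threads(const double *num, const double *den, size_t n,
                       double *out)
{
    pthread_t t[2];
    struct slice s[2] = {
        { num, den, 0, n / 2, 0.0 },
        { num, den, n / 2, n, 0.0 },
    };

    if (pthread_create(&t[0], NULL, divide_slice, &s[0]) != 0)
        return -1;
    if (pthread_create(&t[1], NULL, divide_slice, &s[1]) != 0) {
        pthread_join(t[0], NULL); /* s[] lives on this stack; wait first */
        return -1;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    *out = s[0].sum + s[1].sum;
    return 0;
}
```

Timing this function against a single-threaded pass over the same data is one way to observe the contention; the sketch itself only computes the sum.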
9. Non-halted CPI can correlate with the resource utilization of an application thread if the
application thread is affinitized to a fixed logical processor.
10. In current implementations of processors based on Intel NetBurst microarchitecture, the
theoretical lower bound for either non-halted CPI or non-sleep CPI is 1/3. Practical
applications rarely achieve any value close to the lower bound.
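Footnote 9 assumes the application thread is affinitized to a fixed logical processor. On Linux, one way to do this is sched_setaffinity with a single-CPU mask. The sketch below assumes a Linux/glibc system; the helper name is hypothetical, and it pins the calling thread.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper (Linux-specific): pin the calling thread to logical
 * processor `cpu`, so that counter samples taken for this thread describe
 * one fixed logical processor. Returns 0 on success, -1 on failure. */
int pin_to_logical_processor(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread; errno is set on failure. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Which CPU numbers are sibling logical processors of one physical package varies by system; on Linux this can be read from the sysfs topology files (for example, thread_siblings_list).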