Intel ARCHITECTURE IA-32 User Manual Page 407

Multi-Core and Hyper-Threading Technology 7
Using a function decomposition threading model, a multithreaded
application can pair a thread that depends critically on a
low-throughput resource with other threads that do not share that
dependency.
User/Source Coding Rule 40. (M impact, L generality) If a single thread
consumes half of the peak bandwidth of a specific execution unit (e.g., fdiv),
consider adding a thread that rarely relies on that execution unit
when tuning for Hyper-Threading Technology.
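Rule 40 can be read as a simple pairing heuristic. The sketch below is illustrative only: the function name, the per-thread utilization figures, and the 10% cutoff used to mean "rarely relies on" are assumptions, not values from the manual. Only the one-half threshold comes from the rule itself.

```python
from typing import Dict, Optional

HEAVY_THRESHOLD = 0.5  # from Rule 40: half of the unit's peak bandwidth
LIGHT_THRESHOLD = 0.1  # assumption: cutoff for "rarely relies on" the unit


def pick_partner(heavy_util: float, candidates: Dict[str, float]) -> Optional[str]:
    """Pick a thread to co-schedule with a thread that is heavy on one unit.

    heavy_util: fraction of the unit's peak bandwidth used by the existing
                thread (for example, the fdiv unit).
    candidates: hypothetical map of thread name -> fraction of the same
                unit's peak bandwidth that the thread would use.
    Returns the lightest qualifying candidate, or None if the rule does not
    apply or no candidate is light enough.
    """
    if heavy_util < HEAVY_THRESHOLD:
        return None  # the rule targets threads at or above half of peak
    light = {name: util for name, util in candidates.items()
             if util <= LIGHT_THRESHOLD}
    if not light:
        return None
    # Prefer the candidate that contends least for the shared unit.
    return min(light, key=light.get)
```

For example, `pick_partner(0.6, {"fdiv_filter": 0.55, "int_parser": 0.02})` selects `"int_parser"`, because the other candidate would compete for the same divider.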
To ensure execution resources are shared cooperatively and efficiently
between two logical processors, it is important to reduce stall
conditions, especially those conditions causing the machine to flush its
pipeline.
The primary indicator of a Pentium 4 processor pipeline stall condition
is called Machine Clear. The metric is available from the VTune
Analyzer's event sampling capability. When the machine clear
condition occurs, all instructions that are in flight (at various stages of
processing in the pipeline) must be resolved and then either
retired or cancelled. While the pipeline is being cleared, no new
instructions can be fed into the pipeline for execution. Until the machine
clear condition is de-asserted, execution resources are idle.
Reducing the machine clear condition benefits single-thread
performance because it improves how well each thread scales with
processor frequency.
The impact is even higher on processors supporting Hyper-Threading
Technology, because a machine clear condition caused by one thread
can impact other threads executing simultaneously.
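As a rough illustration of why machine clears cost more with Hyper-Threading Technology, the sketch below estimates the share of cycles during which execution resources sit idle. The per-clear cost (`cycles_per_clear`) and all function and parameter names are assumptions for illustration; the manual gives no such figure. The model also pessimistically assumes that every clear raised by either logical processor idles the shared execution resources.

```python
def idle_fraction(total_cycles: int, cycles_per_clear: float,
                  *clears_per_thread: int) -> float:
    """Estimate the fraction of cycles lost to machine clears.

    total_cycles:      cycles in the measurement interval.
    cycles_per_clear:  assumed average idle cycles per machine clear
                       (hypothetical; measure or estimate it yourself).
    clears_per_thread: machine clear counts, one per logical processor
                       sharing the physical core.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    # Clears from all sharing threads are summed: under this model a clear
    # caused by one thread also idles the resources the other would use.
    idle_cycles = sum(clears_per_thread) * cycles_per_clear
    return min(idle_cycles / total_cycles, 1.0)
```

With one thread raising 500 clears in 1,000,000 cycles at an assumed 100 cycles each, the estimate is 5% idle. Adding a second thread that raises 1,500 clears brings the shared estimate to 20%, which is the effect the paragraph above describes.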
Several performance metrics can be used to detect situations that may
cause a pipeline to be cleared. The primary metric is the Machine Clear
Count: it indicates the total number of times a machine clear condition is
asserted due to any cause. Possible causes include memory order
violations and self-modifying code. Assists while executing x87 or SSE
instructions have a similar effect on the processor's pipeline and should
be reduced to a minimum.
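Raw counts are easier to compare across runs once they are normalized. The sketch below is one possible way to post-process sampled counts; it is not a VTune feature. The function names and the cause labels are placeholders chosen for illustration, not actual event names, and the per-cause counts are assumed to have been collected separately.

```python
from typing import Dict, List, Tuple


def clears_per_kilo_instr(machine_clears: int, instructions_retired: int) -> float:
    """Machine clears per 1000 instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 1000.0 * machine_clears / instructions_retired


def rank_causes(clears_by_cause: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (cause, share of all clears) pairs, largest share first.

    clears_by_cause: hypothetical map from a cause label (for example
    "memory_order" or "self_modifying_code") to its sampled count.
    """
    total = sum(clears_by_cause.values())
    if total == 0:
        return []
    shares = ((cause, count / total) for cause, count in clears_by_cause.items())
    return sorted(shares, key=lambda pair: pair[1], reverse=True)
```

Ranking the causes this way shows which one to address first, for example memory order violations before self-modifying code when the former accounts for most clears.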