Intel ARCHITECTURE IA-32 User Manual Page 52

IA-32 Intel® Architecture Optimization
Thus, software optimization of a data access pattern should first tune
for hardware prefetch, favoring a greater proportion of smaller-stride
data accesses in the workload, before attempting to provide hints to
the processor through software prefetch instructions.
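To make the stride point concrete, the following is a minimal sketch in C, not taken from the manual. The function names `sum_unit_stride` and `sum_large_stride` and the flat row-by-row matrix layout are assumptions chosen for illustration. Both functions compute the same sum; they differ only in the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a rows x cols matrix stored row by row in a
   flat array. The inner loop walks adjacent elements, so consecutive
   accesses are sizeof(int) bytes apart (a small stride). */
long sum_unit_stride(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

/* Hypothetical helper: same sum, but the inner loop walks down a column,
   so consecutive accesses are cols * sizeof(int) bytes apart (a large
   stride when cols is large). */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Under the guidance above, the first loop order is the one to try before adding any software prefetch instructions. Whether it is measurably faster depends on the matrix size and the processor, so it should be confirmed by measurement.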
Loads and Stores
The Pentium 4 processor employs the following techniques to speed up
the execution of memory operations:
  • speculative execution of loads
  • reordering of loads with respect to loads and stores
  • multiple outstanding misses
  • buffering of writes
  • forwarding of data from stores to dependent loads
Performance may be enhanced by not exceeding the memory issue
bandwidth and buffer resources provided by the processor. Up to one
load and one store may be issued each cycle from a memory port
reservation station. A memory operation can be dispatched to a
reservation station only if a buffer entry is available for it. There
are 48 load buffers and 24 store buffers (see footnote 3). These buffers
hold the µop and address information until the operation is completed,
retired, and deallocated.
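The issue and buffer limits quoted above can be turned into a rough back-of-the-envelope check. The sketch below is an illustration, not part of the manual. The function names are hypothetical, and the constants are the figures stated in the text, which per footnote 3 do not hold for every Pentium 4 model. It models only the stated limits and ignores cache misses, dependencies, and every other pipeline constraint.

```c
#include <assert.h>

/* Figures quoted in the text; assumed here, and model-dependent
   (see footnote 3 for the store buffer count). */
enum { LOAD_BUFFERS = 48, STORE_BUFFERS = 24 };

/* Lower bound on the cycles needed to issue the given memory operations,
   assuming at most one load and one store issue per cycle. */
unsigned min_issue_cycles(unsigned loads, unsigned stores)
{
    return loads > stores ? loads : stores;
}

/* Returns 1 if the given numbers of in-flight loads and stores fit in
   the assumed buffer counts, 0 otherwise. */
int fits_in_buffers(unsigned loads, unsigned stores)
{
    return loads <= LOAD_BUFFERS && stores <= STORE_BUFFERS;
}

/* Store buffer entries still free for a given number of in-flight
   stores. The caller must not pass more stores than buffers exist. */
unsigned store_buffer_headroom(unsigned stores)
{
    assert(stores <= STORE_BUFFERS);
    return STORE_BUFFERS - stores;
}
```

For example, a loop body with 10 loads and 4 stores needs at least `min_issue_cycles(10, 4)` = 10 cycles of memory issue under these assumptions, and a burst of 30 outstanding stores would not pass `fits_in_buffers`.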
The Pentium 4 processor is designed to enable the execution of memory
operations out of order with respect to other instructions and with
respect to each other. Loads can be carried out speculatively, that is,
before all preceding branches are resolved. However, speculative loads
cannot cause page faults.
3. Pentium 4 processors with CPUID model encoding equal to 3 have more than 24 store
buffers.