Intel ARCHITECTURE IA-32 User Manual Page 156

IA-32 Intel® Architecture Optimization
2-84
improve address alignment, a small piece of prolog code using
movsb/stosb with a count of less than 4 can be used to peel off the
non-aligned data moves before starting to use movsd/stosd.
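The peeling step can be sketched in C as follows. This is a minimal illustration, not the manual's code: the helper names head_bytes and copy_peeled are hypothetical, the byte loops stand in for movsb, and the 4-byte memcpy pair stands in for movsd. It assumes only that aligning the destination is the goal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes needed to reach the next 4-byte boundary.
   The result is always in the range 0..3, i.e. less than 4. */
size_t head_bytes(const void *p)
{
    return (size_t)((4u - ((uintptr_t)p & 3u)) & 3u);
}

/* Hypothetical helper: copy n bytes, peeling off byte moves until the
   destination is doubleword aligned, then moving 4 bytes at a time. */
void *copy_peeled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Prolog: at most 3 single-byte moves (movsb analogue). */
    size_t head = head_bytes(d);
    if (head > n)
        head = n;
    for (size_t i = 0; i < head; i++)
        *d++ = *s++;
    n -= head;

    /* Body: doubleword moves to an aligned destination (movsd analogue).
       memcpy is used so the source may remain unaligned. */
    for (size_t dwords = n / 4; dwords > 0; dwords--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
    }

    /* Epilog: the remaining 0..3 bytes. */
    for (size_t tail = n & 3u; tail > 0; tail--)
        *d++ = *s++;

    return dst;
}
```

Only the destination is aligned in this sketch; when source and destination have different misalignments, one side necessarily stays unaligned during the doubleword moves.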
For cases where N is less than half the size of the last level cache,
throughput considerations may favor either: (a) an approach using
REP string with the largest data granularity, because REP string has
little overhead for loop iteration and the branch misprediction
overhead in the prolog/epilog code that handles address alignment is
amortized over many iterations; or (b) an iterative approach using the
instruction with the largest data granularity, where the overhead for
SIMD feature detection, loop iteration, and prolog/epilog alignment
control can be minimized. The trade-off between these approaches may
depend on the microarchitecture.
An example of memset() implemented using stosd for an arbitrary
count value, with the destination address aligned to a doubleword
boundary in 32-bit mode, is shown in Table 2-5.
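As a rough C analogue of that approach (not the manual's listing), the sketch below fills count/4 doublewords and then the remaining count%4 bytes. The name memset_dword is hypothetical, and the function assumes the caller passes a destination that is already 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: memset-style fill for an arbitrary count.
   Assumption: dst is already aligned to a doubleword (4-byte) boundary. */
void *memset_dword(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Replicate the fill byte into all four bytes of a doubleword. */
    uint32_t pattern = (uint32_t)byte * 0x01010101u;

    /* Doubleword stores for the first n/4 doublewords (stosd analogue). */
    for (size_t dwords = n >> 2; dwords > 0; dwords--) {
        memcpy(p, &pattern, sizeof pattern);
        p += 4;
    }

    /* Byte stores for the remaining n%4 bytes (stosb analogue). */
    for (size_t tail = n & 3u; tail > 0; tail--)
        *p++ = byte;

    return dst;
}
```

Multiplying the byte by 0x01010101 is one common way to build the replicated pattern; shifts and ORs give the same result.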
For cases where N is greater than half the size of the last level
cache, using 16-byte granularity streaming stores with prolog/epilog
code for address alignment will likely be more efficient, provided the
destination addresses will not be referenced immediately afterwards.
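A minimal sketch of the streaming-store variant, assuming SSE2 is available at compile time. The names prefer_streaming and memset_stream and the llc_bytes parameter are hypothetical. The sketch falls back to plain memset when SSE2 is not detected, and the threshold check simply restates the half-of-last-level-cache rule of thumb above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical policy check: nonzero when n exceeds half of the last
   level cache size (llc_bytes is supplied by the caller). */
int prefer_streaming(size_t n, size_t llc_bytes)
{
    return n > llc_bytes / 2;
}

/* Hypothetical helper: fill n bytes using 16-byte streaming stores. */
void *memset_stream(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    /* Prolog: ordinary stores until p is 16-byte aligned (0..15 bytes). */
    size_t head = (size_t)((16u - ((uintptr_t)p & 15u)) & 15u);
    if (head > n)
        head = n;
    memset(p, c, head);
    p += head;
    n -= head;

    /* Body: 16-byte blocks written with non-temporal stores. */
    size_t blocks = n / 16;
#ifdef HAVE_SSE2_STREAM
    __m128i v = _mm_set1_epi8((char)c);
    for (size_t i = 0; i < blocks; i++) {
        _mm_stream_si128((__m128i *)p, v);  /* requires 16-byte alignment */
        p += 16;
    }
    _mm_sfence();  /* order the streaming stores before later stores */
#else
    memset(p, c, blocks * 16);  /* fallback: no streaming stores */
    p += blocks * 16;
#endif

    /* Epilog: the remaining 0..15 bytes. */
    memset(p, c, n & 15u);
    return dst;
}
```

Streaming stores bypass the caches, so this path only pays off when the filled data is not read back soon, which is the condition stated above.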