Intel ARCHITECTURE IA-32 User Manual Page 8

Packed Shuffle Word for 64-bit Registers ...................................................................... 4-18
Packed Shuffle Word for 128-bit Registers .................................................................... 4-19
Unpacking/interleaving 64-bit Data in 128-bit Registers................................................. 4-20
Data Movement .............................................................................................................. 4-21
Conversion Instructions.................................................................................................. 4-21
Generating Constants ........................................................................................................... 4-21
Building Blocks...................................................................................................................... 4-23
Absolute Difference of Unsigned Numbers .................................................................... 4-23
Absolute Difference of Signed Numbers ........................................................................ 4-24
Absolute Value................................................................................................................ 4-25
Clipping to an Arbitrary Range [high, low] ...................................................................... 4-26
Highly Efficient Clipping............................................................................................ 4-27
Clipping to an Arbitrary Unsigned Range [high, low] ................................................ 4-28
Packed Max/Min of Signed Word and Unsigned Byte.................................................... 4-29
Signed Word............................................................................................................. 4-29
Unsigned Byte .......................................................................................................... 4-30
Packed Multiply High Unsigned...................................................................................... 4-30
Packed Sum of Absolute Differences............................................................................. 4-30
Packed Average (Byte/Word)......................................................................................... 4-31
Complex Multiply by a Constant..................................................................................... 4-32
Packed 32*32 Multiply.................................................................................................... 4-33
Packed 64-bit Add/Subtract............................................................................................ 4-33
128-bit Shifts................................................................................................................... 4-33
Memory Optimizations .......................................................................................................... 4-34
Partial Memory Accesses............................................................................................... 4-35
Supplemental Techniques for Avoiding Cache Line Splits........................................ 4-37
Increasing Bandwidth of Memory Fills and Video Fills ................................................... 4-39
Increasing Memory Bandwidth Using the MOVDQ Instruction ................................. 4-39
Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page ................................................................................................ 4-39
Increasing UC and WC Store Bandwidth by Using Aligned Stores........................... 4-40
Converting from 64-bit to 128-bit SIMD Integer .................................................................... 4-40
SIMD Optimizations and Microarchitectures .................................................................. 4-41
Packed SSE2 Integer versus MMX Instructions....................................................... 4-42
Chapter 5 Optimizing for SIMD Floating-point Applications
General Rules for SIMD Floating-point Code.......................................................................... 5-1
Planning Considerations......................................................................................................... 5-2
Using SIMD Floating-point with x87 Floating-point ................................................................. 5-3
Scalar Floating-point Code...................................................................................................... 5-3