Intel ARCHITECTURE IA-32 User Manual Page 520

IA-32 Intel® Architecture Optimization
Latency and Throughput with Register Operands
IA-32 instruction latency and throughput data are presented in
Table C-2 through Table C-8. The tables cover Streaming SIMD
Extensions 3, Streaming SIMD Extensions 2, Streaming SIMD Extensions,
MMX technology, and most of the commonly used IA-32 instructions.
Instruction latency and throughput of the Pentium 4 processor and of the
Pentium M processor are given in separate columns. Pentium 4
processor instruction timing data is implementation specific; that is, it
can vary between model encoding value = 3 and model < 2. Separate data
sets of instruction latency and throughput are shown in the columns for
CPUID signatures 0xF2n and 0xF3n. The notation 0xF2n represents the
hex value of the lower 12 bits of the EAX register reported by the CPUID
instruction with an input value of EAX = 1: ‘F’ indicates that the family
encoding value is 15, ‘2’ indicates that the model encoding is 2, and ‘n’
indicates that the signature applies to any value of the stepping
encoding. Pentium M processor instruction timing data is shown in the
columns represented by CPUID signature 0x69n. The instruction timing
for the Pentium M processor with CPUID signature 0x6Dn is the same as
that of 0x69n.
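
The signature notation above can be illustrated with a short C sketch. It only splits the lower 12 bits of EAX into family, model, and stepping, as the text describes; the names cpu_signature, decode_signature, and matches_signature are hypothetical and not part of any Intel interface, and the sketch assumes the caller has already obtained EAX from CPUID with EAX = 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the 12-bit CPUID signature (hypothetical type name). */
typedef struct {
    unsigned family;   /* bits 11..8, e.g. 0xF for the Pentium 4 processor */
    unsigned model;    /* bits 7..4 */
    unsigned stepping; /* bits 3..0, the 'n' in 0xF2n */
} cpu_signature;

/* Split the EAX value returned by CPUID (input EAX = 1) into its fields.
 * Only the lower 12 bits are examined; higher bits are ignored. */
static inline cpu_signature decode_signature(uint32_t eax)
{
    uint32_t sig = eax & 0xFFFu;
    cpu_signature s;
    s.family = (sig >> 8) & 0xFu;
    s.model = (sig >> 4) & 0xFu;
    s.stepping = sig & 0xFu;
    return s;
}

/* Return true when eax matches a pattern such as 0xF2n or 0x69n:
 * family and model must be equal, any stepping is accepted. */
static inline bool matches_signature(uint32_t eax, unsigned family, unsigned model)
{
    cpu_signature s = decode_signature(eax);
    return s.family == family && s.model == model;
}
```

For example, matches_signature(eax, 0xF, 3) would select the 0xF3n columns, and matches_signature(eax, 6, 9) the 0x69n columns. How EAX is read is left to the caller; with GCC or Clang, one option is the __get_cpuid helper from <cpuid.h>.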
Table C-1 Streaming SIMD Extension 3 SIMD Floating-point Instructions

Instruction            Latency¹   Throughput   Execution Unit
CPUID                  0F3n       0F3n         0F3n
ADDSUBPD/ADDSUBPS      5          2            FP_ADD
HADDPD/HADDPS          13         4            FP_ADD,FP_MISC
HSUBPD/HSUBPS          13         4            FP_ADD,FP_MISC
MOVDDUP xmm1, xmm2     4          2            FP_MOVE
MOVSHDUP xmm1, xmm2    6          2            FP_MOVE
MOVSLDUP xmm1, xmm2    6          2            FP_MOVE
See “Table Footnotes”