IA-32 Intel® Architecture Optimization
Figure 5-2 shows how one result would be computed in seven instructions if
the data were organized as an array of structures (AoS) and processed with
SSE alone; four results would therefore require 28 instructions.
Figure 5-2 Dot Product Operation
Example 5-1 Pseudocode for Horizontal (xyz, AoS) Computation
mulps ; x*x', y*y', z*z'
movaps ; reg->reg move, since next steps overwrite
shufps ; get b,a,d,c from a,b,c,d
addps ; get a+b,a+b,c+d,c+d
movaps ; reg->reg move
shufps ; get c+d,c+d,a+b,a+b from prior addps
addps ; get a+b+c+d,a+b+c+d,a+b+c+d,a+b+c+d
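As a sketch, the seven steps of Example 5-1 map onto the SSE intrinsics in `<xmmintrin.h>` roughly as follows. The `Vec4` type and the `dot_aos` function are hypothetical names chosen for illustration. The sketch assumes each vertex is padded to four floats (x, y, z, w) so that it fills one XMM register; for a true three-component dot product, w must be zero in at least one operand. It needs an x86 compiler with SSE enabled (the default on x86-64).

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_shuffle_ps, ... */

/* Hypothetical AoS vertex: x, y, z plus a padding lane w so that one
   vertex fills one 16-byte XMM register. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Horizontal (AoS) dot product following Example 5-1.
   Returns x*x' + y*y' + z*z' + w*w'. */
float dot_aos(const Vec4 *a, const Vec4 *b)
{
    /* _mm_set_ps takes lanes high-to-low, so lane 0 = x ("a"),
       lane 1 = y ("b"), lane 2 = z ("c"), lane 3 = w ("d"). */
    __m128 va = _mm_set_ps(a->w, a->z, a->y, a->x);
    __m128 vb = _mm_set_ps(b->w, b->z, b->y, b->x);

    __m128 p = _mm_mul_ps(va, vb);       /* mulps: x*x', y*y', z*z', w*w' */

    /* shufps: b,a,d,c from a,b,c,d (_MM_SHUFFLE(2,3,0,1) == 0xB1) */
    __m128 t = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    p = _mm_add_ps(p, t);                /* addps: a+b, a+b, c+d, c+d */

    /* shufps: c+d, c+d, a+b, a+b (_MM_SHUFFLE(1,0,3,2) == 0x4E) */
    t = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 0, 3, 2));
    p = _mm_add_ps(p, t);                /* addps: a+b+c+d in every lane */

    return _mm_cvtss_f32(p);             /* extract lane 0 as a float */
}
```

The two `movaps` copies in Example 5-1 have no intrinsic counterpart here: the compiler assigns registers and inserts register-to-register moves where the two-operand SSE forms would otherwise overwrite a value that is still needed.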
(Figure 5-2, OM15168: X1..X4 multiplied by Fx, Y1..Y4 by Fy, Z1..Z4 by Fz, and W1..W4 by Fw, summed lane by lane into the results R1..R4.)
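Figure 5-2 arranges the dot product vertically instead. A minimal sketch, assuming a hypothetical structure-of-arrays type `Vec4x4` that stores the x, y, z, and w components of four vertices in separate arrays, and a hypothetical function `dot_soa`: each component array is multiplied by a broadcast factor (Fx, Fy, Fz, Fw) and accumulated, so R1..R4 come out together without any shuffles. As above, it assumes an x86 compiler with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical SoA layout for four vertices: one array per component,
   so one XMM register holds the same component of all four vertices. */
typedef struct {
    float x[4], y[4], z[4], w[4];
} Vec4x4;

/* Vertical (SoA) dot products as in Figure 5-2:
   r[i] = x[i]*fx + y[i]*fy + z[i]*fz + w[i]*fw, for i = 0..3. */
void dot_soa(const Vec4x4 *v, float fx, float fy, float fz, float fw,
             float r[4])
{
    /* X1..X4 * (Fx Fx Fx Fx) */
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(v->x), _mm_set1_ps(fx));
    /* + Y1..Y4 * (Fy Fy Fy Fy) */
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->y), _mm_set1_ps(fy)));
    /* + Z1..Z4 * (Fz Fz Fz Fz) */
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->z), _mm_set1_ps(fz)));
    /* + W1..W4 * (Fw Fw Fw Fw) */
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->w), _mm_set1_ps(fw)));
    _mm_storeu_ps(r, acc);               /* R1..R4 */
}
```

Every lane does useful work in every instruction, which is the advantage the SoA arrangement has over the horizontal AoS sequence.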