ParkourPenguin I post too much Reputation: 138
Joined: 06 Jul 2014 Posts: 4275
Posted: Sun Jan 29, 2023 3:18 pm
Something simple: store the xmm register to memory, change whichever floats you need individually, and load it back.
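Code: | newmem:
sub rsp,10 // reserve 16 bytes of stack (AA numbers are hex)
movups [rsp],xmm0 // spill all four floats
movss xmm0,[rsp+4] // load the second float
subss xmm0,[val_to_sub]
movss [rsp+4],xmm0 // write the modified float back
movups xmm0,[rsp] // reload the full vector
add rsp,10
jmp return |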
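If you want to check the logic outside of a script first, the same store/modify/reload idea can be sketched in C with SSE intrinsics. This is only a rough equivalent, not something from the original script: the function name `sub_lane_via_memory` and its parameters are made up for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_storeu_ps, _mm_loadu_ps */

/* Hypothetical helper: subtract `amount` from lane `lane` (0..3) of `vec`
   by spilling the vector to memory, editing one float, and reloading it. */
__m128 sub_lane_via_memory(__m128 vec, int lane, float amount) {
    float tmp[4];
    _mm_storeu_ps(tmp, vec);   /* like: movups [rsp],xmm0 */
    tmp[lane] -= amount;       /* like the movss/subss/movss on [rsp+4] when lane == 1 */
    return _mm_loadu_ps(tmp);  /* like: movups xmm0,[rsp] */
}
```

Calling it with lane 1 and the value you'd otherwise keep in val_to_sub should behave like the assembly above; the other three floats pass through untouched.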
More complicated: write code using a compiler's vector extensions (or intrinsics) and let the compiler tell you an "optimal" answer. For example, this C code:
Code: | typedef float v4sf __attribute__ ((vector_size (16)));
v4sf foo(v4sf vec) {
vec[1] -= 2.5f;
return vec;
} |
gcc 12.2:
Code: | foo:
movaps xmm1, xmm0
shufps xmm1, xmm0, 85
movaps xmm2, xmm1
subss xmm2, DWORD PTR .LC0[rip]
movaps xmm1, xmm0
unpcklps xmm1, xmm0
movss xmm1, xmm2
shufps xmm1, xmm0, 225
movaps xmm0, xmm1
ret
.LC0:
.long 1075838976 # float 2.5 |
clang 15.0:
Code: | .LCPI0_0:
.long 0xc0200000 # float -2.5
foo: # @foo
movaps xmm1, xmm0
shufps xmm1, xmm0, 85 # xmm1 = xmm1[1,1],xmm0[1,1]
addss xmm1, dword ptr [rip + .LCPI0_0]
movlhps xmm1, xmm0 # xmm1 = xmm1[0],xmm0[0]
shufps xmm1, xmm0, 226 # xmm1 = xmm1[2,0],xmm0[2,3]
movaps xmm0, xmm1
ret |
You might need to look up documentation for these instructions to better incorporate them into an AA script.
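To make the clang 15.0 sequence above a bit easier to follow, here's a rough C sketch that mirrors it step by step with SSE intrinsics. The function name `add_to_lane1` and the lane names x, y, z, w in the comments are my own labels for illustration, and the mapping of intrinsics to instructions is what a compiler would typically emit, not a guarantee.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_add_ss, _mm_movelh_ps */

/* Hypothetical helper mirroring the clang output: returns vec = [x, y, z, w]
   with `delta` added to lane 1 only. */
__m128 add_to_lane1(__m128 vec, float delta) {
    /* shufps xmm1, xmm0, 85: broadcast lane 1 -> [y, y, y, y] */
    __m128 t = _mm_shuffle_ps(vec, vec, 0x55);
    /* addss: only lane 0 changes -> [y+delta, y, y, y] */
    t = _mm_add_ss(t, _mm_set_ss(delta));
    /* movlhps xmm1, xmm0: low half of vec goes to the high half -> [y+delta, y, x, y] */
    t = _mm_movelh_ps(t, vec);
    /* shufps xmm1, xmm0, 226: take t[2], t[0], vec[2], vec[3] -> [x, y+delta, z, w] */
    return _mm_shuffle_ps(t, vec, 0xE2);
}
```

Passing -2.5f as delta matches the clang version, which adds -2.5 instead of subtracting 2.5.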