Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 


The hell of aligned SIMD instructions

 
Arondai
Newbie cheater
Reputation: 0

Joined: 20 Feb 2016
Posts: 12

Posted: Tue Apr 05, 2016 2:11 pm    Post subject: The hell of aligned SIMD instructions

While debugging a modification I wrote for a specific piece of software, I was puzzled that the software crashed when my modification was applied, even though I was absolutely sure I had not done anything wrong. After a lot of debugging I found that the problem lay in ntdll code that assumed some of the data on the stack to be aligned on 16-byte boundaries.
My solution was actually pretty simple, and it is something I wanted to share with you guys.

So, the problematic instruction in this case was movaps, which requires its memory operand (source or destination) to start on a 16-byte boundary. This basically means that the lowest nibble of the address (the rightmost 4 bits) needs to be 0. In that case the address is aligned correctly for movaps to function. The requirement exists because such a SIMD instruction moves multiple data elements (four floats, for instance) at once instead of using two or more instructions, and the aligned form assumes that the data is read from or written to aligned 16-byte blocks. With movaps, for example, a single instruction loads four floats into one register, or writes 128 bits of data to memory.
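
The "lowest nibble must be 0" rule can be checked with a single mask. The following is only an illustrative sketch in C; the helper names and the example addresses are made up and not taken from the crashing program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when addr sits on a 16-byte boundary,
   i.e. when its lowest nibble (lowest 4 bits) is 0. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Made-up example addresses. */
void alignment_examples(void)
{
    assert(is_aligned16(0x0012FF40));   /* ends in 0: fine for movaps  */
    assert(!is_aligned16(0x0012FF44));  /* ends in 4: movaps would fault */
}
```

Any address whose hexadecimal form ends in 0 passes the check; every other address fails it.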

If the data is not aligned on a 16-byte boundary, you get a general protection fault, which is pretty nasty.

Now, there is also a variant of movaps that does the same thing but does not enforce the 16-byte boundary. That's movups, where, as you might guess, the u stands for unaligned. Switching to it has several drawbacks, however:

1. You need to change that instruction, and what if you don't want to modify code that resides in system modules such as ntdll (a wise choice btw: don't change system modules unless you have a VERY good reason!)?
2. What if you don't know how many of these aligned instructions sit deep down in a routine you are trying to execute from your modification?
3. The unaligned form still moves all 128 bits at once, but an unaligned access can be slower (on older CPUs, or when it crosses a cache line), and so we arrive at reason 4:
4. A slightly less optimized piece of code.

Point 4 is often not a big issue, unless these instructions are used heavily.

There are more of these aligned instructions (Google is your friend). At compile/link time you can enforce that your data sits on aligned boundaries, but at runtime you are on your own: the trees are planted in the forest, and you can't simply reposition them and assume that it will work.
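
For data you own, the compile-time side can be sketched in C11 with _Alignas. This is an illustration only, not code from the program discussed here; the names values and values_are_aligned are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Four floats that the compiler/linker places on a 16-byte boundary
   (C11 _Alignas). The array name is made up for this example. */
static _Alignas(16) float values[4] = { 3.0f, 4.0f, 5.0f, 6.0f };

/* Returns 1 when the array really starts on a 16-byte boundary. */
int values_are_aligned(void)
{
    return ((uintptr_t)(void *)values & 0xF) == 0;
}
```

This only covers data you declare yourself. It does nothing for the stack pointer at the moment your injected code runs, which is the runtime problem described here.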

I thought of a solution that is quite elegant, if I may say so, and in my case a simple one, as the data was addressed relative to the stack pointer. I simply needed to make sure that the stack pointer pointed to a 16-byte aligned address. How do you do that? Simple: just truncate the rightmost nibble to 0. Is it that simple? Almost. By truncating you are changing the stack pointer, of course, and popping values off the stack after truncating has undesirable effects (you do the math). So instead, store your stack pointer in a temporary variable and subtract 16 from the stack pointer, so you can safely truncate without pointing halfway into data that was already pushed onto the stack, like this for instance:

Code:

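mov [old_esp],esp   // save the original stack pointer (old_esp: a 4-byte variable of your own)
sub esp,10          // hexadecimal: reserve 16 bytes
and esp,-10         // hexadecimal: -10 = fffffff0, clears the lowest nibble of esp
// ... call code that assumes a 16-byte aligned stack pointer ...
mov esp,[old_esp]   // restore the original stack pointer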


EDIT: Small elegance addition: you do not need to subtract 16 from the stack pointer. Truncating only ever moves the stack pointer down (the result is never larger than the original value), so it cannot end up pointing into data that was already pushed. So, simply remove the sub esp,10 instruction.
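
The claim in the EDIT can be checked with plain integer arithmetic. The sketch below models the stack pointer as a 32-bit number; the function names and the sample address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the truncate step: clear the lowest 4 bits of esp. */
uint32_t align_down16(uint32_t esp)
{
    return esp & 0xFFFFFFF0u;
}

/* Tries all 16 possible low-nibble values around a sample address and
   returns 1 only if truncating is safe in every case. */
int truncation_is_safe(uint32_t base)
{
    for (uint32_t low = 0; low < 16; low++) {
        uint32_t esp = (base & 0xFFFFFFF0u) + low;
        uint32_t aligned = align_down16(esp);
        if (aligned > esp)        return 0; /* would point into pushed data */
        if (esp - aligned > 15)   return 0; /* moved further than needed    */
        if ((aligned & 0xF) != 0) return 0; /* not aligned after all        */
    }
    return 1;
}
```

Because the stack grows downward, a stack pointer that is equal to or lower than the original never overlaps values that were pushed earlier. That is why the extra sub is not required for correctness.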

Now the beautiful thing about understanding reverse engineering principles is that you only need to worry about meeting the constraints that the routine you call imposes. The code inside that routine already preserves those constraints (assuming you did not alter it too); otherwise it would not work in the unmodified program either. Think about that for a while and it will hopefully make sense at some point. If not: what happens when the code is called in the unmodified scenario? Which constraints does the stack pointer meet then? Correct!

The irritating thing about these problems is that the crashes may appear at random while you are changing your modification: sometimes the stack pointer happens to be aligned, and most of the time it isn't.
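
Why "most of the time"? Assuming a 32-bit process in which esp is always a multiple of 4 (the usual case, but an assumption here), only one in four possible values is also a multiple of 16. A small counting sketch in C, with a made-up function name and address range:

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many 4-byte aligned addresses in [start, start + span)
   are also 16-byte aligned. start must be a multiple of 4. */
unsigned count_aligned16(uint32_t start, uint32_t span)
{
    unsigned hits = 0;
    assert((start & 3) == 0);
    for (uint32_t addr = start; addr < start + span; addr += 4) {
        if ((addr & 0xF) == 0)
            hits++;
    }
    return hits;
}
```

Over a 64-byte range there are 16 candidate values for esp and 4 of them are aligned, so a change to your modification that shifts the stack by one push can turn a working build into a crashing one.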

Hopefully this sheds some light on a topic that's tricky.


Last edited by Arondai on Tue Apr 05, 2016 2:37 pm; edited 1 time in total
mgr.inz.Player
I post too much
Reputation: 222

Joined: 07 Nov 2008
Posts: 4438
Location: W kraju nad Wisla. UTC+01:00

Posted: Tue Apr 05, 2016 2:33 pm

We usually just use movups instead of movaps.

And if you need addps, we do something like this:

Code:
addps xmm1,[myvalue]   // the memory operand of addps must be 16-byte aligned

...
...

newmem+700:            // 700 (hex) is a multiple of 10 (hex)
myvalue:
dd (float)3
dd (float)4
dd (float)5
dd (float)6






stack aligning


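32-bit
Code:

  push ebp
  mov ebp,esp   // keep the original (possibly unaligned) stack pointer in ebp
  sub esp,20    // hexadecimal: reserve 32 bytes on the stack

  and esp,-10   // hexadecimal: align the stack down to the next lowest 16-byte boundary
  ...
  ...

  mov esp,ebp   // restore the original stack pointer
  pop ebp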


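64-bit
Code:

  push rbp
  mov rbp,rsp   // keep the original (possibly unaligned) stack pointer in rbp
  sub rsp,20    // hexadecimal: reserve 32 bytes on the stack

  and rsp,-10   // hexadecimal: align the stack down to the next lowest 16-byte boundary
  ...
  ...

  mov rsp,rbp   // restore the original stack pointer
  pop rbp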
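
What the prologue above guarantees can be checked with integer arithmetic: after the sub and the and, the stack pointer is 16-byte aligned and at least 32 bytes below the value saved in the frame pointer. A sketch in C that models the stack pointer as a plain 64-bit number; the function name and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models "sub rsp,20" followed by "and rsp,-10" (both operands hexadecimal). */
uint64_t reserve_and_align(uint64_t sp)
{
    sp -= 0x20;              /* reserve 32 bytes                      */
    sp &= ~(uint64_t)0xF;    /* -0x10 as a mask: clear the low nibble */
    return sp;
}

/* Returns 1 when the result is aligned and leaves at least 32 bytes. */
int prologue_holds(uint64_t sp)
{
    uint64_t aligned = reserve_and_align(sp);
    return (aligned & 0xF) == 0 && sp - aligned >= 0x20;
}
```

Since the frame pointer holds the untouched value, the epilogue can restore the stack pointer no matter how many bytes the and instruction removed.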

Arondai
Newbie cheater
Reputation: 0

Joined: 20 Feb 2016
Posts: 12

Posted: Tue Apr 05, 2016 2:40 pm

Ah yes, so adding an explanation to this does make sense, I guess, and so this post is still useful.
It is not my goal to show off what I thought of, as I am fully aware that nobody really invents new things. People learn when they are given explanations.

Forgive me for asking, but who is "we"?

And it's better to adhere to the given constraints, as this results in less buggy code. Knowing why you do something and which options you have lets you make decisions that make sense.
All times are GMT - 6 Hours