Bruno Stuani's contribution to the world: Writing a litle code length diasm

If you are interested in a light-weight, fast and accurate code length disassembler for Intel 32bits architecture then you are in the right place at the right time. If you are not interested maybe you can read this post and start to get involved on this low-level stuff. But wait... my guess is that you won't use this for absolutely nothing right? If you know where you can use this, good very good then you can stop reading this and go directly to the code above. I'm not telling you that this is a must know stuff, actually you don't need to know how stuff works internally if they provide interfaces that you can use in order to save you knowledge, time and work.

Let's take as an example MadCodeHook - an API hooking library - that uses an internal code length disassembler in order to work properly. If you use his library you don't need to care about all this low-level stuff, you don't need to know what it does internally. But this is what make that guy smarter and better then you - He knows, he has knowledge. What about you? You know that calling a function named "HookCode()" does all the job. Is that enough to know? I don't think so.

I promess you that on the next posts you will see why this little code length disassembler is very important. For now I can only give you a brief idea of where you are going to use this banch of codes. Imagine that you will modify a program flow on-the-fly. This mean that you will, for example, add some codes into a specific offset. Ok, good, but for letting you know you cannot just "insert" code on the fly. Look on the following screenshot of a program flow:

You cannot just insert some code after the blue line and keep the original code executing after your code. You need to do some reallocations on the memory, you need to copy the instructions you are overriding to somewhere else. But pay attention at the image: There are instructions that take one byte, there are instructions that take 5 bytes, 2 bytes, etc. For copying the instructions you have just overridden to somewhere else requires an algorithm to find out the size of the full opcode. This way you make sure you are copying the full instruction instead of a fixed size. Resuming: The program memory is not just like a notepad where you can insert your text into any position and keep reading the text.

Pre requisites

Interest and time;
C or C++ skills;
Know about x86 instruction set;
Download the Intel(R) x86 instruction set manual;

This is not a tutorial

Yup, not a tutorial. I'm here only to help you to envolve your self on this subject. You are the one who will lead your brain to this task. At the end of this post you can download the code that does all the job but keep in mind that this is what differs me from you: I know what I'm doing, I know exactly what's going on behind the scenes here. I want you to try learning, and this post should be your guide.

Anyway, looks like you have everything setted up. You know C or C++, you are really interested, you understand what is a x86 instruction set, you know how the processor does it job and you have the Intel instruction set manual. Ah, at this point you might be really scarred too, if you have openned the PDF I told you to download you might know what I'm talking about: 854 pages! 854 pages that we will pass trough in order to complete this task! Seriously, not joking!

Let's first understand what we need from this manual:

Do you see the little black bordered box? The manual has thousands of them. They are the importand information that tell us a valid sequence of bytes in order to complete the full opcode. For this example we have 0F 08 this means that if the instruction starts with 0x0F and has a 0x08 as a sequence we just found a full instruction and it's length is 2 bytes. (2! this is what a code-length-disassembler should return on an opcode that has this sequence "0x0F 0x08").

Got the idea of what we need to do here? Hope so! Unfortunatelly you will find something more then just bytes on the first column during the manual reading. You will find things like /r, /7, etc. and I will let you read the documentation and find out what they mean ;-)

You will find out something around 850 sequences of bytes that tells you the length of opcodes. What we wanna do next is to collect all these informations and organize in a way that we can compute the length of the opcode in a given offset on the fly, instantly, fast. There are several ways for organizing these informations and computing them based on binary trees, you can even use compiler techniques such as using grammar recognization tools, and seriously, this is up to you. I'm showing here on my blog the way I did in like 3 years ago. I just don't wanna go back and rewritte something that I know is working and that I did, so fell free to modify or to do your own engine.

I divided this task into three steps:

Manually generate a string table of content;
Compile this table into something smaller and more algorithm accessible;
Give the compiled table to the function that determins the opcode length.

As you see the first and second step will be done once. Once we have the compiled table we can use it's information to find the code length. This will probably be the must do steps for most code-length-finding-algorithms. Again, if you have a better idea of doing this, good luck! I'm totally aware that the nexts steps are not the most efficient way of doing this. At the third step this became a data structure problem. I'm not here to tell you the better data structure algorithm for this, but I'm here for telling you how I did my code length disassembler. Ok ok I'll stop saying that this is my personal vision of the problem. From now on this will be implicit.

The first step

Digg digg digg.. this is what you should do on the first step. Open the PDF and start copying and pasting into an array of strings. Or you can copy everything and write a program that reads the box content and put them into this table of contents. My one is looking like this:

.....

"0F F3 /r "

"0F F4 /r "

"0F F5 /r "

"0F F6 /r "

"0F F7 /r "

"0F F8 /r "

"0F F9 /r "

"0F FA /r "

"0F FB /r "

"0F FC /r "

"0F FD /r "

"0F FE /r "

"10 /r "

"11 /r "

"12 /r "

"13 /r "

"14 ib "

"15 id "

"15 iw "

"16 "

"17 "

"18 /r "

"19 /r "

"1A /r "

"1B /r "

"1C ib "

"1D id "

"1D iw "

"1E "

.....

This can take a long time. Looong time, I don't know why but I'm sure you will copy the one I did - and the reason is pretty obvious anh? -

The second step

After you find out what those ib, id, iw, /r, etc means you should parse this array into a lower level table of structures that is understandable by the length-disassembler algorithm. I made my algorithm to translate this array of string into an array of the following structure:

// Immediate size (byte, word or double word)

enum ImmediateSize { iNone, ib, iw, id };

// Opcode structure
struct OPCODEREC{
  // Total opcodes
  unsigned char opcodes;
  // Uses modRM?
  bool uses_modrm;
  // Uses register code?
  bool uses_regcod;
  // Uses digit?
  bool uses_digit;
  // If so, which digit?
  unsigned char digit;
  // At most 3 bytes to form one opcode
  unsigned char opcode[3];
  // Iterator size (ib, id, iw)
  ImmediateSize iterator;
  // Code offset (cb, cd, cw)
  ImmediateSize codeOffset;
};

Those are all the informations we need for each opcode to calculate the opcode length. After completing this step - I mean... converting the string hand-wrote table into this array of OPCODEREC structure - comes the serialization step. We need to write this into an array of chars that will be readable by our algorithm upon initialization. I decided to give this table to the algorithm in a way that we can use a hashtable lookup to locate the opcode so we always have this done in an order of 1.

Well, we are almost done with this second step! All you need to do now is to serialize this table in a way that can be loaded by your code-length-disassembler algorithm. I decided to take the result array and build a .h file containing the code to load the table in memory all automatic, so we can make the call to my program in the build process, before starting reading the main .cpp file.

This is how a single record of the table looks like:

opcodes: 2

uses_modrm: false

uses_regcode: true

uses_digit: false;

digit: 0x00

iterator: ib

codeOffset: ib

opcode: [0xFF, 0xFF, 0x00]

This structure in memory will look like this:

02 00 01 00 00 01 01 0F 0F 00

And then converting to an string of array of char we will have this:

string: "char ContentTable[] = {0x02, 0x00, 0x01, 0x00, 0x00, 0x01, 0x01, 0x0F, 0x0F, 0x00}"

And this will be saved in a header file that will be used by our disassembler source code in compile time. Got the idea? I made a little EXE that takes two arguments, the input and output file. The input file has all the opcode entries like I described up there. The outfile file for my program looks like this:

#IFNDEF TableContent_H

#DEFINE TableContent_H

char TableContent[] = {

0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00,

0x01, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x03, 0x00,

0x01, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x03, 0x00,

0x01, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00,

0x01, 0x00, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x00,

0x01, 0x01, 0x00, 0x00, 0x00, 0x0B, 0x00, 0x00, 0x52, 0x00,

...

0x01, 0x01, 0x00, 0x01, 0x04, 0xFF, 0x00, 0x00, 0x90, 0xF9,

0x01, 0x01, 0x00, 0x01, 0x05, 0xFF, 0x00, 0x00, 0x20, 0xFA,

};

#ENDIF

Third step

Woff.. after a long trip of writing the table compiler (btw, what's woff? o.O) it's time to take this table and use to find a given code-length.

(please be patient, I'm still writing this as you are reading)

Guys, do you need me to finish this article? What do you think? I kinda got no patience to finish since I see no interest of the public. Please leave a comment!

Bruno Stuani's contribution to the world

Tuesday, September 16, 2008

Writing a litle code length diasm

No comments:

Blog Archive

About Me

Bruno Martins Stuani