Wednesday, January 24, 2024

Reversing Pascal String Object

There are many goodware and malware developed in pascal, and we will see that the binary generated by the pascal compilers is fascinating, not only because the small and clean generated binaries, or the  clarity of the pascal code, but also the good performance. In Linux we have Lazarus which is a good free IDE like Delphi and Kylix the free pascal IDE for windows.

The program:

program strtest;

var
  cstr:  array[0..10] of char;
  s, s2:  ShortString;

begin
  cstr := 'hello world';
  s  := cstr;
  s2 := 'test';
  
  WriteLn(cstr + ' ' + s + ' ' + s2);
end.


We are going to compile it with freepascal and lazarus, and just the binary size differs a lot:

lazarus          242,176 btytes  845 functions
freepascal       32,256 bytes   233 functions
turbopascal      2,928 bytes     80 functions  (wow)

And surprisingly turbopascal binaries are extremely light.
Lets start with lazarus:




Logically it imports from user32.dll some display functions, it also import the kernel32.dll functions and suspiciously the string operations of oleaut32.dll 


And our starting point is a function called entry that calls the console initialization and retrieve some console configurations, and then start a labyrinth of function calls.



On functions 10000e8e0 there is the function that calls the main function.

I named execute_param2 because the second param is a function pointer that is gonna be executed without parameters, it sounds like main calling typical strategy.
And here we are, it's clearly the user code pascal main function.


What it seems is that function 100001800 returns an string object, then is called its constructor to initialize the string, then the string is passed to other functions that prints it to the screen.

This function executes the method 0x1c0 of the object until the byte 0x89 is a null byte.
What the hell is doing here?
First of all let's create the function main:


Simply right button create function:

After a bit of work on Ghidra here we have the main:


Note that the struct member so high like 0x1b0 are not created by default, we should import a .h file with an struct or class definition, and locate the constructor just on that position.

The mysterious function was printing byte a byte until null byte, the algorithm the compiler implemented in asm is not as optimized as turbopascal's.

In Windbg we can see the string object in eax after being created but before being initialized:












Just before executing the print function, the RCX parameter is the string object and it still identical:


Let's see the constructor code.
The constructor address can be guessed on static walking the reverse-cross-references to main, but I located it in debugging it in dynamic analysis.


The constructor reads only a pointer stored on the string object on the position 0x98.

And we have that the pointer at 0x98 is compared with the address of the literal, so now we know that this pointer points to the string.
The sentence *string_x98 = literal confirms it, and there is not memory copy, it only points reusing the literal.



Freepascal

The starting labyrinth is bigger than Lazarus so I had to begin the maze from the end, searching the string "hello world" and then finding the string references:


There are two ways to follow the references in Ghidra, one is [ctrl] + [shift] + F  but there is other trick which is simply clicking the green references texts on the disassembly.

At the beginning I doubted and put the name possible_main, but it's clearly the pascal user code main function.




The char array initialization Is converted by freepascal compiler to an runtime initialization using mov instructions.

Reducing the coverage on dynamic we arrive to the writeln function:


EAX helds  a pointer to a struct, and the member 0x24 performs the printing. In this cases the function can be tracked easily in dynamic executing the sample.

And lands at 0x004059b0 where we see the WriteFile, the stdout descriptor, the text and the size supplied by parameter.


there is an interesting logic of what happens if WriteFile() couldn't write all the bytes, but this is other scope.
Lets see how this functions is called  and how text and size are supplied to figure out the string object.



EBX helds the string object and there are two pointers, a pointer to the string on 0x18 and the length in 0x18, lets verify it on windbg.


And here we have the string object, 0x0000001e is the length, and 0x001de8a68 is the pointer.


Thanks @capi_x for the pascal samples.

Read more


No comments: