PE Malware Static Analysis

Reading Level: Intermediate.

Any Computer Security enthusiasts out there? Yes, this subject can get extremely complicated; we both agree on that. But I was once told that if I can explain something in simple words, then I understand it well enough. This article is my attempt at that, and hopefully the outcome benefits you, the reader.

I will alternate between two points of view: the train of thought of a Malicious Actor, and that of a Computer Security Specialist trying to stop the Malicious Actor.


Some Definitions

Malware: short for Malicious Software, i.e. code written with the intent of performing malicious actions on a victim's machine. (Yes, we call laptops machines.)

Malware Analysis: the study of malware's behaviour, to understand what a given piece of malware is intended to do, which malware family it belongs to, and ultimately how to protect against it.

Static Malware Analysis: the study of the code (the executable) without actually running it, as opposed to Dynamic Malware Analysis, where you run the executable in a protected environment to observe its behaviour. It involves many tools and a fair amount of research, and you'd be surprised how much information can be pulled from an executable without running it.

Portable Executable (PE): the file format used by Windows executable files, including .exe, .dll, .sys, .drv files, etc.

Dynamic Link Library (DLL): a type of PE in which Windows exports most of its functions, called Application Programming Interfaces (APIs). You cannot run (execute) a DLL on its own; instead, other executables (".exe" files) call the APIs within the DLL.

Hash: a function that maps data of arbitrary size (such as a file) to a fixed-size value. The example below shows the MD5 hash (one type of hash) of an executable "malware.exe", which is "60e29751634c36ca26fd6acef4d9554e":

aamarin@ubuntu:~/Desktop/Malware$ md5sum malware.exe
60e29751634c36ca26fd6acef4d9554e  malware.exe         


Simple Code Example

Below is a simple program, written in Notepad in the C language (a high-level language), that prints the message "Hello Peeps" on the screen. Don't worry if you don't understand every piece of the code; that is fine.

This file is saved as "simple.c". To turn it into an executable (machine code), we need a compiler. In my example I am using an open-source compiler called "Tiny C Compiler" (tcc) to create a "simple.exe" file from the "simple.c" file:

C:\Users\aamarin\Desktop\tcc> tcc -o simple.exe simple.c        

And now this executable can be run to execute the code:

C:\Users\aamarin\Desktop\tcc> simple.exe
Hello Peeps        

Below is what this code looks like in Assembly (a low-level language), which is much closer to machine language:

Address       Mnemonic  Operands
L_00401030:   push      rbp
L_00401031:   mov       rbp, rsp
L_00401034:   sub       rsp, 0x50
L_0040103B:   mov       eax, 0x0
L_00401040:   mov       [rbp-0x4], eax
L_00401043:   mov       eax, 0x1
L_00401048:   mov       r10, rax
L_0040104B:   mov       rcx, r10
L_0040104E:   call      0x401190
...

The above is just the beginning of this very simple program's disassembly (in total it was actually 655 lines!). It was disassembled using the "CFF Explorer" tool.


PE Structure - 10,000 foot view

For more details: https://github.com/corkami/pics/blob/master/binary/pe101/pe101.pdf


Hashes don't lie

Now that we are done with the basics, let's get to the main subject. File hashes act as signatures that identify files (or, in our case, malware). There are publicly available scanning services that run uploaded files through anti-virus engines and carry out both static and dynamic analysis on them; one of the best known is VirusTotal. That's a good place to start: you can upload the file in question and get verdicts from dozens of anti-virus engines on whether it is benign or malicious. You can also search by hash instead of uploading the actual file (this only returns a result if the same file has previously been uploaded to VirusTotal).

Using the earlier example, I searched for the hash "60e29751634c36ca26fd6acef4d9554e" and 59 out of 68 engines flagged it as malicious (a good indication that it actually is). Try it yourself!

The hash shown in VirusTotal's result looks different only because it is a different type of hash (SHA-256 instead of MD5) of the same file.

As a malicious actor, you might think that if you make any minor modification to your code, the hash will change and your executable will no longer be identified as malware by its hash... and you'd be right!

Take my earlier code example: I added a line declaring an integer x (int x = 13;) that is not used anywhere, so it makes no difference to what the code does, and compiled the result as "simple2.exe".

Yet now if we look at the hashes, they are completely different:

aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple.exe
9a48013f3ad1f1bf3779e512c899d7e7  simple.exe

aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple2.exe 
de1e86bf8e86f9afe6054fb0ceb76e34  simple2.exe         

Thus, if my code were actual malware, I would have been able to evade detection based on its hash signature.
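MD5 is too long to reproduce here, so the minimal C sketch below swaps in a much simpler, non-cryptographic hash, 64-bit FNV-1a, purely to illustrate the two properties discussed above: the output always has the same fixed size however large the input is, and changing a single byte of the input gives a different value. It is an illustration only; FNV-1a is not a substitute for MD5 or SHA-256 when identifying malware, and the `fnv1a64` name is just a label for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: a simple non-cryptographic hash, used here only to
   illustrate hashing (real malware signatures use MD5, SHA-1, SHA-256). */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t hash = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];                     /* mix in one input byte...        */
        hash *= 0x100000001b3ULL;            /* ...then multiply by the FNV prime */
    }
    return hash;                             /* always 64 bits, whatever len is */
}
```

For example, hashing the text "int x = 12;" and "int x = 13;" gives two different 64-bit values even though only one byte differs. For FNV-1a this is guaranteed for equal-length inputs, because each XOR and each multiplication by the odd prime is reversible, so a difference can never cancel out.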


Fuzzy Hashes to the rescue!

Fuzzy hashes let us measure how similar two files are, as a score, unlike normal hashes, which only know black or white (identical or not). This way, if malicious code has been slightly modified, we can still relate it to the malicious code it was derived from.

A great tool for this is the command-line tool ssdeep (used here on Linux). In the example below, we look at three different malware samples from the same malware family, called "Greencat". As you can see, the MD5 hashes of all three files are different (meaning none of them is an exact copy of another):

aamarin@ubuntu:~/Desktop/Malware/Greencat$ md5sum sample*
e54ce5f0112c9fdfe86db17e85a5e2c5  sample1.exe
fab6b0b33d59f393e142000f128a9652  sample2.exe
f4ed3b7a8a58453052db4b5be3707342  sample3.exe        

Yet if we check the similarity between all three using ssdeep, we notice that "sample2.exe" and "sample3.exe" have a match score of 99 out of 100 (most likely one is a modified version of the other, made to avoid hash detection):

aamarin@ubuntu:~/Desktop/Malware/Greencat$ ssdeep -brp /home/aamarin/Desktop/Malware/Greencat/*
sample2.exe matches sample3.exe (99)
sample3.exe matches sample2.exe (99)        
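ssdeep itself uses context-triggered piecewise hashing, which is more involved than fits here. The C sketch below swaps in a simpler, different technique, Jaccard similarity over overlapping 4-byte "shingles", only to show the idea of a 0 to 100 similarity score instead of a match/no-match answer. The `similarity_score` and `shingles` names are hypothetical, and its scores are not comparable to ssdeep's.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHINGLE 4  /* bytes per overlapping window */

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Collect every 4-byte window of buf as a 32-bit value, sorted and de-duplicated.
   Returns the number of distinct shingles; the caller frees *out. */
static size_t shingles(const unsigned char *buf, size_t len, uint32_t **out)
{
    size_t n = len >= SHINGLE ? len - SHINGLE + 1 : 0, k = 0;
    uint32_t *s = malloc((n ? n : 1) * sizeof *s);
    if (!s) { *out = NULL; return 0; }
    for (size_t i = 0; i < n; i++)
        s[i] = (uint32_t)buf[i] << 24 | (uint32_t)buf[i + 1] << 16 |
               (uint32_t)buf[i + 2] << 8 | (uint32_t)buf[i + 3];
    qsort(s, n, sizeof *s, cmp_u32);
    for (size_t i = 0; i < n; i++)          /* drop duplicates in place */
        if (k == 0 || s[i] != s[k - 1])
            s[k++] = s[i];
    *out = s;
    return k;
}

/* Jaccard similarity (shared shingles / all distinct shingles), scaled to 0..100.
   Inputs shorter than 4 bytes have no shingles; two such inputs score 100. */
static int similarity_score(const unsigned char *a, size_t alen,
                            const unsigned char *b, size_t blen)
{
    uint32_t *sa, *sb;
    size_t na = shingles(a, alen, &sa), nb = shingles(b, blen, &sb);
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {              /* merge-walk the two sorted sets */
        if (sa[i] == sb[j]) { common++; i++; j++; }
        else if (sa[i] < sb[j]) i++;
        else j++;
    }
    free(sa);
    free(sb);
    size_t total = na + nb - common;
    return total ? (int)(100 * common / total) : 100;
}
```

Changing one byte in the middle of a long buffer only alters the few windows that overlap it, so the score stays high, whereas an exact hash of the same two buffers would simply say "different".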


Import Hash "Imphash" and Section Hashes

While we are on the subject of hashes, it's worth noting that static malware analysis can compare hashes of more than just the entire file. One of these is a hash of the imports listed in the PE file's import table, called the "Imphash" (import hash).

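This matters for static malware analysis because most actions a program takes (like writing to a file, changing a registry key, or opening a network connection) are carried out through API calls into DLLs, and those APIs are listed in the PE file's import table. The imports therefore reflect what the program does, which makes the Imphash a more characteristic signature of a PE file than a whole-file hash: a small code change, like the unused int x = 13; from earlier, changes the file hash but would typically leave the imports, and therefore the Imphash, unchanged.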
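The Imphash is commonly computed the way Mandiant popularized it and the Python pefile library implements it: each imported function is written as dll.function in lowercase, with the DLL's .dll/.ocx/.sys extension dropped, the entries are joined with commas in import order, and the MD5 of that string is the Imphash. The C sketch below builds only that normalized string (the MD5 step is left out). The `struct import` type and the `imphash_input` name are hypothetical, and imports by ordinal, which pefile resolves with a lookup table, are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct import {            /* hypothetical: one named import from the import table */
    const char *dll;       /* e.g. "KERNEL32.dll" */
    const char *function;  /* e.g. "CreateFileA"  */
};

/* Case-insensitive comparison against a lowercase extension such as "dll". */
static int ext_is(const char *ext, const char *want)
{
    while (*ext && *want)
        if (tolower((unsigned char)*ext++) != *want++)
            return 0;
    return *ext == '\0' && *want == '\0';
}

/* Append n bytes of src to out in lowercase, never writing past cap bytes. */
static void append_lower(char *out, size_t cap, size_t *pos, const char *src, size_t n)
{
    for (size_t i = 0; i < n && *pos + 1 < cap; i++)
        out[(*pos)++] = (char)tolower((unsigned char)src[i]);
    out[*pos] = '\0';
}

/* Build the string whose MD5 is the Imphash:
   "dll.function" entries, lowercased, DLL extension stripped, comma-separated. */
static void imphash_input(const struct import *imps, size_t count, char *out, size_t cap)
{
    size_t pos = 0;
    if (cap == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *dll = imps[i].dll;
        const char *dot = strrchr(dll, '.');
        size_t dlen = strlen(dll);
        if (dot && (ext_is(dot + 1, "dll") || ext_is(dot + 1, "ocx") || ext_is(dot + 1, "sys")))
            dlen = (size_t)(dot - dll);      /* "KERNEL32.dll" -> "KERNEL32" */
        if (i > 0)
            append_lower(out, cap, &pos, ",", 1);
        append_lower(out, cap, &pos, dll, dlen);
        append_lower(out, cap, &pos, ".", 1);
        append_lower(out, cap, &pos, imps[i].function, strlen(imps[i].function));
    }
}
```

Two builds of the same program that import the same functions in the same order produce the same string, and so the same Imphash, even when their file hashes differ.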

Going back to our Greencat example, where all three samples had different file hashes: analyzing them with a tool called PeStudio shows that "sample2.exe" and "sample3.exe" have the same Imphash.

We can also compare hashes of the individual sections listed in the PE file's section table. Again, "sample2.exe" and "sample3.exe" have the same MD5 hash for their ".text", ".rdata", and ".rsrc" sections; only their ".data" sections hash differently.
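To hash sections individually, a tool first has to find each section's bytes in the file. The minimal C sketch below walks the PE headers of a file already loaded into memory: it reads `e_lfanew` at offset 0x3C of the DOS header to find the "PE\0\0" signature, reads `NumberOfSections` and `SizeOfOptionalHeader` from the COFF file header that follows, and then reads each 40-byte section header to get the section's name and its file range (`PointerToRawData`, `SizeOfRawData`). The `struct section_info` type and the `walk_sections` name are hypothetical, and the bounds checks are only the minimum needed to stay inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct section_info {          /* hypothetical summary of one section header */
    char name[9];              /* 8-byte section name plus a terminating NUL */
    uint32_t raw_offset;       /* PointerToRawData: where the bytes start in the file */
    uint32_t raw_size;         /* SizeOfRawData: how many bytes to hash */
};

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fills up to max entries of out; returns the section count,
   or -1 if buf does not look like a well-formed PE file. */
static int walk_sections(const unsigned char *buf, size_t len,
                         struct section_info *out, size_t max)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                                    /* no DOS header / "MZ" magic */
    uint32_t pe = rd32(buf + 0x3C);                   /* e_lfanew */
    if (pe > len || len - pe < 24 || memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;                                    /* no "PE\0\0" signature */
    uint16_t nsec = rd16(buf + pe + 6);               /* FileHeader.NumberOfSections */
    uint16_t optsz = rd16(buf + pe + 20);             /* FileHeader.SizeOfOptionalHeader */
    size_t table = (size_t)pe + 24 + optsz;           /* section table follows the optional header */
    for (uint16_t i = 0; i < nsec; i++) {
        size_t off = table + (size_t)i * 40;          /* each section header is 40 bytes */
        if (off > len || len - off < 40)
            return -1;
        if (i < max) {
            memset(out[i].name, 0, sizeof out[i].name);
            memcpy(out[i].name, buf + off, 8);        /* names are not always NUL-terminated */
            out[i].raw_size = rd32(buf + off + 16);
            out[i].raw_offset = rd32(buf + off + 20);
        }
    }
    return nsec;
}
```

Hashing `raw_size` bytes starting at `raw_offset` for each entry, with MD5 or any other hash, is one way to produce per-section hashes of the kind PeStudio displays.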


Strings!

If you're familiar with coding, you know what strings are; if you're not, think of them as words, yes, like actual words in a language. Strings are one thing we can analyze in a PE file, looking for certain words that ring alarm bells: words that are "blacklisted".

We can use the same tool, PeStudio, to check the strings within a PE file; it also helps us identify words that may be a cause for concern.

In this example, PeStudio listed a long set of "blacklisted" words; the ones that are obviously a cause for concern relate to creating a network connection, editing the registry, and deleting or writing files.
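At its core, string extraction (as done by the classic strings utility, or PeStudio's strings view) means scanning a file for runs of printable characters. The minimal C sketch below does that for ASCII runs of at least 4 characters and counts how many contain a word from a small, purely hypothetical blacklist; PeStudio's real blacklist is far larger, and real tools also extract UTF-16 (wide) strings, which this sketch ignores. The `scan_strings` name is likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MIN_LEN 4   /* same default minimum length as the Unix strings tool */

/* Hypothetical blacklist: a few API names associated with suspicious behaviour. */
static const char *const blacklist[] = { "RegSetValue", "DeleteFile", "WriteFile", "connect" };

static int is_printable(unsigned char c) { return c >= 0x20 && c < 0x7F; }

/* Scans buf for printable ASCII runs of at least MIN_LEN bytes.
   Returns how many runs were found; *flagged receives how many contain a blacklisted word. */
static size_t scan_strings(const unsigned char *buf, size_t len, size_t *flagged)
{
    char run[256];
    size_t found = 0, n = 0;
    *flagged = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i < len && is_printable(buf[i])) {
            if (n < sizeof run - 1)
                run[n++] = (char)buf[i];     /* very long runs are truncated */
            continue;
        }
        if (n >= MIN_LEN) {                  /* a run just ended: record and check it */
            run[n] = '\0';
            found++;
            for (size_t b = 0; b < sizeof blacklist / sizeof *blacklist; b++)
                if (strstr(run, blacklist[b])) {
                    (*flagged)++;
                    break;
                }
        }
        n = 0;
    }
    return found;
}
```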


Packers and Cryptors

So what would a malicious actor do in this scenario? They would want to hide the strings (words) in their executable, and they can do just that with Packers or Cryptors. A Packer hides an executable's content by compressing it; a Cryptor hides it by encrypting it. In both cases the original code is only restored in memory when the program runs, so static analysis tools mostly see compressed or encrypted bytes.
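To see why this defeats string analysis, here is a deliberately toy "cryptor" in C: it XORs every byte with a repeating key, so a blacklisted word no longer appears in the bytes on disk, and applying the same XOR again restores the original, which is what a real cryptor's stub does in memory at run time. Real cryptors use proper ciphers, and packers such as UPX use compression; single-key XOR is only an illustration, and the `xor_buffer` and `contains` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* XOR every byte with a repeating key. Running it twice with the same key
   restores the input, because (b ^ k) ^ k == b. */
static void xor_buffer(unsigned char *buf, size_t len, const unsigned char *key, size_t keylen)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key[i % keylen];
}

/* Naive search for needle inside buf (buf need not be NUL-terminated). */
static int contains(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; n <= len && i <= len - n; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    return 0;
}
```

With a key such as { 0x5A, 0xC3, 0x17 }, "RegSetValueExA" is findable before the first `xor_buffer` call, gone after it, and back after the second call.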

An example of a Packer is the open-source tool UPX, and below I have created a packed file called "packed-malware-sample.exe" out of the previously analyzed sample:

C:\Users\aamarin\Desktop\Programs\upx-3.96-win64>upx -o packed-malware-sample.exe malware-sample.exe
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2020
UPX 3.96w       Markus Oberhumer, Laszlo Molnar & John Reiser   Jan 23rd 2020


        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
     44576 ->     20512   46.02%	win32/pe 	 packed-malware-sample.exe


Packed 1 file.        

Now, if we analyze "packed-malware-sample.exe" with PeStudio, we notice that the number of "blacklisted" strings went down from 48 to just 5!

That said, PeStudio caught the UPX strings and now lists them as "blacklisted" to help us.


As a Computer Security Specialist, the first thing you can do is identify the packing. PeStudio has made that easy for us, but another great tool for this is Exeinfo, which identified the packing method and provided details on how to unpack it (for UPX, the packer itself can unpack the file with upx -d).
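One simple heuristic behind such detection: by default, UPX names the sections of a packed file "UPX0", "UPX1" and so on. The C sketch below flags a file as likely UPX-packed if any section name starts with "UPX"; it takes the section names as plain strings (for example, as read from the section table), and the `looks_upx_packed` name is hypothetical. It is only a hint: a malicious actor can rename the sections, which is why dedicated detectors rely on additional signatures.

```c
#include <stddef.h>
#include <string.h>

/* Heuristic: UPX names the sections it creates "UPX0", "UPX1", ...
   Returns 1 if any section name begins with "UPX", else 0. */
static int looks_upx_packed(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strncmp(names[i], "UPX", 3) == 0)
            return 1;
    return 0;
}
```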


Identifying File Type

This might seem trivial at first. You might say you'll just look at the file's extension: if it's ".exe" it's an executable, and if it's ".xlsx" it's a Microsoft Excel file. Or you might say you can look at the file's icon to identify what it actually is.

From a Malicious Actor's point of view, both of these elements can be manipulated to trick a user into thinking a file is, for example, a Microsoft Excel file, when in fact it is an executable running malicious code in the background.

By default, Windows File Explorer hides the extensions of known file types (its "File name extensions" option is disabled). Enabling it shows a file's true extension.

A Malicious Actor can manipulate the extension so that the file looks like a Microsoft Excel file.

Now, with the "File name extensions" option disabled, the file appears to be a Microsoft Excel file.
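A common way to pull off this trick (assumed here, since the screenshots are not reproduced) is a double extension such as "Salaries.xlsx.exe": with extensions hidden, Explorer shows only "Salaries.xlsx". The C sketch below shows what a simple defensive check might look like: it takes the real extension (the text after the last dot) and flags names where an executable extension hides behind another one. The file name, the short extension list, and the `has_decoy_extension` name are illustrative assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative, deliberately short list of executable extensions. */
static const char *const exec_exts[] = { "exe", "scr", "com", "dll" };

static int equals_nocase(const char *a, const char *b)
{
    while (*a && *b)
        if (tolower((unsigned char)*a++) != tolower((unsigned char)*b++))
            return 0;
    return *a == *b;                        /* both strings must end together */
}

/* Returns 1 if the real (last) extension is executable and another
   extension precedes it, e.g. "Salaries.xlsx.exe"; otherwise 0. */
static int has_decoy_extension(const char *filename)
{
    const char *last = strrchr(filename, '.');
    if (!last || last == filename)
        return 0;                           /* no extension at all */
    int executable = 0;
    for (size_t i = 0; i < sizeof exec_exts / sizeof *exec_exts; i++)
        if (equals_nocase(last + 1, exec_exts[i]))
            executable = 1;
    if (!executable)
        return 0;
    for (const char *p = filename; p < last; p++)   /* another dot before the real extension? */
        if (*p == '.')
            return 1;
    return 0;
}
```

A legitimate name like "setup.v2.exe" would also be flagged, so a check like this is a hint for a human rather than a verdict.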

A Malicious Actor can go the extra mile and use a tool like Resource Hacker to replace the file's icon with an Excel icon, making the disguise even more believable.

A Malicious Actor can go even further and manipulate the resources section so that an actual Microsoft Excel sheet appears when the file is opened, while the malicious code runs in the background.

As the previous example shows, there are ways to disguise a file's extension, but, as mentioned earlier in the "PE Structure - 10,000 foot view" section, the final judge of whether a file is an executable is the Magic Number at the beginning of the DOS Header: the bytes {4D 5A}, which are "MZ" in ASCII (you can think of ASCII as the human-readable letter format). This can be checked with many of the previously mentioned tools.
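The same idea fits in a few lines of C: look at the first bytes of the file, not at its name or icon. The sketch below recognises "MZ" ({4D 5A}) as a DOS/PE executable and, for comparison, "PK\x03\x04" as a ZIP container, which is what a genuine .xlsx file is, since Office Open XML documents are ZIP archives. Strictly speaking, "MZ" only marks a DOS-format header; a full check would also confirm the "PE\0\0" signature the DOS header points to. The `file_kind` enum and the `identify` name are hypothetical, and real identifiers (such as the database behind the file command) know many more signatures.

```c
#include <stddef.h>
#include <string.h>

enum file_kind { KIND_UNKNOWN, KIND_DOS_PE, KIND_ZIP };

/* Identify a file from its leading "magic" bytes, ignoring its name and icon. */
static enum file_kind identify(const unsigned char *head, size_t len)
{
    if (len >= 2 && head[0] == 0x4D && head[1] == 0x5A)     /* "MZ" */
        return KIND_DOS_PE;
    if (len >= 4 && memcmp(head, "PK\x03\x04", 4) == 0)     /* ZIP local file header (.xlsx, .docx, ...) */
        return KIND_ZIP;
    return KIND_UNKNOWN;
}
```

In practice you would read the first few bytes of the file (for example with fopen and fread) and pass them to `identify`; an executable renamed to "report.xlsx" still starts with "MZ".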

Loading the previous file into PeStudio, we see the Magic Number in the first window.

Loading the same file into CFF Explorer shows its actual file type.


Final Words

As you can imagine, there are more details to go into, but to keep the article brief (yes, it could have been longer) and, I hope, understandable, I decided to stick to the main ideas and the simplest examples. My main reference for this article is the book Learning Malware Analysis by Monnappa K A.

I hope you found it as interesting as I do, and that you enjoy and understand my style of writing.

More articles by Ala'a Amarin

  • Firewalls

    Reading Level: Beginner. Although the internet is considered the biggest source of information these days, it is…

  • Network Foundation Protection

    Reading Level: Beginner. You have a network and you want to protect it, how do you go about doing that? Network…
