PE Malware Static Analysis
Reading Level: Intermediate.
Any Computer Security enthusiasts out there? Yes, this subject can get extremely complicated; we both agree on that. But I was once told that if I can explain something in simple words, then I understand it well enough. This article is my attempt at that, and hopefully the outcome is to your benefit (the reader).
I will follow the style of talking from a Malicious Actor train of thought (Red bands):
And then from a Computer Security Specialist to prevent the Malicious Actor (Green bands):
Some Definitions
Malware: well that's short for Malicious Software, as in code that is written with the intent of performing malicious actions on a Victim's machine. (Yes we call laptops machines)
Malware Analysis: This is the study of malware's behaviour, to understand what a certain malware is intended to do, what malware family it belongs to, and eventually how to protect against it.
Static Malware Analysis: As opposed to Dynamic Malware Analysis (where you run the code (executable) in a protected environment to understand its behaviour), Static Malware Analysis is the study of this code (executable) without actually running it. This includes using many tools and research. You'd be surprised how much info can be pulled from an executable without running it.
Portable Executable (PE): The file format for Windows executable files. This includes .exe, .dll, .sys, .drv, and others.
Dynamic Link Library (DLL): A type of PE file in which Windows exports most of its functions (called Application Programming Interfaces, or APIs). You cannot run (execute) a DLL on its own; instead, other executable ".exe" files call the APIs inside the DLL.
Hash: a function that maps data of arbitrary size (can be a file) to a fixed-size value. In the below example I am showing the MD5 hash (a type of hash) of an executable "malware.exe", which is "60e29751634c36ca26fd6acef4d9554e"
aamarin@ubuntu:~/Desktop/Malware$ md5sum malware.exe
60e29751634c36ca26fd6acef4d9554e malware.exe
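If you prefer scripting over the command line, the same idea can be sketched in Python with the standard hashlib module. This is a minimal sketch, not the only way to do it; the function name file_digest and the chunk size are my own choices for illustration.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling file_digest("malware.exe") should give the same value md5sum prints, and file_digest("malware.exe", "sha256") gives the SHA256 hash of the same file.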
Simple Code Example
Below is a simple program written in C (a high-level language) using Notepad. It simply prints the message "Hello Peeps" on the screen. Do not worry if you don't understand every piece of the code; that is fine:
As you might have noticed, this file is saved as "simple.c". To turn it into an executable (machine code), we need to use a compiler. In my example I am using an open-source compiler called "Tiny C Compiler", and as you can see I am using it to create a "simple.exe" file out of the "simple.c" file:
C:\Users\aamarin\Desktop\tcc> tcc -o simple.exe simple.c
And now this executable can be run to execute the code:
C:\Users\aamarin\Desktop\tcc> simple.exe
Hello Peeps
Below is what this code looks like in Assembly (a low-level language), which is closer to machine language:
Address Instruction
L_00401030: push rbp
L_00401031: mov rbp, rsp
L_00401034: sub rsp, 0x50
L_0040103B: mov eax, 0x0
L_00401040: mov [rbp-0x4], eax
L_00401043: mov eax, 0x1
L_00401048: mov r10, rax
L_0040104B: mov rcx, r10
L_0040104E: call 0x401190
...
The above is just the beginning of the disassembly of this very simple program (in total it was actually 655 lines!). It was disassembled using the "CFF Explorer" tool.
PE Structure - 10,000 foot view
For more details: https://github.com/corkami/pics/blob/master/binary/pe101/pe101.pdf
Hashes don't lie
Now that we are done with the basics, let's get to the main subject. File hashes act as signatures that identify files (or in our case, malware). There are publicly available scanning services with anti-virus engines that carry out both static and dynamic malware analysis on uploaded files. One of the most well known is VirusTotal. That's a good place to start: you can upload the file in question and see whether each of dozens of anti-virus engines considers it benign or malicious. You can also search using the hash instead of uploading the actual file (this way you only get a result if the file has previously been uploaded to VirusTotal).
From our previous example, I searched for the hash "60e29751634c36ca26fd6acef4d9554e" and the result was 59/68 engines think this is a malicious file (a good indication that it actually is), try it yourself!
The reason you see a different hash here is simply that VirusTotal displays a different type of hash for the same file (SHA256 instead of MD5).
As a malicious actor, you can think that if you do any minor modification to your code, the hash would be different and you would get away from having your executable's hash identified as malware... and you'd be right!
Take my earlier code example and note that here I added a line to declare an integer x (int x = 13;), which I am not using anywhere (it makes no difference to the outcome of the code):
Yet now if we look at the hashes, they are completely different:
aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple.exe
9a48013f3ad1f1bf3779e512c899d7e7 simple.exe
aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple2.exe
de1e86bf8e86f9afe6054fb0ceb76e34 simple2.exe
Thus, if my code were actual malware, I would have been able to avoid detection based on its hash signature.
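To see why an exact-match hash list is so easy to dodge, here is a small Python sketch. The two byte strings are hypothetical stand-ins for the two programs (a real check would hash the compiled files), and known_bad_hashes is an illustrative blocklist, not a real feed.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as hex."""
    return hashlib.md5(data).hexdigest()


# Stand-in byte strings (hypothetical); real use would hash the compiled files.
original = b'int main() { printf("Hello Peeps"); return 0; }'
modified = b'int main() { int x = 13; printf("Hello Peeps"); return 0; }'

# A blocklist that only knows the hash of the original sample.
known_bad_hashes = {md5_hex(original)}

print(md5_hex(original) in known_bad_hashes)  # True: the exact match is caught
print(md5_hex(modified) in known_bad_hashes)  # False: the variant slips through
```

The lookup is all-or-nothing: any change to the input, however small, produces an unrelated digest.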
Fuzzy Hashes to the rescue!
Fuzzy hashes are functions that let us measure how similar two files are, expressed as a score (ssdeep uses 0 to 100), unlike normal hashes, which only know black or white: identical or not. This way, if malicious code has had minor modifications, we can still relate it to the malicious code it was modified from.
A great tool for this is ssdeep, which I am running here on Linux. In the below example, we are looking at three malware samples from the same malware family, called "Greencat". As you can see, the hashes for all three files are different (meaning none of them is an exact replica of another):
aamarin@ubuntu:~/Desktop/Malware/Greencat$ md5sum sample*
e54ce5f0112c9fdfe86db17e85a5e2c5 sample1.exe
fab6b0b33d59f393e142000f128a9652 sample2.exe
f4ed3b7a8a58453052db4b5be3707342 sample3.exe
Yet if we check the similarity between all three using ssdeep, we notice that "sample2.exe" and "sample3.exe" score 99 out of 100 (most likely one is a modified version of the other, made to avoid hash detection):
aamarin@ubuntu:~/Desktop/Malware/Greencat$ ssdeep -brp /home/aamarin/Desktop/Malware/Greencat/*
sample2.exe matches sample3.exe (99)
sample3.exe matches sample2.exe (99)
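To get a feel for similarity scoring without installing anything, here is a rough Python sketch. To be clear, this is NOT the ssdeep algorithm (ssdeep uses context-triggered piecewise hashing); I am swapping in a much simpler technique, Jaccard similarity over byte n-grams, purely to illustrate the idea of a 0 to 100 score. The function names and the n-gram size are my own assumptions.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte windows found in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> int:
    """Score two byte strings from 0 (nothing shared) to 100 (same n-gram set)."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 100  # both inputs are too short to compare; treat as identical
    shared = grams_a & grams_b
    combined = grams_a | grams_b
    return round(100 * len(shared) / len(combined))
```

In practice you would read two files in binary mode and pass their contents to similarity(). Unlike ssdeep, this keeps every n-gram in memory, so it is only suitable for small experiments.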
Import Hash "Imphash" and Section Hashes
While we are on the subject of hashes, it's important to note that static malware analysis is not limited to hashing the entire file. We can also hash the import table of the PE file (the list of DLLs and the API functions the file imports from them); this hash is called the "Imphash".
This matters for Static Malware Analysis because most actions taken by a program (like writing to a file, changing a registry key, or opening a connection) are carried out through API calls into DLLs, and those APIs are listed in the import table. The Imphash therefore reflects what the executable is built to do, which makes it a more personal signature of a PE file than a normal hash.
Looking back at our Greencat example where all 3 malware samples had different hashes, here using a tool called PeStudio to analyze the samples, we see that "sample2.exe" and "sample3.exe" have the same Imphash.
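To make the Imphash less of a black box, here is a minimal Python sketch of how it is commonly computed, as far as I understand the convention: each import becomes a lowercase "library.function" string (with a .dll, .ocx or .sys extension dropped), the strings are joined with commas in import-table order, and the result is hashed with MD5. The function name imphash_from_imports is hypothetical, and the sketch assumes the import list has already been pulled out of the PE file by some other tool; imports by ordinal are not handled.

```python
import hashlib

# Extensions dropped from library names before hashing (assumed convention).
STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports):
    """Compute an import hash from an ordered list of (dll_name, function_name)."""
    parts = []
    for dll_name, function_name in imports:
        library = dll_name.lower()
        stem, dot, extension = library.rpartition(".")
        if dot and extension in STRIPPED_EXTENSIONS:
            library = stem
        parts.append(f"{library}.{function_name.lower()}")
    # The order of imports matters, so the list is not sorted.
    return hashlib.md5(",".join(parts).encode("utf-8")).hexdigest()
```

Two samples that import the same functions in the same order end up with the same value even when the rest of their bytes differ.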
We also have hashes for the different sections in the "Sections Table" that can be compared. Again, as seen below "sample2.exe" and "sample3.exe" have the same MD5 hash for the sections ".text", ".rdata", and ".rsrc". They only have different hashes for the ".data" section:
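Per-section hashes can also be computed with nothing but the Python standard library. The sketch below walks the PE headers by hand: it reads the offset of the PE header from position 0x3C of the DOS Header, skips the file header and optional header, and then reads each 40-byte entry in the Sections Table. It assumes a well-formed file and does no bounds checking, so treat it as an illustration rather than a robust parser; the function name section_md5s is my own.

```python
import hashlib
import struct


def section_md5s(data: bytes) -> dict:
    """Map each section name to the MD5 of that section's raw bytes on disk."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte file header follows the 4-byte PE signature.
    (section_count,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table_start = pe_offset + 24 + optional_size
    hashes = {}
    for index in range(section_count):
        entry = table_start + index * 40  # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        raw_bytes = data[raw_offset:raw_offset + raw_size]
        hashes[name] = hashlib.md5(raw_bytes).hexdigest()
    return hashes
```

Running it on two samples and comparing the dictionaries shows which sections match and which differ.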
Strings!
If you're familiar with coding then you know what strings are. If you are not, then think words; yes, actual words as in a language. Strings are one thing we can analyze in a PE file, looking for certain words that ring alarm bells, words that are "blacklisted".
We can use the same tool PeStudio for this purpose and check the strings within a PE file. PeStudio helps us with identifying words that can be a cause for concern.
In the below example the screenshot shows a long list of "blacklisted" words, but I am highlighting the ones that are obviously a cause for concern (creating a connection, editing registries and deleting/writing files):
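The idea behind string extraction is simple enough to sketch in a few lines of Python: scan the raw bytes for runs of printable ASCII characters, then compare them against a watchlist. The WATCHLIST below is a tiny hypothetical example of my own, not PeStudio's actual list, and the sketch only finds ASCII strings (Windows binaries also contain UTF-16 "wide" strings, which this would miss).

```python
import re

# Hypothetical watchlist of API name fragments; PeStudio's real list is far larger.
WATCHLIST = ("InternetOpen", "RegSetValue", "DeleteFile", "WriteFile")


def extract_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII characters at least min_length long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def flag_strings(strings, watchlist=WATCHLIST):
    """Return the strings that contain any watchlist entry (case-insensitive)."""
    return [
        text for text in strings
        if any(word.lower() in text.lower() for word in watchlist)
    ]
```

A typical use would be flag_strings(extract_strings(open("sample.exe", "rb").read())), where "sample.exe" stands for whatever file you are examining.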
Packers and Cryptors
So what would a malicious actor do in such a scenario? Well, they would want to hide the strings (words) in their executable, and they can do just that with Packers or Cryptors. A Packer is a program that hides an executable's content by compressing it. A Cryptor is a program that hides an executable's content by encrypting it.
An example of a Packer is the open-source tool UPX, and below I have created a packed file called "packed-malware-sample.exe" out of the previously analyzed sample:
C:\Users\aamarin\Desktop\Programs\upx-3.96-win64>upx -o packed-malware-sample.exe malware-sample.exe
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX 3.96w Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020
File size Ratio Format Name
-------------------- ------ ----------- -----------
44576 -> 20512 46.02% win32/pe packed-malware-sample.exe
Packed 1 file.
Now if we analyze the "packed-malware-sample.exe" file with PeStudio, we will notice the number of "blacklisted" entries went down from (48) to just (5)!
Note, though, that PeStudio caught the UPX strings and now lists them as "Blacklisted" to help us.
As a Computer Security Specialist, the first thing you can do is identify that the file is packed to begin with. PeStudio has made that easy for us, but another great tool for this is Exeinfo, which, as you can see below, identified the packing method and provided details on how to unpack it:
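Two common heuristics for spotting packing can be sketched in Python. The first looks for the marker strings that UPX normally leaves behind (the section names "UPX0" and "UPX1" and the "UPX!" tag). The second measures Shannon entropy: compressed or encrypted data looks close to random, so its entropy approaches 8 bits per byte. Both are hints rather than proof, since markers can be stripped or forged, and I am not claiming this is how PeStudio or Exeinfo work internally. The names below are my own.

```python
import math
from collections import Counter

# Byte markers that UPX normally leaves in a packed file.
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")


def upx_markers_present(data: bytes):
    """Return the UPX markers found anywhere in the data."""
    return [marker.decode("ascii") for marker in UPX_MARKERS if marker in data]


def shannon_entropy(data: bytes) -> float:
    """Return the entropy in bits per byte: 0.0 is constant, 8.0 is uniform."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A file (or a single section) with entropy well above 7 is worth a closer look, though the exact threshold is a judgment call.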
Identifying File Type
That might seem trivial at first. You might say you'll look at the extension of the file: for example, if it's ".exe" then it's an executable, and if it's ".xlsx" then it's a Microsoft Excel workbook. Or you might say that you can look at the icon of the file to identify what the file actually is.
From a Malicious actor's point of view, both of these elements can be manipulated to trick a user into thinking a file is for example a Microsoft Excel file, where in fact it is an executable running malicious code in the back.
The Windows "File name extensions" feature is disabled by default. Enabling it lets you see the true extension of a file:
A Malicious Actor can manipulate the extension to make it seem like a Microsoft Excel file:
Now with the "File name extensions" feature disabled, the file seems like a Microsoft Excel file:
A Malicious Actor can go the extra mile of using a tool like Resource Hacker to change the Icon of the file to make it more believable to be a Microsoft Excel file:
The end result would look something like this:
A Malicious Actor can even go further in manipulating the resources section so that an actual Microsoft Excel sheet would appear when the file is opened, while Malicious code runs in the background.
As seen in the previous example, there are ways to manipulate the extension of a file. The final judge of whether the file is an executable, as we mentioned earlier in the "PE Structure - 10,000 foot view" section, is the Magic Number at the beginning of the DOS Header: the bytes {4D 5A}, which are equivalent to "MZ" in ASCII (you can think of ASCII as the human-readable letter format). This can be checked using many of the previously mentioned tools.
Opening the previous file in PeStudio, we see the Magic Number on the first window:
Opening the previous file in CFF Explorer, we see the actual file type:
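The Magic Number check described above is easy to sketch in Python. On top of the "MZ" bytes, this sketch also follows the offset stored at position 0x3C of the DOS Header and checks for the "PE" signature there, which goes one step beyond the check in this article. The helper looks_disguised and its extension list are my own hypothetical additions, meant to show how a file named like a spreadsheet can be caught.

```python
import os
import struct

# Extensions normally expected on PE files (illustrative, not exhaustive).
PE_EXTENSIONS = {".exe", ".dll", ".sys", ".drv"}


def is_pe(data: bytes) -> bool:
    """Return True if the bytes start with MZ and point to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


def looks_disguised(filename: str, data: bytes) -> bool:
    """Return True if the content is a PE file but the extension says otherwise."""
    extension = os.path.splitext(filename)[1].lower()
    return is_pe(data) and extension not in PE_EXTENSIONS
```

For example, looks_disguised("Budget.xlsx", data) returns True when data holds the bytes of an executable, no matter what the icon or the file name suggests.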
Final Words
As you can imagine, there are more details to go into, but to keep the article brief (yes, it could have been longer) and understandable (I hope), I decided to stick to the main ideas and the simplest examples. My main reference for this article is the book Learning Malware Analysis by Monnappa K A.
I hope you found it interesting as I do, and hope that you enjoy and understand my style of writing.