PE Malware Static Analysis

Ala'a Amarin

Published Sep 12, 2021

Reading Level: Intermediate.

Any computer security enthusiasts out there? Yes, this subject can get extremely complicated; we both agree on that. But I was once told that if I can explain something in simple words, then I understand it well enough. This is my attempt at that, and hopefully the outcome is to your benefit as the reader.

I will alternate between two points of view: first the train of thought of a Malicious Actor (red bands), and then that of a Computer Security Specialist trying to stop the Malicious Actor (green bands).

Some Definitions

Malware: well that's short for Malicious Software, as in code that is written with the intent of performing malicious actions on a Victim's machine. (Yes we call laptops machines)

Malware Analysis: This is the study of malware's behaviour, to understand what a certain malware is intended to do, what malware family it belongs to, and eventually how to protect against it.

Static Malware Analysis: As opposed to Dynamic Malware Analysis (where you run the code (executable) in a protected environment to understand its behaviour), Static Malware Analysis is the study of this code (executable) without actually running it. This includes using many tools and research. You'd be surprised how much info can be pulled from an executable without running it.

Portable Executable (PE): The file format for Windows executable files, including .exe, .dll, .sys, .drv and others.

Dynamic Link Library (DLL): A type of PE file in which Windows exposes most of its functions (called Application Programming Interfaces, or APIs). You cannot run (execute) a DLL on its own; instead, other executable ".exe" files call the APIs it exports.

Hash: a function that maps data of arbitrary size (can be a file) to a fixed-size value. In the below example I am showing the MD5 hash (a type of hash) of an executable "malware.exe", which is "60e29751634c36ca26fd6acef4d9554e"

aamarin@ubuntu:~/Desktop/Malware$ md5sum malware.exe
60e29751634c36ca26fd6acef4d9554e  malware.exe
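
If you prefer a script over the command line, the same idea can be sketched in Python with the standard hashlib module. This is a minimal sketch, not a replacement for md5sum; the function name file_hashes and the chunk size are my own choices for illustration.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-256 hex digests of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large sample does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    import sys

    # Usage: python hashes.py malware.exe
    for name in sys.argv[1:]:
        digests = file_hashes(name)
        print(digests["md5"], digests["sha256"], name)
```

Whatever the size of the input file, the MD5 digest is always 32 hex characters and the SHA-256 digest is always 64, which is the "fixed-size value" part of the definition.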

Simple Code Example

Below is a simple program written in C (a high-level language) using Notepad. All it does is print the message "Hello Peeps" on the screen. Do not worry if you don't understand every piece of the code, that is fine:

As you might have noticed, this file is saved as "simple.c". To turn it into an executable (machine code), we need a compiler. In my example I am using an open-source compiler called "Tiny C Compiler" to create a "simple.exe" file out of the "simple.c" file:

C:\Users\aamarin\Desktop\tcc> tcc -o simple.exe simple.c

And now this executable can be run to execute the code:

C:\Users\aamarin\Desktop\tcc> simple.exe
Hello Peeps

Below is what this code looks like in Assembly (a low-level language), which is closer to machine language:

Address         Instruction   Operands
L_00401030:     push          rbp
L_00401031:     mov           rbp, rsp
L_00401034:     sub           rsp, 0x50
L_0040103B:     mov           eax, 0x0
L_00401040:     mov           [rbp-0x4], eax
L_00401043:     mov           eax, 0x1
L_00401048:     mov           r10, rax
L_0040104B:     mov           rcx, r10
L_0040104E:     call          0x401190
...

The above is just the beginning of the disassembly of this very simple program (in total it was actually 655 lines!). It was disassembled using the "CFF Explorer" tool.

PE Structure - 10,000 foot view

For more details: https://github.com/corkami/pics/blob/master/binary/pe101/pe101.pdf

Hashes don't lie

Now that we are done with the basics, let's get to the main subject. File hashes act as signatures to identify files (or in our case, Malware). There are publicly available scanning services with anti-virus engines that carry out both static and dynamic malware analysis on uploaded files. One of the most well known is VirusTotal. That's a good place to start: you can upload the file in question and get a verdict from dozens of anti-virus engines on whether it is considered benign or malicious. You can also search using the hash instead of uploading the actual file (this way you only get a result if the file has previously been uploaded to VirusTotal).

From our previous example, I searched for the hash "60e29751634c36ca26fd6acef4d9554e" and the result was that 59 of 68 engines consider this a malicious file (a good indication that it actually is). Try it yourself!

If you see a different hash on the results page, that is only because VirusTotal uses a different type of hash to represent the file (SHA256 instead of MD5).

As a malicious actor, you can think that if you do any minor modification to your code, the hash would be different and you would get away from having your executable's hash identified as malware... and you'd be right!

Take my earlier code example and note that here I added a line to declare an integer x (int x = 13;), which I am not using anywhere (it makes no difference to the outcome of the code):

Yet now if we look at the hashes, they are completely different:

aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple.exe
9a48013f3ad1f1bf3779e512c899d7e7  simple.exe

aamarin@ubuntu:~/Desktop/Malware/My-Samples$ md5sum simple2.exe 
de1e86bf8e86f9afe6054fb0ceb76e34  simple2.exe

Thus if my code was actual Malware, I would've been able to avoid detection through my hash signature.
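
To see how sensitive a normal hash is, here is a small sketch in Python. The byte strings below are stand-ins I made up for illustration (they are not real executables), and the helper names are my own; the point is only that one extra byte gives an unrelated-looking MD5.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


def differing_hex_digits(first: str, second: str) -> int:
    """Count the positions where two equal-length hex digests differ."""
    return sum(1 for a, b in zip(first, second) if a != b)


# Stand-in bytes for illustration only, not a real PE file.
original = b"MZ" + b"\x00" * 62 + b"Hello Peeps"
# One extra byte, playing the role of the unused `int x = 13;`.
modified = original + b"\x0d"

if __name__ == "__main__":
    first, second = md5_hex(original), md5_hex(modified)
    print(first)
    print(second)
    print("hex digits that differ:", differing_hex_digits(first, second))
```

The two digests share no visible structure, so a blocklist that stores only the first hash will not match the second file.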

Fuzzy Hashes to the rescue!

Fuzzy hashes are functions that can identify how similar two files are with a percentage, unlike normal hashes which only know black or white. This way if a malicious code had minor modifications, we'd be able to relate it to other malicious code which it was modified from.

A great tool for this is the Linux tool ssdeep. In the example below, we are looking at three different samples from the same Malware family, called "Greencat". As you can see, the hashes of all three files are different (meaning none of them is an exact replica of another):

aamarin@ubuntu:~/Desktop/Malware/Greencat$ md5sum sample*
e54ce5f0112c9fdfe86db17e85a5e2c5  sample1.exe
fab6b0b33d59f393e142000f128a9652  sample2.exe
f4ed3b7a8a58453052db4b5be3707342  sample3.exe

Yet if we check the similarity between all three using ssdeep, we notice that "sample2.exe" and "sample3.exe" are 99% similar (most likely one is a modified version of the other, made to avoid hash detection):

aamarin@ubuntu:~/Desktop/Malware/Greencat$ ssdeep -brp /home/aamarin/Desktop/Malware/Greencat/*
sample2.exe matches sample3.exe (99)
sample3.exe matches sample2.exe (99)
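
To get a feel for what a similarity score means, here is a toy sketch in Python. To be clear, this is not how ssdeep works (ssdeep uses context triggered piecewise hashing); it is a much simpler stand-in that scores two byte strings by how many 4-byte sequences they share. The function names and the choice of 4 bytes are my own assumptions.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte sequences that occur in `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(first: bytes, second: bytes, n: int = 4) -> int:
    """Score two byte strings from 0 to 100 by their shared n-byte sequences.

    Toy measure (Jaccard similarity of n-gram sets), NOT ssdeep's algorithm.
    """
    grams_first, grams_second = ngrams(first, n), ngrams(second, n)
    if not grams_first and not grams_second:
        return 100  # two inputs too short to compare are treated as identical
    shared = len(grams_first & grams_second)
    total = len(grams_first | grams_second)
    return round(100 * shared / total)


if __name__ == "__main__":
    import sys

    # Usage: python similar.py sample2.exe sample3.exe
    with open(sys.argv[1], "rb") as a, open(sys.argv[2], "rb") as b:
        print(similarity(a.read(), b.read()))
```

Unlike a normal hash comparison, a small modification only removes a few of the shared sequences, so the score stays high instead of dropping to "no match".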

Strings!

If you're familiar with coding then you know what strings are; if you are not, then think words. Yes, like actual words in a language. That is one thing we can analyze in a PE file: we look for certain words that ring alarm bells, words that are "blacklisted".

We can use the tool PeStudio for this purpose and check the strings within a PE file. PeStudio helps us by identifying words that can be a cause for concern.

In the below example the screenshot shows a long list of "blacklisted" words, but I am highlighting the ones that are obviously a cause for concern (creating a connection, editing registries and deleting/writing files):
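
The idea behind string analysis can be sketched in a few lines of Python. This is a simplified illustration, not what PeStudio does internally: it pulls out runs of printable ASCII characters and compares them with a tiny watchlist that I made up for this example (the real "blacklist" in PeStudio is far larger).

```python
import re

# Hypothetical watchlist for illustration only.
WATCHLIST = ("InternetOpen", "RegSetValue", "DeleteFile", "WriteFile")


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII characters at least `min_length` long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def flag_strings(strings, watchlist=WATCHLIST) -> list:
    """Return the strings that contain any watchlist entry (case-insensitive)."""
    return [
        text for text in strings
        if any(entry.lower() in text.lower() for entry in watchlist)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python strings_check.py malware-sample.exe
    with open(sys.argv[1], "rb") as handle:
        for hit in flag_strings(extract_strings(handle.read())):
            print(hit)
```

Note that this sketch only finds ASCII strings; Windows executables often also store text as UTF-16, which would need a second pattern.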

Packers and Cryptors

So what would a malicious actor do in such a scenario? Well, they would want to hide the strings (words) in their executable, and they can do just that with Packers or Cryptors. A Packer is a program that hides an executable's content by compressing it. A Cryptor is a program that hides an executable's content by encrypting it.
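
Why does compression hide strings? Here is a small Python sketch using the standard zlib module as a stand-in for a packer's compression step. It is only an illustration under that assumption: a real Packer also adds a small unpacking routine to the file so that the original content is restored when it runs. The payload below is made-up data, not a real executable.

```python
import zlib


def pack(data: bytes) -> bytes:
    """Compress `data` (stand-in for the compression step of a packer)."""
    return zlib.compress(data, 9)


def unpack(data: bytes) -> bytes:
    """Restore the original bytes from the compressed form."""
    return zlib.decompress(data)


# Made-up payload for illustration: a few API names repeated many times.
payload = b"\x00".join(
    [b"InternetOpenUrlA", b"RegSetValueExA", b"DeleteFileA"]
) * 20
packed = pack(payload)

if __name__ == "__main__":
    print("original size:", len(payload), "packed size:", len(packed))
    print("string visible before packing:", b"InternetOpenUrlA" in payload)
    print("string visible after packing:", b"InternetOpenUrlA" in packed)
```

The words are still in there, but no longer as readable bytes, so a string search on the packed data does not find them until the data is unpacked again.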

An example of a Packer is the open-source tool UPX, and below I have created a packed file called "packed-malware-sample.exe" out of the previously analyzed sample:

C:\Users\aamarin\Desktop\Programs\upx-3.96-win64>upx -o packed-malware-sample.exe malware-sample.exe
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2020
UPX 3.96w       Markus Oberhumer, Laszlo Molnar & John Reiser   Jan 23rd 2020


        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
     44576 ->     20512   46.02%	win32/pe 	 packed-malware-sample.exe


Packed 1 file.

Now if we analyze the "packed-malware-sample.exe" file with PeStudio, we will notice the number of "blacklisted" entries went down from (48) to just (5)!

Though you can see that PeStudio caught the UPX strings and now listed them as "Blacklisted" to help us.

As a Computer Security Specialist, what you can do is identify the packing to begin with. PeStudio has made that easy for us, but another great tool for this is Exeinfo. As you can see below, it identified the packing method and provided details on how to unpack the file:
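
A very rough version of this check can be sketched in Python. UPX normally names its sections "UPX0" and "UPX1" and leaves a "UPX!" marker in the packed file, so looking for those bytes is a quick first hint. This is a simple heuristic of my own for illustration, not how PeStudio or Exeinfo work, and a Malicious Actor can rename or strip these markers.

```python
# Byte markers that an unmodified UPX-packed file usually contains.
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")


def find_upx_markers(data: bytes) -> list:
    """Return the UPX markers present in `data`, as text."""
    return [marker.decode("ascii") for marker in UPX_MARKERS if marker in data]


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic only: True if any known UPX marker appears in `data`."""
    return bool(find_upx_markers(data))


if __name__ == "__main__":
    import sys

    # Usage: python upx_check.py packed-malware-sample.exe
    with open(sys.argv[1], "rb") as handle:
        print(find_upx_markers(handle.read()))
```

An empty result does not prove the file is unpacked; it only means this particular packer's default markers were not found.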

Identifying File Type

That might seem trivial at first. You might say you'll look at the extension of the file: if it's ".exe" then it's an executable, and if it's ".xlsx" then it's a Microsoft Excel file. Or you might say that you can look at the icon of the file to identify what the file actually is.

From a Malicious Actor's point of view, both of these elements can be manipulated to trick a user into thinking a file is, for example, a Microsoft Excel file, when in fact it is an executable running malicious code in the background.

The Windows "File name extensions" feature is disabled by default; enabling it lets you see the true extension of a file:

A Malicious Actor can manipulate the extension to make it seem like a Microsoft Excel file:

Now with the "File name extensions" feature disabled, the file seems like a Microsoft Excel file:

A Malicious Actor can go the extra mile of using a tool like Resource Hacker to change the Icon of the file to make it more believable to be a Microsoft Excel file:

The end result would look something like this:

A Malicious Actor can even go further in manipulating the resources section so that an actual Microsoft Excel sheet would appear when the file is opened, while Malicious code runs in the background.

As seen in the previous example, there are ways to manipulate the extension of a file. But the final judge of whether a file is an executable, as we mentioned earlier in the "PE Structure - 10,000 foot view" section, is the Magic Number at the beginning of the DOS Header: the bytes {4D 5A}, which are equivalent to "MZ" in ASCII (you can think of ASCII as the human-readable letter format). This can be checked using many of the previously mentioned tools.

Opening the previous file in PeStudio, we see the Magic Number on the first window:

Opening the previous file in CFF Explorer, we see the actual file type:
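
The same check can be sketched in Python. The first function only looks at the "MZ" Magic Number described above. The second goes one step further, based on the PE format: the 4 bytes at offset 0x3C of the DOS Header give the position of the PE header, which should start with the bytes "PE" followed by two zero bytes. The function names are my own, and this is a minimal sketch rather than a full PE parser.

```python
import struct


def has_mz_magic(data: bytes) -> bool:
    """True if the data starts with the bytes 4D 5A ("MZ")."""
    return data[:2] == b"MZ"


def looks_like_pe(data: bytes) -> bool:
    """True if the data has the MZ magic and a PE signature where expected."""
    if len(data) < 0x40 or not has_mz_magic(data):
        return False
    # Offset 0x3C holds a 4-byte little-endian pointer to the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    import sys

    # Usage: python filetype.py suspicious.xlsx
    with open(sys.argv[1], "rb") as handle:
        content = handle.read()
    print("MZ magic:", has_mz_magic(content))
    print("PE signature:", looks_like_pe(content))
```

A genuine ".xlsx" file is a ZIP archive and starts with the bytes "PK", so a file that claims to be an Excel sheet but starts with "MZ" deserves a closer look no matter what its name or icon says.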

Final Words

As you can imagine, there are more details to go into, but to keep the article brief (yes, it could have been longer) and understandable (I hope), I decided to stick to the main ideas and the simplest examples. My main reference for this article is the book Learning Malware Analysis by Monnappa K A.

I hope you found it as interesting as I do, and that you enjoyed and understood my style of writing.
