Software Reverse Engineering(?!) - Basics
Reverse engineering is a very interesting topic in the software community. Put simply, it is the process of disassembling, breaking down or debugging a fully functioning component (software, a program, firmware, etc.) into a lower-level form and analysing how it is built and how it works.
Why break down or analyse software?
It depends on the need. A software developer could dig down to the lowest level of the code to analyse why a bug exists. A security analyst could use reverse engineering to find security flaws and vulnerabilities. Or someone may simply be curious to see how a system works from a different perspective.
Disclaimer:
This article is for educational purposes only and is not to be used for any malicious practices. I don't accept any liability for damage caused by unethical practices.
Myth:
It is often assumed that this field is intended only for malicious activities. In fact, reverse engineering is also used to stop unlawful software attacks such as the spread of malware. For example, the recent WannaCry malware outbreak (which spread using the EternalBlue exploit) was halted by reverse engineering experts who found the kill switch in the malware code, and decryptors were even released to recover files encrypted by the malware. So, like any other field of engineering, it depends on the individual whether it is put to productive or unproductive use.
Math: (Ugh!)
We've all heard that computers store and process data in binary, i.e. 0s and 1s. So, let's have a quick recap of a few important number systems.
- The decimal number system is the one in common use. It has base 10, i.e. it uses the 10 digits 0 to 9.
- The hexadecimal number system is a base 16 number system. Each digit has a value from 0 to 15, where the values 10 to 15 are denoted by the symbols A to F.
- The binary number system has base 2 and just two digits, 0 and 1. This system is used in computers because an electrical signal has two states, on and off, which makes binary the easiest way to store and process data.
Each digit in the binary system is a 'bit', 8 bits make a byte, 1024 bytes are commonly called a kilobyte (strictly speaking, a kibibyte), and so on.
The sample chart below shows the conversion between the number systems. Although a computer stores and processes data in binary, we can use these conversion methods to view the data in other number systems such as hexadecimal, decimal or octal as needed.
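As a rough illustration of these conversions, here is a minimal Python sketch. The function name to_bases and the example value 71 are my own choices for illustration, not something taken from the chart.

```python
def to_bases(value):
    """Return the same non-negative integer written in binary, octal and hex."""
    return {
        "decimal": str(value),
        "binary": format(value, "08b"),  # padded to 8 bits, i.e. one byte
        "octal": format(value, "o"),
        "hex": format(value, "02X"),     # two uppercase hex digits per byte
    }

# Arbitrary example value; 71 happens to be the ASCII code of the letter 'G'.
for name, text in to_bases(71).items():
    print(name, text)

# int(text, base) parses a string back from any of these bases.
print(int("01000111", 2), int("47", 16), int("107", 8))
```

Every line of output describes the same stored value; only the notation changes.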
ASCII (American Standard Code for Information Interchange) is a very common character encoding standard used to represent letters, numbers and special symbols, each of which has a predefined binary value. The image below shows the ASCII characters and their corresponding binary values.
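To see the same mapping without the table, the following Python sketch prints the decimal, hexadecimal and binary value of each character in a short string. The helper name ascii_table and the sample text are assumptions made for illustration.

```python
def ascii_table(text):
    """Return (character, decimal, hex, binary) rows for an ASCII string."""
    rows = []
    for byte in text.encode("ascii"):  # raises an error for non-ASCII input
        rows.append((chr(byte), byte, format(byte, "02X"), format(byte, "08b")))
    return rows

# Sample text chosen for illustration only.
for row in ascii_table("GRIM"):
    print(*row)
```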
Why are we looking at these number systems and conversions?
In this article we will use number systems in a few places to understand how a computer stores and processes data in its lowest form. Conversions also help us recognise the same value when it is represented differently across number systems and encodings.
Tools and commands:
In this article, we will use a few frequently used reverse engineering tools for analysis. The tools and commands may differ depending on the need. Below are the tools used here, each with a brief explanation.
- Hexdump: Displays data in hexadecimal view and also has options to display ASCII characters.
- Strings: Command used to display all readable strings in a binary file.
- Apktool: Tool to decode an Android APK file to nearly its original form, and also to rebuild it.
- Ghidra: A very powerful reverse engineering tool/framework that was released as open source software in 2019 by the NSA, a US intelligence agency. It's an intuitive, easy-to-use tool with an excellent GUI and a good analysis engine. This tool is a blessing for reverse engineers, because comparable commercial software costs thousands of dollars. The tool can be downloaded from https://ghidra-sre.org and the code is available at https://github.com/NationalSecurityAgency/ghidra
Let's get to work:
We will cover only breaking down and analysing binaries/software. Modifying the binaries/code is not covered here; I will try to cover that in a separate article later.
1. Simple text file:
The example below has a few characters in a text file.
Let's see how it is represented in hex values. Here we use hexdump to view the hexadecimal values. The data in the red block below is the offset within the file, and the data in the corresponding green block is the hexadecimal value of the bytes stored at that offset. The parameter "-C" gives the data in canonical hex (the green block) along with the ASCII data, which you can see in the third part of the output below. You can also paste the hex values from the green block into an online hex-to-string converter, which will show the same ASCII characters.
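The three-column layout described above can be approximated in a few lines of Python. This is only a sketch of the "-C" style of output, not a reimplementation of hexdump (for example, it does not collapse repeated lines), and the sample bytes are an assumption because the actual file contents are not reproduced here.

```python
def hexdump_canonical(data):
    """Format bytes roughly like `hexdump -C`: offset, hex bytes, ASCII."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Two groups of eight bytes, each byte shown as two hex digits.
        left = " ".join(format(b, "02x") for b in chunk[:8])
        right = " ".join(format(b, "02x") for b in chunk[8:])
        hex_part = (left + "  " + right).ljust(48)
        # Printable ASCII is shown as-is, everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  |{ascii_part}|")
    lines.append(f"{len(data):08x}")  # final line: total length as an offset
    return "\n".join(lines)

# Hypothetical file contents, for illustration only.
print(hexdump_canonical(b"Hello\n"))
```

To try it on a real file, read the file in binary mode and pass the bytes to the function.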
Let's look at some basic reversing examples on executables/software.
2. Simple C program with hardcoded password:
The C program below has a hardcoded password in the code, which is not good practice in software development. The program compares the user's input against this hardcoded password and, if it matches, displays the text "Password is correct". Let's see how easy it is to get credentials out of such code using Hexdump and Ghidra.
To make a C program executable, save the file with a '.c' extension and use the compiler 'gcc' to compile it into an executable binary. The command below compiles the C code 'test.c' into the binary 'executable.out'.
gcc test.c -o executable.out
Let's use the 'file' command on this binary to see its file type.
file executable.out
The file type is "ELF" (Executable and Linkable Format), which plays a role similar to that of ".EXE" files on Windows.
Let's use the "hexdump" and "strings" commands on the executable.
hexdump -C executable.out <<Gives output in canonical hex values and ASCII>>
strings executable.out <<Displays available strings in the binary>>
Binaries usually carry a file type signature in their first few bytes. Here you can see the ASCII text "ELF", which denotes the file format.
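The signature check can be sketched in Python as follows. The ELF signature is the byte 0x7F followed by the letters "ELF", and the next two bytes encode the word size and byte order. The function names and the sample header bytes are hypothetical and not taken from the binary analysed here.

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F followed by the ASCII letters E, L, F

def looks_like_elf(header):
    """Return True when the first four bytes match the ELF signature."""
    return header[:4] == ELF_MAGIC

def describe_elf_ident(header):
    """Decode the class and byte-order fields that follow the signature."""
    if not looks_like_elf(header) or len(header) < 6:
        return None
    word_size = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    byte_order = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown")
    return word_size, byte_order

# Hypothetical first 16 bytes of a 64-bit little-endian ELF file.
sample_header = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
print(looks_like_elf(sample_header), describe_elf_ident(sample_header))
```

For a real file, the header could be obtained with something like open("executable.out", "rb").read(16).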
In both outputs you can see the text "GRIM", followed by the strings "Password is correct" and "Wrong password". This gives an attacker a strong hint to try the string "GRIM" as the password.
So, let's try executing the binary with "GRIM" as the password input. You can see below that this password is correct.
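What the strings command did for us earlier can be approximated in Python by scanning for runs of printable ASCII characters. This is a simplified sketch, not the real tool, and the sample bytes are a made-up stand-in for the binary's data section rather than its actual layout.

```python
import re

def extract_strings(data, min_length=4):
    """Return runs of printable ASCII characters at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical bytes: readable text separated by non-printable bytes.
sample = b"\x00\x01GRIM\x00Password is correct\x00Wrong password\x00\x7f\x02ab\x00"
print(extract_strings(sample))
```

A hardcoded password survives compilation as plain bytes, which is why such a simple scan is enough to find it.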
Let's use Ghidra. As already mentioned, Ghidra is a complete framework with many features such as a disassembler, a decompiler, function trees, etc. Import the file into Ghidra and analyse the binary in the code browser.
After analysis, you can see the decompiled code, which is almost the same as the original source code. See the highlighted line, where a string compare (strcmp) is done against the hardcoded password "GRIM". You can also see the low-level assembly code of this executable (often described as the closest thing to machine code that is still readable by humans), which gives an idea of how the CPU will execute the code at the register/stack level.
3. APK file reverse engineering:
An Android package, Android app package, or simply APK, is a compressed file containing the executables for an Android device. Every APK usually contains three main components:
- AndroidManifest.xml --> Contains information about the package and details about its services, activities, permissions, etc.
- Dalvik executable or DEX file --> Contains all the Java classes compiled into one or a few binary files.
- Application resource file or ARSC file --> Contains compiled resources in binary format.
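Because an APK is a ZIP archive, its entries can be listed with Python's standard zipfile module. The sketch below builds a tiny stand-in archive in memory so it runs without a real APK. The entry contents are placeholders and the function names are hypothetical.

```python
import io
import zipfile

def list_apk_entries(apk_bytes):
    """Return the sorted entry names of an APK given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return sorted(archive.namelist())

def build_demo_apk():
    """Build a stand-in archive with the three usual entries (placeholder data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("AndroidManifest.xml", "classes.dex", "resources.arsc"):
            archive.writestr(name, b"placeholder")
    return buffer.getvalue()

print(list_apk_entries(build_demo_apk()))
```

Listing the entries this way only shows what is inside the package. The manifest and resources are still in binary form, so a tool such as apktool is needed to decode them.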
In my previous article I created a malicious APK for a reverse TCP connection. Let's look at its contents and try to recover the source of that file.
Link to my previous article --> https://www.garudax.id/pulse/simple-android-backdoor-through-reverse-tcp-using-msf-nanda-kumar/
The contents of the APK are shown in the first image below. Let's use "apktool" to decode the package into nearly its original form.
GRIM@kali:~/Desktop$ apktool d payload.apk
Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true
I: Using Apktool 2.4.1-dirty on payload.apk
I: Loading resource table...
I: Decoding AndroidManifest.xml with resources...
I: Loading resource table from file: /home/grim/.local/share/apktool/framework/1.apk
I: Regular manifest package...
I: Decoding file-resources...
I: Decoding values */* XMLs...
I: Baksmaling classes.dex...
I: Copying assets and libs...
I: Copying unknown files...
I: Copying original files...
GRIM@kali:~/Desktop$
Please note that this tool converts the class files from DEX to "smali" code, not to JAR or Java code. To convert to JAR, other tools such as "dex2jar" or "enjarify" can be used. You can see the decoded classes and resources of the APK below.
Let's use Ghidra to analyse the APK and try to decompile it to its original code. The first image below shows the code of the main function of a class, and the second image shows the function call graph, i.e. a visual display of the dependencies between functions. You can also browse all the functions through Ghidra.