Encryption vs Encoding with Mobile Considerations
The Evolution of Encryption and Encoding
Kevin Barr
Introduction:
The language of computation is numbers. From the lowest level of binary 1s and 0s to the highest level of application data storage and output, computing has had a significant impact in shaping the modern economy of the 21st century. Companies and organizations now rely on the power of data and its many uses. Perhaps more than ever, data and its relationship to information technology have created security challenges for organizations, which face threats from hackers and malicious actors operating in a dynamic cyberspace.
Long before the advent of computers, mathematicians were interested in transforming data by representing numbers in different ways. The idea of manipulating numbers into various forms using standardized procedures can be traced back as early as Babylonia around 1800 B.C., which historians regard as the impetus for the creation of algebra (Coolman, 2016). The use of such transformations to hide or secure messages eventually grew into the sub-discipline of cryptography, or “a method of protecting information and communications through the use of codes so that only those for whom the information is intended can read and process it” (Rouse, 2018).
The Roman Empire is famously credited with the creation of the Caesar Cipher, which involved “shifting the letters of an encrypted message by a certain number of places down the Latin alphabet. Knowing this system and the number of places to shift the letters, a recipient could successfully decode the otherwise illegible message” ("History of Cryptography | Binance Academy", 2019). Cryptography became critically important in the 20th century, with direct applications in warfare and espionage. During World War II, Allied forces devoted considerable effort to understanding the German Enigma cipher machine, which combined three rotors, rings, and a plugboard to generate billions of possible settings for relaying messages to Nazi forces (Gillogly, 1995). Alan Turing, a British mathematician, building on the work of Polish and French intelligence, was eventually able to replicate the behavior of the Enigma machine and decipher intercepted Nazi messages, marking a significant turning point for Allied forces (Callahan, 2013).
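The shifting procedure described in the quotation can be sketched in a few lines of Python. This is an illustrative sketch only: the function name caesar_shift is hypothetical, and it assumes the modern 26-letter alphabet rather than the historical Latin one.

```python
import string

# Assumption: the modern 26-letter uppercase alphabet stands in for the Latin alphabet.
ALPHABET = string.ascii_uppercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter by `shift` places, wrapping around the alphabet."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            idx = (ALPHABET.index(ch) + shift) % len(ALPHABET)
            result.append(ALPHABET[idx])
        else:
            # Spaces and punctuation are passed through unchanged.
            result.append(ch)
    return "".join(result)


ciphertext = caesar_shift("ATTACK AT DAWN", 3)
plaintext = caesar_shift(ciphertext, -3)  # the recipient shifts back by the same amount
print(ciphertext)
print(plaintext)
```

Anyone who knows the method needs at most 25 guesses to find the shift, which is why the Caesar Cipher offers no real protection today.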
Today, computer science has found many applications for encryption and encoding that build on the same concepts explored throughout history. These applications range from general computing and networking to data security, but they also provide hackers and bad actors with tools for committing cybercrime. The purpose of this paper is to define the differences between encryption and encoding. It then gives examples of encoding and encryption algorithms, which are analyzed from a historical perspective with comments on their practical applications and relevant concepts. The paper also discusses the use of encryption, encoding, and obfuscation in malware through studies of real-life events. Finally, a section connects these concepts to mobile forensics. Mobile forensics is considered a new field in information security, and by examining the use of encryption and encoding in mobile devices the reader will gain insight into future trends that will need to be addressed with proper consideration.
Operational Definitions – Encoding vs Encryption vs Obfuscation
Encoding, encryption, and obfuscation are similar in the sense that they all transform data from its original form into something that is obscured. Since organizations often confuse encoding with encryption, improper use of either may lead to vulnerabilities in their security posture (Cremen, 2018). Establishing the operational definitions used throughout this paper is, therefore, the first step in understanding these differences and their immediate application in computing and security.
The fundamental mechanism for obscuring any data is a cipher. A cipher, as the term is used in this paper, is a type of algorithm or mathematical operation that alters data in a standardized fashion. Ciphers differ in how they transform the data, in their associated mechanisms, and in their computational requirements. Encoding is the most straightforward of the three concepts and involves transforming data so that different computing systems can correctly interpret it. Encoding relies on publicly available algorithms that anyone can reverse. Encoded data therefore has no direct security value; encoding is used for interoperability and, in the case of media and archive formats, for compression. Encryption, by contrast, restricts who can reverse the algorithm by requiring a key to decrypt the data. Keys may be either public or private, but the central concept is that they limit who can decrypt the data. Encryption, therefore, has many applications in security: it provides confidentiality and, in combination with related cryptographic techniques, authentication, integrity, and non-repudiation (Lord, 2019). Finally, obfuscation is the process of obscuring data to make it harder to understand. Obfuscation is similar to encoding in that it can be reversed without a key, but its purpose is to protect data through obscurity. Engineers obfuscate code to thwart attempts at reverse engineering, but they are limited by how obscure the code can become before it is no longer operational (Miessler, 2019).
For all their beneficial applications, the three concepts also have uses in the context of malware. Encoding may be combined with a social engineering scheme to trick users into clicking malicious links, or used to disguise malicious signatures within a payload, while hackers take advantage of encryption algorithms to lock up data and hold the keys for ransom, i.e., ransomware. Additionally, advanced malware such as polymorphic malware relies on encryption and decryption to hide from detection systems. Obfuscating malware signatures is a further tactic hackers employ to evade detection systems.
Encoding Examples
As stated, encoding ciphers are available to the public, which makes them easily reversible. The purpose of encoding data is to transform it so that plaintext can be understood in different computing circumstances. The following section goes over some of the more common encoding ciphers in order to build a fundamental understanding of how they work and how they differ from one another.
ASCII
The American Standard Code for Information Interchange (ASCII) “is a type of character-encoding that is used for computers to store and retrieve characters (letters, numbers, symbols, spaces, indentations, etc) as bit-patterns for storage in memory and on hard drives” (Riecken, 2019). ASCII is perhaps the simplest and most widely used encoding. It was standardized in the 1960s and maps 128 characters to 7-bit patterns, which computers in the Western world used to translate binary information into plaintext. Because modern computers work with 8-bit bytes and need far more than 128 characters, ASCII was eventually extended by Unicode, a character set that also assigns numbers to characters from languages such as Chinese and Arabic. UTF-8, the most widely used Unicode encoding, stores each character in one to four bytes and keeps the first 128 characters identical to ASCII. Today, ASCII converters are available on the web to translate character codes back into plain text, or to serve as a starting point for encoding data with other cipher algorithms.
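The mapping between characters and bit patterns can be inspected directly. The short Python sketch below is illustrative; the sample string "Hi!" and the variable names are arbitrary choices.

```python
text = "Hi!"

# Each ASCII character corresponds to a number below 128 ...
codes = [ord(ch) for ch in text]

# ... which fits in a 7-bit pattern.
bits = [format(code, "07b") for code in codes]

# For ASCII-only text, the ASCII and UTF-8 encodings produce identical bytes.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# A character outside ASCII (here the euro sign) needs several bytes in UTF-8.
euro_bytes = "\u20ac".encode("utf-8")

print(codes, bits)
print(ascii_bytes == utf8_bytes, len(euro_bytes))
```

Because the first 128 code points are shared, text written in plain ASCII is already valid UTF-8.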
URL Encoding
URL Encoding, or Percent-Encoding, is a technique for transmitting URLs over the internet. Since some ASCII characters are not permitted within the syntax of a URL, a means of translating them into a safe form (through encoding) was created. For example, a URL cannot contain spaces or certain special characters, so URL encoding replaces each such character with a percent sign followed by its two-digit hexadecimal byte value (a space becomes %20), effectively converting arbitrary ASCII text into a valid URL (Singh, 2018). This is particularly helpful for browsers and web servers, which must interpret the address consistently.
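Python's standard library exposes percent-encoding through urllib.parse, which makes the substitution easy to observe. The file name below is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing characters that are not allowed raw in a URL.
raw = "mobile forensics & encoding?.pdf"

encoded = quote(raw)        # space -> %20, & -> %26, ? -> %3F
decoded = unquote(encoded)  # anyone can reverse it; no key is involved

print(encoded)
print(decoded == raw)
```

The round trip shows why URL encoding is an encoding and not a security control: the transformation is public and fully reversible.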
Base64
The final example of encoding is Base64, which was created to represent binary data in radix 64 using only printable ASCII characters. In this mechanism, to convert "binary data into Radix 64, data is parsed in 6-bit blocks (i.e. such that each block has a maximum value of 64), and the number represented by each 6-bit block is used to look up a Radix 64 character” (Egan, 2018). Because a 6-bit block can take 64 distinct values (0 through 63), every three bytes of input become four output characters. The main function of Base64 is to allow binary data to be transmitted through channels designed for text, such as email or message exchanges, so that the receiver can decode it and use the binary data in its original format.
Base64 has many modern applications in networking; however, Fiscus (2011) argues that its malicious applications and their implications can outweigh its beneficial uses. Many organizations underestimate the security risk of relying on Base64, even though Base64 strings are easy to recognize and decode. Hackers often take advantage of the everyday use of Base64 by businesses. For example, if passwords are passed to a web authentication server using Base64, a hacker who intercepts the traffic could search it with regular expressions for character runs consistent with Base64 and run a Base64 decoder to expose those passwords. Hackers can also use Base64 to obfuscate plaintext malware signatures in order to bypass security detection systems, whether antivirus or anti-spam. In short, hackers use Base64 as a means of hiding their true intentions.
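The 6-bit lookup can be reproduced by hand and compared with the standard library. The helper encode_three_bytes is a hypothetical name used only for this sketch, and it deliberately handles a single three-byte group without padding.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    value = int.from_bytes(chunk, "big")
    # Split the 24 bits into four 6-bit blocks, most significant first.
    indices = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(B64_ALPHABET[i] for i in indices)


manual = encode_three_bytes(b"Man")
library = base64.b64encode(b"Man").decode("ascii")
print(manual, library)
print(base64.b64decode(library))
```

For the input "Man" the four 6-bit blocks are 19, 22, 5, and 46, which index the characters T, W, F, and u.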
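The following sketch illustrates how little protection Base64 gives a credential. It assumes a hypothetical HTTP header captured during an authorized audit; the user name, password, regular expression, and function name are all invented for illustration.

```python
import base64
import binascii
import re

# Hypothetical captured header: the credential is Base64-encoded, not encrypted.
header = "Authorization: Basic " + base64.b64encode(b"alice:S3cret!").decode("ascii")

# Assumption: runs of 8 or more Base64 characters, with optional padding, are worth checking.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def find_base64_candidates(text: str) -> list:
    """Return the decoded bytes of every substring that is valid Base64."""
    found = []
    for match in B64_RUN.finditer(text):
        token = match.group(0)
        if len(token) % 4 != 0:
            continue  # well-formed padded Base64 has a length divisible by four
        try:
            found.append(base64.b64decode(token, validate=True))
        except binascii.Error:
            continue
    return found


print(find_base64_candidates(header))
```

A simple pattern match like this produces false positives on ordinary text, so the length check and strict decoding are used here to narrow the candidates.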
Encryption:
Compared to the reversible and public nature of encoding, encryption requires a key to transform the data back into its original form. The extra step of using a key adds a critical layer of security consistent with the principles of the CIA triad. There are many types of algorithms, protocols, and applications for encryption. This paper explains the operational concepts behind encryption and then covers three standard encryption algorithms (DES, AES, and RSA), giving for each a brief history of its use, a description of the algorithm, and its evolution in the information systems used to this day.
Operational Concepts for Encryption:
As stated, the key provides the mathematical input that allows the data to be transformed back into its original form, which also means a key is used to encrypt the data in the first place. A key can be made available either publicly or privately, and encryption can be implemented either symmetrically or asymmetrically. Understanding these concepts is important both for applying encryption directly and for understanding the context in which it is implemented.
An encryption key can be either public or private. A public key is made available to anyone, and its ownership is typically vouched for by a trusted authority known as a Certificate Authority, which issues a certificate binding the key to an identity. A private key, or secret key, is not available to the public and is held only by the party that will decrypt the data. Symmetric encryption schemes use a single secret key for both encryption and decryption, so “the entities communicating via symmetric encryption must exchange the key so that it can be used in the decryption process” (Smirnoff & Turner, 2019). Asymmetric encryption uses a pair of keys, one public and one private: anyone can encrypt data with the public key, but only the holder of the matching private key can decrypt it. Asymmetric schemes are most often used for creating digital signatures and in secure protocols such as SSH or SSL/TLS. In asymmetric encryption schemes, "Either of the keys can be used to encrypt a message; the opposite key from the one used to encrypt the message is used for decryption” (Rouse, 2019).
DES
The Data Encryption Standard (DES) is a block cipher developed in the 1970s by IBM, building on researcher Horst Feistel's Lucifer algorithm, and adopted as a federal standard by the National Bureau of Standards, the predecessor of the National Institute of Standards and Technology (NIST) (Simmons, 2017). DES was used by the US Government to secure electronic financial transactions and eventually became the world's standard for encryption. DES is historically important because it was the first encryption algorithm publicly disclosed by the US Government for public use (Rouse, 2014). This happened before computing became widespread; until then, encryption algorithms had been kept secret, and the policy shift was meant to make encryption available to members of the public who also had an interest in securing data. DES is a product block cipher that applies 16 rounds of substitution and transposition to 64-bit blocks of data. Its key is 64 bits long, but 8 of those bits are used for parity checking, which leaves an effective key length of 56 bits. DES was considered obsolete by 1999 because of its vulnerability to brute-force attacks, which were shown to be practical using parallel computing power distributed over the internet (Simmons, 2017).
AES
Interestingly, DES was never practically broken by cryptanalysis; rather, it fell to brute force through sheer computing power, meaning that its 56-bit effective key was too short. The Advanced Encryption Standard (AES) therefore uses keys of 128, 192, or 256 bits to encrypt and decrypt data. For AES with a 128-bit key, “checking each of the 2^128 possible key values (a “brute force” attack) is so computationally intensive that even the fastest supercomputer would require, on average, more than 100 trillion years to do it. In fact, AES has never been cracked, and based on current technological trends, is expected to remain secure for years to come” (Franklin, 2019).
AES had many properties that made it NIST's choice as the next gold standard for government-level encryption: it offered a balance of security and ease of implementation compared with the four other candidate algorithms. AES uses the Rijndael algorithm, created by Joan Daemen and Vincent Rijmen, who designed it around criteria including resistance against all known attacks, speed and code compactness, and simplicity ("RIJNDAEL", 2000). The choice of Rijndael over other candidates such as Twofish, which some considered to have a slightly larger theoretical security margin, was controversial at the time, but since NIST discloses these algorithms to the public, it wanted something that was easily understood. AES encryption works by breaking data into 128-bit blocks and expanding the key into a schedule of round keys. The first round key is combined with the block using XOR; each round then substitutes the bytes, shifts the rows, mixes the columns, and XORs in the next round key from the schedule (Franklin, 2019). This sequence is one round of many: AES performs 10, 12, or 14 rounds for 128-, 192-, or 256-bit keys respectively. Today AES is used to secure classified information and in many instances of software and hardware encryption.
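The gap between a 56-bit and a 128-bit key can be made concrete with a rough calculation. The guessing rate of one trillion keys per second is an assumption chosen purely for illustration and does not describe any particular machine.

```python
DES_KEY_BITS = 56    # effective DES key length
AES_KEY_BITS = 128   # smallest AES key length

# Assumption: a hypothetical attacker testing 10**12 keys per second.
GUESSES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_years_to_search(key_bits: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Years needed, on average, to find a key by trying half of the keyspace."""
    keyspace = 2**key_bits
    return (keyspace // 2) / rate / SECONDS_PER_YEAR


print(f"DES: {average_years_to_search(DES_KEY_BITS):.5f} years")
print(f"AES-128: {average_years_to_search(AES_KEY_BITS):.3e} years")
print(f"AES-128 keyspace is 2**{AES_KEY_BITS - DES_KEY_BITS} times larger than that of DES")
```

Under this assumed rate the DES keyspace is exhausted in hours, while the AES-128 keyspace would take on the order of 10^18 years, because every additional key bit doubles the work.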
RSA
The Rivest–Shamir–Adleman (RSA) algorithm is the first example here of an asymmetric encryption algorithm, meaning that RSA generates both a public and a private key. These keys are mathematically related: data encrypted with the public key can be decrypted only with the corresponding private key. One of the main problems with symmetric encryption is that if the shared secret key is stolen in a man-in-the-middle attack or through eavesdropping, a third party can use it to gain access to the information. This weakness was the impetus for creating asymmetric encryption for tasks such as VPN sign-ons or digital signatures. Because the private key never has to be shared, asymmetric encryption also adds non-repudiation, which removes the ability of a party to deny having signed or sent the data.
The RSA algorithm relies on number theory, specifically the difficulty of prime factorization, to create what is known as a trapdoor function. Multiplying two very large prime numbers is easy, but factoring their product back into the original primes takes an impractical amount of computing power. (The Diffie-Hellman key exchange, often discussed alongside RSA, is a separate public-key method that allows two parties to agree on a shared secret over a public channel.) RSA key generation begins with a random number generator that produces two large prime numbers. Their product forms the modulus shared by the public and private keys, and its length determines the bit size of the encryption. Because both keys are calculated from the same primes, they are mathematically linked: what one key encrypts, only the other can decrypt. The public key is then distributed openly to anyone who wishes to send information, while the private key stays with its owner and is never transmitted. Before encryption, a padding scheme is applied to add a layer of security to the data; both parties must agree on the padding scheme so that the recipient knows how the data was arranged. The sender then encrypts the padded message with the recipient's public key to create a ciphertext and sends it. Decryption involves applying the private key to the ciphertext and then removing the padding, which restores the data to its original form. Since RSA involves generating two keys, it is considerably more complicated to understand than the symmetric examples given above, yet “RSA encryption is the most widely used asymmetric encryption method in the world because of its ability to provide a high level of encryption with no known algorithm existing yet to be able to solve it” (Curran, 2018).
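The relationship between the two keys can be illustrated with deliberately tiny numbers. The sketch below is a textbook example only: the primes 61 and 53 and the exponent 17 are toy values, no padding is applied, and it requires Python 3.8 or later for the modular inverse. Real RSA keys use primes hundreds of digits long.

```python
from math import gcd

# Toy primes; real keys use primes that are hundreds of digits long.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)    # used to derive the private exponent

e = 17                     # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

message = 65                           # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)        # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)      # decrypt with the private key (d, n)

print(n, d, ciphertext, recovered)
```

Only the pair (e, n) is published. An attacker who could factor n back into p and q could recompute d, which is why the security of RSA rests on the difficulty of factoring.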
Examples in Malware
As discussed, encryption, encoding, and obfuscation are all means of transforming data for a variety of reasons. Encryption, for example, is very useful for securing or hiding data. Hackers want to perform the same tasks, but with different goals. There are many scenarios in which a hacker would want to encrypt data, obfuscate code, or encode URLs as a means of carrying out criminal activities. Hackers are smart and creative, and they will use any tool as a weapon in their arsenal. They often employ these tactics to hide or evade detection so that a malicious payload can execute while bypassing security measures. The following sections go over examples of how hackers encrypt, encode, and obfuscate data through the study of polymorphic malware, ransomware, and DDoS attacks.
Polymorphic Malware
Polymorphic malware is the first example of malware that uses encryption to hide from conventional network security equipment. The type of payload used by the hacker does not matter; instead, it is the polymorphic shell that disguises the payload through its polymorphic mechanisms (Rankin, 2018). Polymorphic malware consists of an encrypted payload and a mutation engine (Stevenson, 2018). Once a computer has been infected, the payload is decrypted, and the mutation engine then generates a new 'decryption routine' that morphs the associated files and the directory where they are saved. In this way, polymorphic malware evades conventional antivirus tools because the code inherently changes its own signature. Signature-based antivirus software works by scanning computer files for known patterns, and if the polymorphic code alters those files, the software is no longer able to detect the malware.
A famous example of polymorphic malware is the Emotet Trojan, a banking malware first identified in 2014. The Trojan spread through email campaigns that used social engineering tactics to get users to download the payload to their computers. Once a machine was infected, the spy software tried to steal banking information from the victim's computer through keyloggers and various other tactics. Emotet worked similarly to many other types of malware; however, its polymorphic shell made it extremely difficult to detect and mitigate. In fact, "Emotet knows if it’s running inside a virtual machine (VM) and will lay dormant if it detects a sandbox environment” (“Emotet Malware”, 2019). Like many Trojans, it sends stolen information back to the hackers via a command and control (C2) server. Messages sent over HTTP are often encoded to keep administrators from noticing that data is leaving the network.
Ransomware
Ransomware is the second example of malware that uses encryption. Its fundamental mechanism is to encrypt the victim's data and prevent access to those files until a ransom is paid. Essentially, the victim is paying for a decryption key. There are many types of ransomware, and it is quickly becoming one of the most frequently discovered kinds of malware because it has a relatively high success rate. Crypto-currencies also provide another layer of anonymity, as the payments made during a crypto-malware attack are harder to trace than traditional online transactions. The public first became aware of ransomware in the early 2000s, when weak RSA algorithms were used to hold victims' folders for ransom. Then, in “2013, CryptoLocker ransomware used military grade encryption and stored the key on a remote server to unlock encryption. It thus turned out to be virtually impossible for victims to get their data back without making the payment” (EC-Council University, 2019). As researchers continue to discover and publish new algorithms, those algorithms will continue to end up in the hands of ransomware authors.
BotNets/DDoS attacks
Encoding is another tactic that hackers can employ to hide from detection systems that look for certain signatures or code in plaintext. In Distributed Denial of Service (DDoS) attacks, hackers coordinate attacks on target systems by infecting devices with malware, turning them into zombies, and assembling an army of devices that listen for instructions from the C2 server and direct their resources at the target, disrupting an organization's services. Base64 plays a part in BotNet malware on mobile devices as a means of hiding this traffic. BotNets are made up of zombified mobile or IoT devices that have been infected by malware. When not directly participating in a DDoS attack, the devices in a BotNet require instructions and updates from a Command and Control (C2) server owned by the hackers. These instructions are delivered as encoded HTTP messages. Base64 is used to throw off IDS/IPS detection by obscuring these messages or, at the very least, making them harder to detect. Zombifying DDoS malware frequently targets mobile devices and other ‘smart devices’.
Considerations in Mobile Forensics
Mobile forensics is in an interesting position: not only is it a newer discipline, it also has to consider how to break encryption, bypass security, and parse phone files as a means of investigating possible crimes. In this sense, mobile forensics is almost working against the grain, as smart device manufacturers are all trying to protect the security of their end-users and do not readily disclose how to break into their phones. For example, Apple iOS is notorious for its closed OS and use of strong encryption. iOS backup files (there is an option to store them unencrypted) are encrypted with AES-256 in CBC mode using unique keys and null initialization vectors (Mahalik, Tamma, & Bommisetty, 2016). The class keys, or encryption keys, are protected using Password-Based Key Derivation Function 2 (PBKDF2) with 10,000 iterations. Forensic tools such as Elcomsoft Phone Breaker use graphics processors to accelerate brute-force attacks that recover backup passwords in plaintext so that encrypted backup files can be decrypted, although complex passwords still might not be possible to crack using this toolkit. Mobile forensics therefore requires a lot of specialized tool knowledge to break into phones and carry out investigations.
Mobile forensic analysts must also understand the importance of encoding on mobile devices. Because mobile devices are often limited by storage, and media encoding formats compress data, an investigator will see many videos and other files encoded to save space. It is reasonable to find many encoded files on a mobile device during an investigation. The forensic analyst needs to document how the data was discovered and use the right decoding tools in order to study the files of interest.
Finally, the mobile application (app) market and third-party apps are a big part of mobile investigations and mobile hacking, as apps collect and save a great deal of data from the end-user. If the analyst finds an app's source code, it is highly likely that the app was run through an obfuscation tool so that third parties cannot steal the code. Analysts studying mobile malware will also run into obfuscated source code, and applying the correct deobfuscation tool is most likely the first step toward understanding how the code works and reverse-engineering the malware.
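The cost that PBKDF2 imposes on each password guess can be sketched with Python's hashlib. Only the iteration count of 10,000 comes from the text above; the hash function (SHA-1), the salt, the key length, and the sample passwords are assumptions made for illustration and are not a description of Apple's implementation.

```python
import hashlib

ITERATIONS = 10_000  # iteration count stated in the text


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Stretch a password into a 32-byte key; every guess costs `iterations` hash rounds."""
    # Assumption: HMAC-SHA1 as the underlying function and a 256-bit output key.
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations, dklen=32)


salt = bytes(range(16))  # hypothetical fixed salt so the example is repeatable
key = derive_key("backup-password", salt)
wrong = derive_key("backup-passw0rd", salt)

print(key.hex())
print(key == wrong)
```

A brute-force tool has to repeat this derivation for every candidate password, which is why such attacks are offloaded to graphics processors and why long, complex passwords may remain out of reach.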
Conclusion:
In conclusion, encoding, obfuscation, and encryption are three different concepts, but organizations often confuse how they should be implemented. Encryption is the only method that adds security to data, as it is reversible only with a key. Encoding and obfuscation use similar methods to transform data into a different form. The differences lie in the purpose of the transformation: encoding represents or compacts data in a form that other systems can use and is designed to be easily reversed, whereas obfuscation purposely obscures data but, at the end of the day, can be undone with some effort. Overall, all three can be used for good and evil. Hackers have devised ingenious ways to hide from security detection systems and to craft malware that uses powerful encryption to extort victims by attacking the availability of their data. Mobile devices, especially smartphones, are similar to traditional desktop computers in function, and therefore the malware and security concerns that apply to traditional systems also apply to mobile devices. However, mobile forensic analysts face numerous challenges as they seek to find and understand important encrypted files or data hidden through obfuscation, and they must be aware of the tools available to succeed in a complex environment.
References:
Callahan, K. (2013). The Impact of the Allied Cryptographers on World War II: Cryptanalysis of the Japanese and German Cipher Machines. Retrieved from https://pdfs.semanticscholar.org/c7cf/0c41932d61457dd943dc4dffca2c8bb92e95.pdf
Coolman, R. (2016). What Is Algebra? Retrieved from https://www.livescience.com/50258-algebra.html
Cremen, L. (2018). What devs need to know about Encoding / Encryption / Hashing / Salting / Stretching. Retrieved from https://hackernoon.com/what-devs-need-to-know-about-encoding-encryption-hashing-salting-stretching-76a3da32e0fd
Curran, B. (2018). What is RSA Cryptography? Complete Guide to this Encryption Algorithm. Retrieved 25 August 2019, from https://blockonomi.com/rsa-cryptography/
Egan, D. (2018). Radix 64 Encoding: Implementation in C | Dev Notes. Retrieved 24 August 2019, from https://dev-notes.eu/2018/08/radix-64-encoding-with-example-implementation-in-c/
Fiscus, K. (2011). Base64 Can Get You Pwned. SANS Institute. InfoSec Reading Room. Retrieved from https://www.sans.org/reading-room/?utm_source=web&utm_medium=text-ad&utm_content=generic_rr_pdf_logo1&utm_campaign=Reading_Room&ref=36909
Franklin, R. (2019). AES vs. RSA Encryption: What Are the Differences? - Syncsort Blog. Retrieved 25 August 2019, from https://blog.syncsort.com/2019/03/data-security/aes-vs-rsa-encryption-differences/
Gillogly, J. (1995). Ciphertext-Only Cryptanalysis of Enigma. Cryptologia, XIX(4).
Lord, N. (2019). What Is Data Encryption? Definition, Best Practices & More. Retrieved from https://digitalguardian.com/blog/what-data-encryption
Mahalik, H., Tamma, R., & Bommisetty, S. (2016). Practical Mobile Forensics (2nd ed.) Birmingham, UK: Packt Publishing.
Miessler, D. (2019). Encoding vs. Encryption vs. Hashing vs. Obfuscation | Daniel Miessler. Retrieved 24 August 2019, from https://danielmiessler.com/study/encoding-encryption-hashing-obfuscation/
Rankin, B. (2018). Polymorphic Malware — Real Life Transformers. Retrieved 25 August 2019, from https://www.lastline.com/blog/polymorphic-malware-real-life-transformers/
Riecken, T. (2019). ASCII Encoding: Beginners, Newbies.... We've Got All Of The Info You Need Here. Retrieved 24 August 2019, from https://www.whoishostingthis.com/resources/ascii/
Rouse, M. (2019). What is Asymmetric Cryptography? - Definition from WhatIs.com. Retrieved 25 August 2019, from https://searchsecurity.techtarget.com/definition/asymmetric-cryptography
Rouse, M. (2014). What is Data Encryption Standard (DES)? - Definition from WhatIs.com. Retrieved 25 August 2019, from https://searchsecurity.techtarget.com/definition/Data-Encryption-Standard
Rouse, M. (2018). What is cryptography? - Definition from WhatIs.com. Retrieved 23 August 2019, from https://searchsecurity.techtarget.com/definition/cryptography
Simmons, G. (2017). Data Encryption Standard | cryptology. Retrieved 25 August 2019, from https://www.britannica.com/topic/Data-Encryption-Standard
Singh, R. (2018). What is URL Encoding and How does it work?. Retrieved 24 August 2019, from https://www.urlencoder.io/learn/
Smirnoff, P., & Turner, D. (2019). Symmetric Key Encryption - why, where and how it’s used in banking. Retrieved 25 August 2019, from https://www.cryptomathic.com/news-events/blog/symmetric-key-encryption-why-where-and-how-its-used-in-banking
Stevenson, J. (2018). Combating Polymorphic Malware. Retrieved from https://www.deep-secure.com/blog/45-combating-polymorphic-malware.php
EC-Council University. (2019). Most Common Malware Attacks – Ransomware (Part - 3) - EC-Council University. Retrieved 25 August 2019, from https://www.eccu.edu/most-common-malware-attacks-ransomware-part-3/
Emotet Malware – An Introduction to the Banking Trojan. (2019). Retrieved 25 August 2019, from https://www.malwarebytes.com/emotet/
History of Cryptography | Binance Academy. (2019). Retrieved from https://www.binance.vision/security/history-of-cryptography
RIJNDAEL. (2000). Retrieved 25 August 2019, from https://www.cs.mcgill.ca/~kaleigh/computers/crypto_rijndael.html