Source code is often distributed in an insecure manner. Java programs, delivered as byte code, and .NET programs, delivered in MSIL (Microsoft Intermediate Language), retain practically all the information of the original source code. This makes them much easier to reverse engineer than traditional applications, which are distributed as native code. Malicious users can use reverse engineering to tamper with the software and bypass licensing restrictions, and competitors can use it to extract proprietary algorithms and data structures. It is therefore essential to protect an application against reverse engineering. In this article we will look at code obfuscation.
What is code obfuscation
The word obfuscation literally means ‘making something less clear and harder to understand’.
In computing, obfuscation transforms code into a form that is functionally identical to the original but much more difficult for a person, or an automated tool, to understand and reverse engineer. We do not assume that obfuscation makes the code impossible to reverse engineer. The aim is to raise the cost of reverse engineering until it is no longer worth the effort. Ideally, deobfuscating a program should take far longer than obfuscating it.
Code obfuscation methods
Obfuscation methods are classified depending on the information they target. Some simple transformations target the lexical structure of the program while others target the data structures or the control flow. Obfuscation methods are further classified based on the kind of operation they perform on the targeted information. Some methods manipulate the aggregation of control or data, while others affect the ordering.
The different obfuscation methods are:
- Layout obfuscation: Targets the layout of the application, such as source code formatting, variable names and comments.
- Data obfuscation: Targets the data structures used by the program.
  - Storage obfuscation: Alters how data is stored in memory. An example is converting local variables into global ones.
  - Encoding obfuscation: Alters how stored data is interpreted. An example is replacing a variable i by a derived value c1*i + c2.
  - Aggregation obfuscation: Alters how data is grouped together. An example is splitting an array into several sub-arrays.
  - Ordering obfuscation: Alters how data is ordered. An example is reordering the elements of an array so that the ith element is stored at a new position determined by a function f(i).
- Control obfuscation: Targets the control flow of the program.
  - Aggregation obfuscation: Alters how statements are grouped together. An example is inlining, which replaces a function call with the body of the function.
  - Ordering obfuscation: Alters the order in which statements are executed. An example is reversing a loop whose iterations are independent, so that it iterates backwards.
  - Computation obfuscation: Alters the control flow of a program, for example by inserting object-level code that has no source code equivalent, or by inserting redundant code or code that will never be executed (dead code).
- Preventive transformation: The main goal of this method is not to obscure the code but to make it harder for deobfuscators to break.
  - Targeted: Exploits known weaknesses in specific deobfuscators.
  - Inherent: Makes known automatic deobfuscation techniques more difficult to apply.
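As a rough illustration of the data obfuscation methods listed above, the following Java sketch applies encoding, aggregation and ordering obfuscation by hand. The constants C1, C2 and STEP, the even/odd array split and the permutation f(i) = (i * STEP) mod n are our own illustrative choices, not the output of any particular obfuscator; real tools apply such transformations to byte code automatically.

```java
import java.util.Arrays;

// Hand-applied data obfuscation; constants and helper names are illustrative.
class DataObfuscationDemo {
    static final int C1 = 8;   // hypothetical encoding multiplier
    static final int C2 = 3;   // hypothetical encoding offset
    static final int STEP = 3; // hypothetical permutation step (coprime to array length)

    // Original: sum of 0..n-1 with a plain loop counter i.
    static int sumOriginal(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Encoding obfuscation: the counter is stored as j = C1*i + C2 and
    // i is recovered as (j - C2) / C1 wherever it is used.
    static int sumEncoded(int n) {
        int sum = 0;
        for (int j = C2; j < C1 * n + C2; j += C1) {
            sum += (j - C2) / C1;
        }
        return sum;
    }

    // Aggregation obfuscation: split one array into two sub-arrays
    // holding the even-indexed and odd-indexed elements.
    static int[][] split(int[] a) {
        int[] even = new int[(a.length + 1) / 2];
        int[] odd = new int[a.length / 2];
        for (int k = 0; k < a.length; k++) {
            if (k % 2 == 0) {
                even[k / 2] = a[k];
            } else {
                odd[k / 2] = a[k];
            }
        }
        return new int[][] {even, odd};
    }

    // Reads element k of the original array from the split form.
    static int getSplit(int[][] parts, int k) {
        return (k % 2 == 0) ? parts[0][k / 2] : parts[1][k / 2];
    }

    // Ordering obfuscation: element i is stored at f(i) = (i * STEP) mod n,
    // which is a permutation when STEP and n share no common factor.
    static int f(int i, int n) {
        return (i * STEP) % n;
    }

    static int[] reorder(int[] a) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[f(i, a.length)] = a[i];
        }
        return out;
    }

    // Reads element i of the original array from the reordered form.
    static int getReordered(int[] scrambled, int i) {
        return scrambled[f(i, scrambled.length)];
    }

    public static void main(String[] args) {
        System.out.println(sumOriginal(10) + " " + sumEncoded(10)); // prints "45 45"
        int[] data = {10, 20, 30, 40, 50};
        int[][] parts = split(data);
        int[] scrambled = reorder(data);
        for (int k = 0; k < data.length; k++) {
            if (getSplit(parts, k) != data[k] || getReordered(scrambled, k) != data[k]) {
                throw new AssertionError("mismatch at index " + k);
            }
        }
        System.out.println(Arrays.toString(scrambled)); // prints "[10, 30, 50, 20, 40]"
    }
}
```

Each obfuscated form needs a matching accessor (getSplit, getReordered) so that the rest of the program still sees the original values; in a real obfuscator these accesses would be rewritten inline throughout the code.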
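The control obfuscation methods can be sketched in the same way. In the hypothetical Java example below, a simple method is rewritten by hand using inlining (aggregation), loop reversal (ordering) and an inserted dead branch (computation). Reversing the loop is safe here only because integer addition gives the same total in any order; the class and method names are illustrative.

```java
// Hand-applied control obfuscation; names are illustrative.
class ControlObfuscationDemo {
    // Original helper and caller.
    static int square(int x) {
        return x * x;
    }

    static int sumOfSquares(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += square(a[i]);
        }
        return total;
    }

    // The same computation after three control transformations:
    //  - aggregation: the call to square() is inlined,
    //  - ordering: the loop now runs backwards,
    //  - computation: a branch that can never execute (dead code) is inserted.
    static int sumOfSquaresObfuscated(int[] a) {
        int total = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            int x = a[i];
            total += x * x;      // inlined body of square()
            if (i < 0) {         // never true: the loop guard ensures i >= 0
                total -= 1;      // dead code
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sumOfSquares(data) + " " + sumOfSquaresObfuscated(data)); // prints "30 30"
    }
}
```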
Parameters for evaluating the quality of an obfuscation method
To study obfuscation methods in detail, we need to be able to evaluate the quality of a transformation. The quality of an obfuscation method is determined by the combination of its potency, resilience, stealth and cost.
- Potency: Potency measures how much more obscure the transformed code is than the original. Software complexity metrics define various measures of complexity, such as the number of predicates a program contains, the depth of its inheritance tree and its nesting levels. While the goal of good software design is to minimize complexity by these measures, the goal of obfuscation is to maximize it.
- Resilience: Resilience measures how well the transformed code resists automated deobfuscation attacks. It combines the programmer effort needed to create a deobfuscator with the time and space the deobfuscator requires. The highest degree of resilience is a one-way transformation that cannot be undone by a deobfuscator. An example is an obfuscation that removes information, such as source code formatting.
The difference between potency and resilience is that a transformation is potent if it can confuse a human reader, whereas it is resilient if a deobfuscator tool cannot undo the transformation.
For example, consider the trivial transformation that inserts the following statement into the program:
if (1==2) S1;
This transformation is potent since it increases the complexity of the code, but it has no resilience: a deobfuscator can evaluate the constant condition 1==2 as false and simply delete the unreachable statement S1.
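The condition in the `if (1==2) S1;` example is a constant, so constant folding alone removes it. A common way to raise resilience is an opaque predicate: a condition whose value the obfuscator knows but which is harder to determine statically. The Java sketch below assumes a hypothetical integer input x and guards S1 with the test "x*x + x is odd". That test is always false, because x*(x+1) is a product of two consecutive integers and Java's wraparound int arithmetic (modulo 2^32) preserves parity. A deobfuscator capable of algebraic reasoning could still prove this, so the example illustrates a gain in resilience, not immunity.

```java
// Trivial dead-code insertion versus an opaque predicate; x is a hypothetical input.
class OpaquePredicateDemo {
    // Trivial version: any tool that performs constant folding
    // reduces 1 == 2 to false and deletes the branch.
    static int trivial(int v) {
        if (1 == 2) {
            v = -v;             // S1, statically dead
        }
        return v + 1;
    }

    // Opaque version: x * x + x == x * (x + 1) is always even, and wraparound
    // arithmetic keeps the low bit intact, so the branch is never taken.
    static int opaque(int v, int x) {
        if (((x * x + x) & 1) != 0) {
            v = -v;             // S1, never executed
        }
        return v + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, -8, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (opaque(41, x) != trivial(41)) {
                throw new AssertionError("versions differ for x = " + x);
            }
        }
        System.out.println(trivial(41)); // prints "42"
    }
}
```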
- Stealth: Stealth defines how well the obfuscated code blends with the rest of the program. If the transformation introduces code that stands out from the rest of the program, it may be difficult for a deobfuscator to spot, but it can easily be spotted by a reverse engineer. Stealth is context-sensitive; what is stealthy in one program may not be in another.
- Cost: Cost is the execution time and space overhead of the obfuscated code compared to the original code. A transformation that adds no overhead is free. Cost is also context-sensitive. For example, a statement i=10 inserted at the outermost nesting level costs much less than the same statement inserted inside an inner loop, where it executes on every iteration.
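The context sensitivity of cost can be checked by counting executions. In this Java sketch, a hypothetical loop nest with n outer and m inner iterations receives the inserted statement i = 10 either once before the loops or inside the inner loop; the method name and the boolean flag are illustrative.

```java
// Counts how often an inserted statement runs, depending on where it is placed.
class InsertionCostDemo {
    static long executions(int n, int m, boolean insideInnerLoop) {
        long count = 0;
        int i = 0;
        if (!insideInnerLoop) {
            i = 10;             // inserted at the outermost nesting level
            count++;
        }
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < m; b++) {
                if (insideInnerLoop) {
                    i = 10;     // inserted inside the inner loop
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(executions(1000, 1000, false)); // prints "1"
        System.out.println(executions(1000, 1000, true));  // prints "1000000"
    }
}
```

The same one-line insertion runs once in the first placement and n*m times in the second, which is why an obfuscator must weigh where it inserts code, not just what it inserts.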
Having looked at the parameters for evaluating a transformation, let us now define and explore one method in detail.
Layout obfuscation refers to altering the formatting of the source file. This involves removing source code comments, removing debug information and changing the names of elements such as classes, member variables and local variables.
Source code comment removal and formatting removal are free transformations, since there is no increase in space and time from the original application. The potency is low because there is very little semantic content in formatting. It is a one-way transformation because the formatting, once removed, cannot be recovered. Scrambling of variable names is also a one-way and free transformation, but it has much higher potency than formatting removal. Crema, one of the oldest Java obfuscators, uses layout obfuscation.
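To make layout obfuscation concrete, the sketch below shows a hypothetical Java license check before and after comments and formatting are removed and identifiers are scrambled. It is written at the source level for readability; Java obfuscators typically apply renaming to compiled class files instead. The validation rule (the digits of the key sum to a multiple of 7) is invented purely for illustration.

```java
// Driver comparing the original and the layout-obfuscated class.
class LayoutDemo {
    public static void main(String[] args) {
        for (String key : new String[] {"ABC-1234", "7000", "25-16"}) {
            // Both columns print the same value for every key.
            System.out.println(key + ": " + LicenseChecker.isValidKey(key) + " " + a.a(key));
        }
    }
}

// Before layout obfuscation: descriptive names, comments and formatting.
class LicenseChecker {
    /** Returns true when the ASCII digits in the key sum to a multiple of 7 (hypothetical rule). */
    static boolean isValidKey(String licenseKey) {
        int digitSum = 0;
        for (char c : licenseKey.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digitSum += c - '0';
            }
        }
        return digitSum % 7 == 0;
    }
}

// After layout obfuscation: comments and formatting removed, identifiers scrambled.
class a{static boolean a(String a){int b=0;for(char c:a.toCharArray()){if(c>='0'&&c<='9'){b+=c-'0';}}return b%7==0;}}
```

The obfuscated class compiles to the same logic, so the transformation adds no run-time overhead, yet the names that told a reader what the code was for are gone for good.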
Now that we have had an overview of obfuscation, we will delve deeper into this field over the next few issues of Palisade, exploring the other methods of code obfuscation in detail.