Code Obfuscation Part 3 - Hiding Control Flows

Paladion
By Paladion

October 15, 2005

In the last two issues we introduced code obfuscation and went in-depth into data structure obfuscation. In this issue we look at control obfuscation, a class of obfuscation techniques that targets the control flow in a program

Code Obfuscation

In the last two issues we introduced code obfuscation and went in-depth into data structure obfuscation. In this issue we look at control obfuscation, a class of obfuscation techniques that targets the control flow in a program. These transformations break up logically related computations or merge logically unrelated ones, randomize the order of computations, insert new or irrelevant computations. Before we look at different methods of control obfuscation, let us understand the concept of a predicate - a term that will be used a lot in these pages. A predicate is a function or condition that evaluates to true or false. Predicates are derived from the conditional statements in a program, as well as from implicit run-time safety checks. PT is a predicate that will always evaluate to true, PF is a predicate that will always evaluate to false and P? is a predicate that will sometimes evaluate to true and sometimes to false. A predicate is opaque if its value is known to the obfuscator but is difficult to determine for a deobfuscator. Hence opaque predicates are useful for obfuscation.

Insert Dead or Irrelevant Code

The complexity of code depends on the number of predicates it contains. We can insert a predicate PT that splits a block into half. The predicate is irrelevant code since it always evaluates to true, but it increases the complexity. (In the diagram below, S1, S2 etc are statements that we obfuscate by manipulating control flows.)

Figure 1

Similarly, we can split a block into two parts, and two different versions of one part can be created which perform the same function but that fact is not obvious to the reverse engineer. A predicate P? can be used to select between the two.

Figure 2

Another variation is to insert a buggy block of code but ensure it is not called by using a predicate PT.

Figure 3

Add Redundant Operands

Figure 4

Redundant operands can be added to arithmetic expressions to make them more complex. This technique, however, affects the performance of the program since we are introducing extra processing because of the new operations.

Remove common library calls and Programming Idioms

Standard library functions are well known and so calls to them make code easy to reverse engineer. To counter this, obfuscated versions of the standard libraries can be used. In this case there is an increase in cost in terms of program size. Similarly, common programming patterns or idioms that occur frequently in a particular language make code easier to reverse engineer. For example, Java does not have a standard class that provides linked list. So, most programmers in Java implement linked lists using a next field. Iteration trough such a list is a common pattern that can be easily identified. A solution to this is to replace such common patterns with lesser known ones.

Reducible to Non-reducible code

Decompilers work by converting the machine code back into source code. If a transformation is language-breaking if it introduces code segments in machine language that have no source language equivalent, e.g. the goto statement, which is a valid instruction in Java bytecode but doesn’t have a counterpart in Java language.

Parallelize Code

Increasing parallelism in the code makes it more difficult to revere engineer as it hides the actual flow of control. We could create dummy processes that perform no useful task and run them in parallel with the actual instructions.

Figure 5

We could also split a sequential section of the application into multiple sections executing in parallel. A code block that does not have any data dependencies can easily be parallelized.

Figure 6

A code block that has data dependencies can be parallelized by using synchronization functions.

Figure 7

Parallelizing can have a considerable time overhead associated for making sure the parallel threads are properly synchronized so that the program functionality stays unchanged. However, the resilience and potency of this method is very high as the introduction of parallel paths makes analysis difficult both for a deobfuscator program as well as a reverse engineer.

Inline and Outline Methods

Inlining is a process in which method calls are replaced by the actual body of the method. It is used traditionally as a code optimization technique since it removes the overhead of the call. Outlining is the reverse process of turning a set of statements into a method. These can be used as obfuscation techniques. Inlining is highly resilient since if a function call is replaced with the function body and the function is removed, the original structure cannot be traced back.
Outlining can be paired with inlining to increase the confusion. For example, suppose we have two functions P and Q called one after the other. They can be inlined at the calling location and removed from the code. Following this, the end of P’s code and the beginning of Q’s code can be outlined into a bogus function R.

Interleave Methods

If multiple methods are woven together into one, it can make the program difficult to understand or reverse engineer. The entire code appears to be connected, whereas there was actually no correlation.

Figure 8

Clone Methods

We can understand a function better by looking at the different places in which it is called. The cloning technique tries to hide this information by creating multiple versions (clones) of the function that look different but have the same functionality and calling the different versions at different places, making it appear as if different functions are being called.

Extend Loop Condition

Confusion can be added by making loop conditions more complex. An extra predicate PT or PF is used, that does not affect the number of times the loop executes.
In this example the predicate we have added always evaluates to True since x2(x+1)2 mod 4 = 0.

Figure 9

Reverse the Loop counter

The ordering of a loop can be changed for example by making it go backward. This is a simple low-potency transformation since it does not add too much confusion.

Loop Transformations

Loop transformations were originally designed for performance optimization, but some of them are useful obfuscation techniques as well since they change the structure of a program. Let us look at a couple of examples.
Loop unrolling replicates the body of a loop multiple times and hence reduces the number of times it is called.

Figure 10

Loop fission converts a compound loop into multiple loops.

Figure 11

Reorder Statements and Expressions

Reordering statements and expressions is a low potency technique, though its resilience is high since it is difficult to put back the statements in the original order. The effectiveness as an obfuscation technique can be increased by combining it with other methods like inlining-outlining, or by reordering the expressions at a bytecode level so that it breaks the link between the bytecode and the source.

Table Interpretation

This is one of the most effective as well as expensive transformations. A section of the bytecode is converted into a different virtual machine code. This new code is then executed by a virtual machine interpreter included with the obfuscated application. A particular application can contain several interpreters for executing different sections of the obfuscated application. Due to the high cost in terms of processing speed, this transformation should be reserved for sections of code that make up a small part of the total runtime or which need a very high level of protection.

Summary

We have looked at different control obfuscation techniques in this article. Let’s recap how effective each of these techniques are in terms of Potency, Resilience and Cost.

Transform

Potency

Resilience

Cost

Insert Dead or Irrelevant Code

Varies

Varies

Varies

Reducible to Non-reducible code

Varies

Varies

Varies

Add Redundant Operands

Varies

Varies

Varies

Remove common library calls and Programming Idioms

Medium

Strong

Varies

Parallelize Code

High

Strong

Costly

Inline and Outline Methods

Medium

One-way

Free

Interleave Methods

Varies

Varies

Varies

Clone Methods

Varies

Varies

Varies

Extend Loop Condition

Varies

Varies

Varies

Reverse Loop Counter

Low

One-way

Free

Loop Transformations

Low

Weak

Cheap

Reorder Statements and Expressions

Low

One-way

Free

Table Interpretation

High

Strong

Costly


Tags: Technical

About

Paladion

SUBSCRIBE TO OUR BLOG

Buyers-Guide-Collateral

WHITEPAPER

Buyer’s Guide to Managed Detection and Response

Download
MDR

Get AI Powered

Managed Detection and Response

MDR-learmore-btn

 

MDR-Guide-Collateral

REPORT

AI-Driven Managed Detection and Response

Download Report
Episode

EPISODE-25

Red-LineAsset-6

Why Your ‘Likes’ on Facebook May Be Revealing Far More than You Thought

Click URL in the Post for the Full Podacst
  • FacebookAsset
  • LinkedinAsset
  • TwitterAsset