Last week, Roshen wrote about how we do code reviews. I want to go a step further and discuss the role of automated code scanning in those reviews.
When we encounter an application with over 200,000 lines of code, we know that we will not have time to read every line. Nor is it necessary. As our code review process shows, we start with the Threat Profile and then figure out which sections of code to review. That helps us focus our efforts on the areas of code that an attacker would exercise.
That smaller area is still fairly large, so every bit of automation helps.
Before we dive into the role of automated code scanning, let’s look at how most modern code scanners work. [Brian Chess and Jacob West’s "Secure Programming with Static Analysis" is a great book on the subject.] There are two basic strategies that code scanners take:
In one very powerful approach, code scanners trace the path of an input from its source to its destination, through all the transformations it undergoes. For example, the input could be the amount of funds to be transferred in a transaction. The input starts off from the user’s browser, is validated by pieces of code, and is then used to construct a SQL statement that is executed against the database. The code scanner analyzes the path taken from the source to the destination (often called the sink) and then predicts whether a malicious input would get through. For example, could the user send a malicious SQL snippet and execute it on the server for SQL Injection? This approach of tracing the input from source to sink is useful for finding code that is vulnerable to SQL Injection and Cross Site Scripting attacks.
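To make the idea concrete, here is a minimal sketch of source-to-sink taint tracking in Python. It is illustrative only: real scanners analyze the parsed program, and the names here (request.get_param, escape_sql, db.execute) are hypothetical stand-ins for a source, a sanitizer, and a sink.

```python
# Toy taint tracker: a finding is reported when tainted data reaches a
# sink without passing through a recognized sanitizer along the way.

SOURCES = {"request.get_param"}   # where untrusted input enters (hypothetical name)
SANITIZERS = {"escape_sql"}       # calls that neutralize the taint (hypothetical name)
SINKS = {"db.execute"}            # dangerous destinations (hypothetical name)

def trace(path):
    """path is an ordered list of calls the input flows through."""
    tainted = False
    for call in path:
        if call in SOURCES:
            tainted = True
        elif call in SANITIZERS:
            tainted = False
        elif call in SINKS and tainted:
            return f"possible injection: tainted data reaches {call}"
    return "no finding"

# The transfer amount goes straight from the browser into a SQL statement:
print(trace(["request.get_param", "db.execute"]))
# The same flow with a sanitizer in between raises no alarm:
print(trace(["request.get_param", "escape_sql", "db.execute"]))
```

Real taint analysis must handle branches, aliasing, and calls across files, which is what makes commercial scanners substantial pieces of engineering; the core source/sanitizer/sink model, though, is the one sketched above.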
A second approach is to look for patterns of insecurity in the code - calls to insecure functions, error conditions not handled, null pointers, etc. This approach is simpler, but it generates more false alarms than the first, and each finding requires further manual analysis to confirm.
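A pattern-based check can be sketched in a few lines. The following toy scanner walks a Python syntax tree and flags calls to a (hypothetical, deliberately tiny) list of insecure functions - which also illustrates why this approach needs manual follow-up: the scanner flags the call site, not whether the data reaching it is actually dangerous.

```python
import ast

# Hypothetical deny-list for illustration; real scanners ship thousands of rules.
INSECURE_CALLS = {"eval", "exec", "os.system"}

def call_name(func):
    """Recover a dotted name like 'os.system' from a call's function node."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""

def scan(source):
    """Return (line, name) for every call to a listed insecure function."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node.func) in INSECURE_CALLS:
            findings.append((node.lineno, call_name(node.func)))
    return findings

code = '''
import os
user = input()
eval(user)          # flagged
os.system(user)     # flagged
print(user)         # not flagged, but the scanner cannot tell if it is safe
'''
print(scan(code))
```

The same structure extends naturally to unhandled error conditions or banned C functions like strcpy; only the patterns change.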
Notice that in both approaches, the code scanner is unaware of the context of the application. It does not know, nor care, whether the application is an online banking site, an e-Commerce site, or an online game. The scanner applies its algorithms independently of the context of the application.
The blissful ignorance of context is both a strength and a weakness.
The scanners can apply general principles that characterize a vulnerability - an input from the user being reflected back to the browser without escaping the < and > symbols, for instance - without trying to understand the context. That’s a good thing. The disadvantage is that entire classes of vulnerabilities that require understanding the context are outside the purview of the scanner - for instance, siphoning off funds in a banking application, or flouting the rules of chess on an online gaming site.
We thus use the scanner for finding some standard vulnerabilities - dynamic SQL queries, reflected inputs, unhandled exceptions, etc. These are the basis for common attacks like SQL Injection, Cross Site Scripting, and others. Once the scanner identifies code snippets as candidates for these vulnerabilities, we analyze them manually to confirm the flaw. For attacks that exercise the business logic - usually variations of variable manipulation attacks - we analyze the code manually.
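To show what the manual confirmation step establishes, here is a self-contained sketch with Python's sqlite3 and a made-up accounts table. The dynamic query is the kind of construction a scanner flags as a candidate; running the attack input through it confirms the flaw, while the parameterized version binds the input as data rather than SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

name = "alice' OR '1'='1"   # classic SQL Injection input

# Dynamic SQL: string concatenation puts attacker text inside the statement.
dynamic = "SELECT balance FROM accounts WHERE name = '%s'" % name
print(conn.execute(dynamic).fetchall())   # [(100.0,)] - the injection worked

# Parameterized query: the same input is treated purely as a value.
safe = conn.execute("SELECT balance FROM accounts WHERE name = ?", (name,))
print(safe.fetchall())                    # [] - no account has that literal name
```

The scanner gets us to the dynamic statement quickly; it is the manual step that confirms whether attacker input can actually reach it unsanitized.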
What about finding backdoors with code scanners? Since scanners are unaware of context, they usually don’t recognize backdoors that have been either purposely or inadvertently inserted in the code by the developers. Looking for backdoors manually is an interesting challenge. We will cover that in a different post.