Applications that use XML databases are vulnerable to injection attacks. Read on to find out how XPATH queries are manipulated to access sensitive information.
In a typical web application architecture, all data is stored on a database server. This server can store data in various formats, such as an RDBMS database, LDAP, or XML. Based on user input, the application queries the server and accesses the information. Attackers manage to extract more information than they are allowed to by manipulating the query with specially crafted input. Here, we'll discuss XPATH injection techniques for extracting data from XML databases. Before we go deeper into XPATH injection, let's take a brief look at what XML is and how an XPath query is formed.
XML and XPATH
XML stands for Extensible Markup Language and was designed to describe data. It allows programmers to create their own customized tags to store data. An XML document is similar to an RDBMS database except for the way data is stored in it. In a database, data is stored in tables of rows and columns, whereas in XML the data is stored in nodes arranged as a tree. The XML Path language, or XPath, is used for querying information from the nodes of an XML document. Path expressions are used to access elements and attributes in an XML document, and they return a node-set, a string, a Boolean, or a number. XPath also provides a library of built-in functions (over 100 in XPath 2.0) for working with strings, numbers, Booleans, dates and times, and so on.
Let us take an example of an XML document called users.xml and see how an XPath function can be used to retrieve information:
<?xml version="1.0" encoding="ISO-8859-1"?>
<LoginID> abc </LoginID>
<cardno> 568100123412 </cardno>
<accountno> 11123 </accountno>
<passwd> test123 </passwd>
<cardno> 506800212114 </cardno>
<LoginID> xyz </LoginID>
<accountno> 56723 </accountno>
<passwd> testing234 </passwd>
The function selectNodes takes a path expression as its parameter. Here, the expression extracts the value of the cardno node under the savings node of the users.xml document. The path expression for the cardno in this case is
The result of the above query will be:
When an application has to retrieve some information from the XML based on user input, it fires an XPath query which gets executed at the server.
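As a rough illustration of such a query, the sketch below uses Python's standard xml.etree.ElementTree module rather than the selectNodes call mentioned above. The document layout (an accounts root, users entries, and a savings wrapper around cardno) is an assumption made for the example, since the listing above does not show the enclosing elements; the function name card_numbers is likewise made up.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the root element name and the nesting are hypothetical.
USERS_XML = """\
<accounts>
  <users>
    <LoginID>abc</LoginID>
    <accountno>11123</accountno>
    <passwd>test123</passwd>
    <savings><cardno>568100123412</cardno></savings>
  </users>
  <users>
    <LoginID>xyz</LoginID>
    <accountno>56723</accountno>
    <passwd>testing234</passwd>
    <savings><cardno>506800212114</cardno></savings>
  </users>
</accounts>
"""


def card_numbers(xml_text):
    """Return the text of every cardno node that sits under a savings node."""
    root = ET.fromstring(xml_text)
    # ".//savings/cardno" is a path expression: any savings descendant,
    # then its cardno child.
    return [node.text.strip() for node in root.findall(".//savings/cardno")]


print(card_numbers(USERS_XML))
```

ElementTree implements only a subset of XPath, which is enough for a simple path like this one but not for the predicates with or and function calls used in the attacks discussed later.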
XPATH Injection Attack Techniques
These are some of the different ways XPATH injection attacks can be carried out.
Since RDBMS databases are used extensively, SQL injection has been a favorite of attackers for a long time. Let's say we have an application that connects to a database containing a users table. To authenticate a user, the SQL query used is
Select * from users where LoginID=' ' and passwd=' '
In this query the user has to give the LoginID and the passwd as input. If an attacker enters the following in the LoginID field
abc' or 1=1 --
the query formed will be
Select * from users where LoginID = 'abc' or 1=1 -- 'and passwd=' '
Since -- is used for commenting out the rest of a line in SQL, the ' and passwd=' ' part of the query is not executed. The remaining condition, LoginID = 'abc' or 1=1, is true for every row, so the query returns results and the attacker gains entry into the application. XPATH injection attacks work along similar lines. However, there is no equivalent of -- in XPATH to comment out parts of the query. Let's see how the same effect is achieved. The XPath query used for authentication can be
"string(//users[LoginID/text()='" + txtLoginID.Text + "' and passwd/text()='" + txtPasswd.Text + "'])"
If the input in the LoginID field is
abc' or 1=1 or 'a'='b
the query will be
string(//users[LoginID/text()='abc' or 1=1 or 'a'='b' and passwd/text()=''])
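The string concatenation behind this query can be sketched in Python. The function name build_auth_query is made up for the illustration, and the XPath function is written in lowercase (string), as XPath 1.0 defines it. The function is deliberately vulnerable and only shows how the query text changes.

```python
def build_auth_query(login_id, passwd):
    """Vulnerable on purpose: user input is pasted straight into the XPath text."""
    return ("string(//users[LoginID/text()='" + login_id +
            "' and passwd/text()='" + passwd + "'])")


# Normal use: the quotes stay balanced around the two values.
print(build_auth_query("abc", "test123"))

# Injected LoginID: the attacker's quote closes the string literal early,
# and the rest of the input is read as XPath operators.
print(build_auth_query("abc' or 1=1 or 'a'='b", ""))
```

The trailing 'a'='b in the input exists only to absorb the closing quote that the application appends, which is how the attack copes with the lack of a comment operator.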
The predicate
LoginID/text()='abc' or 1=1 or 'a'='b' and passwd/text()=''
can be represented as
A OR B OR C AND D
Among logical operators, AND has higher precedence than OR, so the above expression can also be written as (A OR B) OR (C AND D). If either A or B is true, the expression evaluates to true irrespective of what (C AND D) returns. In our injected input, B is 1=1, which makes (A OR B) always true. Hence the predicate holds for every users node, the query returns a non-empty result, and the attacker is able to log in.
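The precedence argument can be checked mechanically. The sketch below models the predicate with Python Booleans; this works as a stand-in only because Python, like XPath, binds and more tightly than or. It is not an XPath evaluation, and the function names are illustrative.

```python
from itertools import product


def injected_predicate(a, b, c, d):
    # Same shape as the injected predicate: A OR B OR C AND D.
    return a or b or c and d


def grouped(a, b, c, d):
    # The explicit grouping given in the text: (A OR B) OR (C AND D).
    return (a or b) or (c and d)


# The two forms agree for all 16 combinations of truth values.
assert all(injected_predicate(*v) == grouped(*v)
           for v in product([False, True], repeat=4))

# With B fixed to True (the 1=1 clause) and C fixed to False ('a'='b'),
# the result is True whatever A and the password check D evaluate to.
print(all(injected_predicate(a, True, False, d)
          for a, d in product([False, True], repeat=2)))
```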
Extracting the XML document structure
The query used to bypass authentication can also be used to extract information about the XML document. Suppose an attacker guesses that the users node has a sub-node named LoginID and wants to confirm it. The attacker enters the following input
abc' or name(//users/LoginID) = 'LoginID' or 'a'='b
In place of 1=1 in the previous example, the expression given here checks whether a sub-node named LoginID exists under users: name() returns the name of the first node the path selects, or an empty string if the path selects nothing. The query formed is
string(//users[LoginID/text()='abc' or name(//users/LoginID) = 'LoginID' or 'a'='b' and passwd/text()=''])
If the attacker is authenticated, the guess was correct and users has a LoginID sub-node. (The test is conclusive only if the value before the quote, abc here, does not match a real LoginID; otherwise the first condition alone makes the query succeed.) By repeating this with other guesses, the attacker gradually learns the structure of the XML document.
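The guessing loop can be sketched as follows. Python's standard library cannot evaluate name(), so login_succeeds is only a model of the application's response: it checks directly whether the guessed sub-node exists, which is the condition the injected name() comparison tests. The document, the candidate list, and all function names are assumptions for the example, and the document deliberately contains no user called abc.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the attacker cannot see it, only the login outcome.
SECRET_XML = """\
<accounts>
  <users>
    <LoginID>xyz</LoginID>
    <accountno>56723</accountno>
    <passwd>testing234</passwd>
  </users>
</accounts>
"""


def probe_payload(guess):
    """LoginID input that asks: does users have a sub-node called <guess>?"""
    return "abc' or name(//users/" + guess + ") = '" + guess + "' or 'a'='b"


def login_succeeds(guess):
    """Model of the application's answer to probe_payload(guess).

    name(//users/X) = 'X' holds exactly when some users node has a child X,
    so the model looks for that child directly.
    """
    root = ET.fromstring(SECRET_XML)
    return root.find(".//users/" + guess) is not None


candidates = ["username", "LoginID", "password", "passwd", "accountno"]
confirmed = [name for name in candidates if login_succeeds(name)]

print(probe_payload("LoginID"))
print(confirmed)
```

Each successful login confirms one node name, so the structure leaks one yes-or-no answer at a time.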
XPATH injection can be prevented in the same way as SQL injection. Some of the preventive measures are:
Input Validation: This is one of the best measures to defend applications from XPATH injection attacks. The developer has to ensure that the application does not accept any malicious input. It is very difficult to decide what constitutes malicious input. However, there are some best practices that a developer can follow:
- Assume all input to be malicious until proven otherwise.
- Accept only known good input.
- Have a centralized approach towards input validation in the application design.
- Have both client-side and server-side validation, since client-side checks alone can be bypassed.
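The "accept only known good input" practice can be sketched as a whitelist check. The policy used here (1 to 32 ASCII letters or digits) and the name is_valid_login_id are assumptions for the illustration; a real application would pick rules that match its own LoginID format and would run this check on the server.

```python
import re

# Assumed policy for this sketch: 1 to 32 ASCII letters or digits.
LOGIN_ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,32}")


def is_valid_login_id(value):
    """Accept only known good input; everything else is rejected."""
    return isinstance(value, str) and LOGIN_ID_PATTERN.fullmatch(value) is not None


print(is_valid_login_id("abc"))                    # plain alphanumeric ID
print(is_valid_login_id("abc' or 1=1 or 'a'='b"))  # quotes, spaces and '='
```

A whitelist like this rejects the quote character the attack depends on without having to enumerate every dangerous string. Passwords usually need a more permissive rule, which is one reason validation is best combined with the measure described next.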
Parametrized Queries: Another method to prevent XPATH injection is to use parametrized queries. We have seen that XPath queries are built as strings and evaluated dynamically at run time. In a parametrized query, the query is precompiled with placeholders, and user input is passed in as parameter values instead of being spliced into the expression.
If we pass parameters to the following query:
"//users[LoginID/text()='" + txtLoginID.Text + "' and passwd/text()='" + txtPasswd.Text + "']"
the query will look like this:
"//users[LoginID/text()= $LoginID and passwd/text()= $password]"
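The input is no longer used to build the query text. Instead, the supplied values are bound to the variables and compared as plain strings, so an injected input such as abc' or 1=1 or 'a'='b is simply looked up as a LoginID in the XML document and fails to match. This prevents injection attacks.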
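Python's standard library does not offer XPath variable binding, so the sketch below is not a parametrized query in the strict sense. It swaps in a related technique with the same effect: the path expression is a fixed constant, and the user's values are compared as data in ordinary code. The document layout and the function name authenticate are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the root element name is hypothetical.
USERS_XML = """\
<accounts>
  <users>
    <LoginID>abc</LoginID>
    <passwd>test123</passwd>
  </users>
  <users>
    <LoginID>xyz</LoginID>
    <passwd>testing234</passwd>
  </users>
</accounts>
"""


def authenticate(xml_text, login_id, passwd):
    """Return True only if one users node holds exactly these two values."""
    root = ET.fromstring(xml_text)
    # Fixed path expression: user input is never concatenated into it.
    for user in root.findall(".//users"):
        stored_login = (user.findtext("LoginID") or "").strip()
        stored_passwd = (user.findtext("passwd") or "").strip()
        # The input is compared as a plain string, so quotes and
        # operators inside it have no special meaning.
        if stored_login == login_id and stored_passwd == passwd:
            return True
    return False


print(authenticate(USERS_XML, "abc", "test123"))
print(authenticate(USERS_XML, "abc' or 1=1 or 'a'='b", ""))
```

Libraries that do support variables, such as lxml (keyword arguments to xpath()) or Java's javax.xml.xpath (an XPathVariableResolver), allow the $LoginID form shown above to be used directly.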