Watchful File Upload

By balaji

April 14, 2011



File upload is a web application feature that throws open the doorways of the server's entire file system to end users. What more could an attacker want!

Applications that store uploaded files on the server without any validation put their servers at a huge risk of being compromised. Files such as harmful executables can cause considerable damage to the servers, although the impact also depends on how the application handles the uploaded files. If you just thought of antivirus protection, let me inform you that it is not a reliable safeguard in this case. It provides only complementary protection, effective against specific categories of files that are few in comparison with the whole list of files that can damage the application. Attacks via a reverse web shell are still very prominent and pose a big challenge to application developers even today.

Now, let's try and answer the big question – Can this menace be evaded?

Well, we must always aim at having maximum security in place. We have tried to include below all the points that can help prevent malicious files from being uploaded into the system. This article is not about inherent vulnerabilities in client applications that handle a specific file format; preventing broader attacks such as the "GIFAR" upload would require more analysis and tactful mitigation techniques.

What we are trying to bring out here is how to best achieve programmatic validation against uploaded files and the factors to be considered for having a reasonably secure File Upload feature in our applications, which are listed below:

  1. It is very important to ensure that only the desired file types are uploaded to the system. This is known as the whitelist validation approach. The validation logic must reside on the server side: client-side validation can easily be bypassed with web proxy editor tools, and we would certainly not want that to happen!
    Additionally, for added security, applications can assign dummy names and extensions (if they accept only one file type) to the uploaded files before storing them on the server.
  2. Uploaded files must not be placed in the web root directory. This prevents the files from being exposed on the web. However, if the application needs to do so, a proper authentication check must be put in place before granting access to the files.
  3. It is advisable to perform periodic virus scans of the directory wherein the uploaded files are being stored.
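
To make the first two points concrete, here is a minimal sketch using only the JDK. The class name UploadStorage, the three allowed extensions and the uploadDir parameter are assumptions made for illustration; they are not part of any framework. The idea is that the server checks the extension against a whitelist and then stores the file under a random name in a directory that the caller keeps outside the web root.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;
import java.util.UUID;

final class UploadStorage {
    // Hypothetical whitelist: this example application accepts only these extensions.
    private static final Set<String> ALLOWED_EXTENSIONS = Set.of("pdf", "png", "jpg");

    /** Returns the lower-cased text after the last dot, or "" if there is none. */
    public static String extensionOf(String clientFileName) {
        int dot = clientFileName.lastIndexOf('.');
        if (dot < 0 || dot == clientFileName.length() - 1) {
            return "";
        }
        return clientFileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Server-side whitelist check on the client-supplied file name. */
    public static boolean isAllowed(String clientFileName) {
        return ALLOWED_EXTENSIONS.contains(extensionOf(clientFileName));
    }

    /**
     * Stores the upload under a random name inside uploadDir.
     * The client-supplied name is used only to pick the extension, never as a path.
     */
    public static Path store(InputStream content, String clientFileName, Path uploadDir)
            throws IOException {
        if (!isAllowed(clientFileName)) {
            throw new IllegalArgumentException("File type not allowed");
        }
        Files.createDirectories(uploadDir);
        Path target = uploadDir.resolve(UUID.randomUUID() + "." + extensionOf(clientFileName));
        Files.copy(content, target);
        return target;
    }
}
```

Note that a name like "shell.php.jpg" still passes this check, which is exactly why the extension test alone is not enough and the content of the file has to be validated as well.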

Out of the above-mentioned factors, the first one related to file validation is the key to the whole secure File Upload implementation.

Most of the developers today rely on file extensions as the "sole" means to identify file types. Well, there is nothing wrong with it, but this validation can be nullified easily using web proxy editor tools. Such tools can be used to change the extensions of the uploaded malicious files in the request, so that the server treats them as valid ones. The attackers would thus be successful in storing the malicious files on the servers with different file extensions. After identifying the way in which the application treats such uploaded files, the attackers can then initiate further exploits on the server. For instance, it has been observed that the browsers can correctly identify the "html" file types even if they get downloaded with an extension other than ".html" from the server. This may possibly be advantageous while exploiting cases wherein the uploaded files are accessible from somewhere else in the application.

Similarly, we can discuss many such scenarios and exploits, but let's leave all of them for future discussions. The main intention of bringing this up was to show that the file extension check alone is not sufficient. This leaves us with a need to look for additional parameters to validate...

Is the answer "Content-type"?

No. The Content-Type header can also be changed by an attacker in a similar manner, thereby misleading the server.

In this regard, should we not look at the file content? Most file formats carry some unique identification in their structure: files of a particular type share a common, predefined sequence of bytes, typically at the start of their content. Such common patterns are called file signatures or magic numbers. File signatures can certainly be a good candidate for file validation.
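
As a rough illustration of a signature check, the sketch below compares the first bytes of a file with a handful of well-known signatures (PNG, JPEG, GIF and PDF). The class name SignatureSniffer and the choice of formats are assumptions for this example only; the tools discussed later keep far larger signature registries and handle many more cases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

final class SignatureSniffer {
    // MIME type -> leading bytes that files of that type start with.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("image/png",
                new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        // "GIF8" is the common prefix of both GIF87a and GIF89a.
        SIGNATURES.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
    }

    /** Returns the MIME type whose signature the header starts with, or null if none matches. */
    public static String sniff(byte[] header) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(header, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Reads only the first few bytes of the file and checks them against the signatures. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A script saved as "evil.jpg" would come back as null here, since renaming a file does not change its leading bytes. A matching signature is still only evidence, not proof, that the rest of the file is well formed.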

So, we can conclude that to get robust protection, we must implement both a file extension check and content validation of the uploaded files.

However, achieving this programmatically would be a big challenge and an overhead for application developers. In that case, developers can take advantage of the work that has already gone into this area, which has made numerous APIs and tools available for the task. Let's look at a few that support a comprehensive list of file types used by web applications today.


DROID (Digital Record Object Identification) is an open source tool developed by The National Archives of the UK under the umbrella of its PRONOM technical registry service. It is a cross-platform tool written in Java. It works on the internal and external signatures (magic number, extension) of a file to identify its type, and it keeps its data store of format signatures up to date from the PRONOM format registry. DROID offers two interfaces: a graphical Java Swing GUI and a command-line interface.

DROID is available for download here

The following tools also perform file validations based on format signatures:

  • JHOVE
  • TrID
  • File Identifier

But DROID has the most comprehensive list of file format signatures and is the most accurate of these.

APIs for J2EE applications

Apache Tika

J2EE-based applications can also look at Apache Tika as the most comprehensive solution for this problem. This Apache project takes the concept to a new level by providing various parsers that work on the uploaded files, in addition to detecting their types. It can facilitate file-type detection and metadata extraction in J2EE applications on a large scale.

How to work with Apache Tika?

To begin with, we will have to download and add the jar file to the web libraries of the application. The jar can be downloaded from here (where 0.8 is the latest release number of this project).

There are two ways of going about it:

Method 1: Using the Tika class

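// Imports needed: java.io.*, org.apache.tika.Tika, org.apache.tika.metadata.Metadata
String type = null;
Tika tika = new Tika();
File file = new File(fileName);
Metadata metadata = new Metadata();
try {
    FileInputStream in = new FileInputStream(file);
    try {
        tika.parse(in, metadata);
        type = metadata.get(Metadata.CONTENT_TYPE);
    } finally {
        in.close();
    }
} catch (IOException ex) {
    System.out.println("IO Error");
}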

Method 2: Using the Parser class

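// Imports needed: java.io.*, org.xml.sax.ContentHandler, org.xml.sax.SAXException,
// and from org.apache.tika: exception.TikaException, metadata.Metadata, parser.AutoDetectParser, sax.BodyContentHandler
String type = null;
File file = new File(fileName);
Metadata metadata = new Metadata();
ContentHandler contenthandler = new BodyContentHandler();
org.apache.tika.parser.Parser parser = new AutoDetectParser();
try {
    FileInputStream in = new FileInputStream(file);
    try {
        // Later Tika releases use parse(in, contenthandler, metadata, new ParseContext())
        parser.parse(in, contenthandler, metadata);
        type = metadata.get(Metadata.CONTENT_TYPE);
    } finally {
        in.close();
    }
} catch (IOException ex) {
    System.out.println("IO Error");
} catch (SAXException ex) {
    System.out.println("SAX Error");
} catch (TikaException ex) {
    System.out.println("Tika Error");
}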

In both these cases, it is the "parse" function that does all the work and updates the Metadata instance with information about the parsed file.

What to do after detecting the file type (MIME type)?

Once the exact file type (MIME type) of the uploaded files is retrieved using Tika API, the application must compare it with the list of desired types allowed for upload. In case of a "mismatch", "no match" or an "exception", the uploaded file instance must be DELETED from the system and a custom error page must be rendered to the users.
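
The rule above could be sketched roughly as follows. UploadGate and its TypeDetector interface are hypothetical names introduced only for this example; the detector stands in for whatever the application uses to find the MIME type (a call into Tika, for instance). The point of the sketch is that a mismatch, a missing result and an exception all end the same way: the file is deleted.

```java
import java.io.File;
import java.util.Set;

final class UploadGate {
    /** Hypothetical hook for the detection step; it may throw if the file cannot be parsed. */
    public interface TypeDetector {
        String detect(File file) throws Exception;
    }

    /**
     * Returns true if the detected type is on the whitelist and the file may stay.
     * On a mismatch, no match (null) or an exception, deletes the file and returns false.
     */
    public static boolean acceptOrDelete(File uploaded, Set<String> allowedTypes,
                                         TypeDetector detector) {
        boolean accepted;
        try {
            String type = detector.detect(uploaded);
            accepted = type != null && allowedTypes.contains(type);
        } catch (Exception ex) {
            // A failure during detection is treated like a rejected file.
            accepted = false;
        }
        if (!accepted) {
            uploaded.delete();
        }
        return accepted;
    }
}
```

The caller would render its custom error page whenever this method returns false. A production version would probably also log the rejection and check the result of delete().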

Is there a simple way of retrieving the "File" instance from the web request?

Generally, getting hold of the uploaded file from the request is a cumbersome task in web applications. Even the getInputStream() method of HttpServletRequest cannot be used directly for parsing or detecting files, as it returns the entire body of the request and not just the file stream. To get over this programming hurdle, there is another API available to us: "MultipartRequest". It understands the "multipart form" request and returns the uploaded file instance on a single method call, as shown below.

This code must go in the "doPost" method of the servlets:

// Assumption: MultipartRequest here is the class from the com.oreilly.servlet (COS) library.
// uploadedFilePath: the directory in which uploaded files must be stored.
MultipartRequest fileRequest
        = new MultipartRequest(request, uploadedFilePath);
// "data" must correspond to the name of the form element used to accept
// the file, e.g. <input name="data" type="file">
File uploadedfile = fileRequest.getFile("data");

The applications built on standard frameworks like Struts and Spring can take advantage of the APIs available in them to retrieve file instances.


In addition to Tika, programmers can also look at another API – Jmime. It supports fewer file types than Tika; however, it works fine for some image files like jpeg.

This completes the discussion of the options available for automating the validation process. But before closing the topic, let's look at another useful project from Apache: POI. It is a programming-level interface for working with MS Office documents. Developers nowadays are looking for parsing solutions that work with standard file types. A big advantage of such APIs is that they throw an exception if parsing fails (for example, on encountering an unsupported file type) and do not process the file further. Thus, they prove to be an effective ingredient in securing file uploads and take some of the validation overhead off developers. But as mentioned before, the uploaded files must be deleted in the event of any parsing error. Otherwise, even if the malicious files remain unprocessed by the application, they will have succeeded in making an entry into the server and can still create trouble.

However, it is always advisable to have desired validations in the application irrespective of the presence of any complementary protection. Developing the feature with the security perspective in mind can go a long way in preventing many web-based attacks...


Tags: Best Practices