Back to Basics: HTTP Essentials

By balaji

December 27, 2006

In this article series, we will revisit some of the basic concepts in HTTP. The first part answers a few questions on caching: what is stored in a cache, how it is stored, and how caching behaviour can be controlled.

What are caches?

Caches are temporary storage locations for web pages. This storage can be on the client's desktop, on an intermediate HTTP proxy, or on a dedicated caching engine attached to the origin web server. Caches help the client browser retrieve pages faster. In some cases there is no need to contact the web server at all, while in others the web server is contacted only to check whether the copy in the cache is as "fresh" as the one on the server. Either way, the result is faster access and reduced network load. The cache on the client desktop is called a "private" cache, and the cache on an intermediate proxy server is called a non-private or "shared" cache.

Who determines what is stored in cache?

The web server that serves the page determines whether it can be cached. The HTTP response from the web server carries cache-related parameters in its headers, which determine whether a page can be cached and, if so, where [in private and shared caches, or only in private caches] and for how long [expiry or age].
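
As a rough illustration of how a cache might read these parameters, the sketch below parses a Cache-Control value and reports which kinds of cache may store the response. The function names are hypothetical, and the parser is deliberately simplified: it splits on commas and so does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "private") map to None
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def allowed_caches(cache_control):
    """Return the kinds of cache that may store a response with this header."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return set()            # nobody may keep a copy
    if "private" in directives:
        return {"private"}      # browser cache only
    return {"private", "shared"}


print(allowed_caches("private, max-age=600"))
```

A real cache also considers the request method, the status code and other headers before storing anything; this only covers the private/shared distinction described above.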

Can the web server specify a finite lifetime for cached content?

Yes, there are different methods to control the lifetime of cached content. The content stored in the cache can be given a defined expiry time, or it can be set to be validated with the web server before it is served.

By sending an Expires header in the HTTP response, the web server states the time at which the content becomes stale, e.g. Expires: Fri, 30 Oct 1998 14:19:41 GMT. Instead of an absolute time in Expires, the server can give a relative time with the max-age directive of the Cache-Control header. Cache-Control: max-age=3600 means the content is considered fresh for the next 3600 seconds, or one hour. If both are present, HTTP 1.1 caches use max-age.
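
The two ways of stating a lifetime can be compared with a small sketch. It assumes the response headers are available as a plain dict, and the function name freshness_lifetime is made up for this example. The relative form is read directly, while the absolute form is converted to seconds by subtracting the response's Date header.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the number of seconds a response stays fresh, or None if unstated."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age wins over Expires when both are present
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0  # an unparseable date is treated as already expired
        return max(0, int((expires - date).total_seconds()))
    return None


response = {
    "Date": "Fri, 30 Oct 1998 13:19:41 GMT",
    "Expires": "Fri, 30 Oct 1998 14:19:41 GMT",
}
print(freshness_lifetime(response))
```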

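By setting "must-revalidate" in the Cache-Control response header, the server ensures that a cache [private or shared] never serves the content after it has become stale without first revalidating it with the server. It does not force a check while the content is still within its Expires or max-age lifetime. To make caches revalidate on every request irrespective of Expires and max-age, the server sends Cache-Control: no-cache instead.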

How does the browser/proxy revalidate cached content?

If the cached content has to be revalidated, either because it has expired or because a directive such as "must-revalidate" requires it, the browser sends a request to the server that includes the cache validator details. If the copy in the browser's cache is still current, the server replies with response code 304 [Not Modified] and no body, and the browser loads the content directly from its cache. If the copy is out of date, the web server sends the full content.
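
From the cache's point of view, the two outcomes can be sketched as follows. The entry layout (a dict with "headers" and "body" keys) and the function name are assumptions made for this illustration, not part of any particular browser or proxy.

```python
def apply_revalidation_response(entry, status, headers, body=None):
    """Return the cache entry to use after a revalidation round trip."""
    if status == 304:
        # Not Modified: keep the stored body, refresh the stored headers
        merged = dict(entry["headers"])
        merged.update(headers)
        return {"headers": merged, "body": entry["body"]}
    if status == 200:
        # Changed: the new response replaces the stored copy
        return {"headers": dict(headers), "body": body}
    raise ValueError(f"unexpected status for revalidation: {status}")


stored = {"headers": {"Cache-Control": "max-age=60"}, "body": b"<html>old</html>"}
entry = apply_revalidation_response(stored, 304, {"Cache-Control": "max-age=120"})
print(entry["body"])
```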

What are the cache validators?

Once the browser or proxy decides to revalidate the cached content, it uses a cache validator. This is a value that the server can use to check the freshness of the cached content.

The web server can set a "Last-Modified" value in the response header, which can be used as a validator. When the same page is requested again, the browser or proxy issues a request to the web server with the "Last-Modified" value copied into the "If-Modified-Since" request header field. The server checks whether the content has changed after that date and returns either a 304 or the full updated content.
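
On the server side, the date check might look like the sketch below. The function name is hypothetical, and the resource's modification time is assumed to be a timezone-aware datetime in UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified_since(if_modified_since: str, last_modified: datetime) -> bool:
    """Return True if the server can answer 304 for this If-Modified-Since value."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore the header and send the content
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    return last_modified.replace(microsecond=0) <= since


modified = datetime(1998, 10, 30, 14, 19, 41, tzinfo=timezone.utc)
print(not_modified_since("Fri, 30 Oct 1998 14:19:41 GMT", modified))
```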

Instead of a date value like "Last-Modified", the server can set an "ETag" in the response header. When the same page is requested again, the browser or proxy issues a request to the web server with the "ETag" value copied into the "If-None-Match" request header. The server checks whether the "ETag" value for that content has changed and returns either a 304 or the updated content.
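
The ETag comparison can be sketched in the same way. How a server derives its ETag is up to the server; hashing the body, as make_etag does here, is only one possible scheme chosen for this example, and both function names are hypothetical. The comma split is a simplification that assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Return True if If-None-Match lists the current ETag, so a 304 can be sent."""
    def opaque(tag):
        tag = tag.strip()
        # Ignore the weak-validator prefix when comparing
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [opaque(tag) for tag in if_none_match.split(",")]
    return "*" in candidates or opaque(current_etag) in candidates


page = b"<html>hello</html>"
etag = make_etag(page)
print(etag_matches(etag, etag))
```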

Are there differences in cache control parameters/behaviour based on HTTP version?

HTTP 1.1 introduced the ETag as a cache validator, in addition to the Last-Modified validator of HTTP 1.0. HTTP 1.1 also allows max-age to set a relative expiry, instead of the absolute expiry set with Expires in HTTP 1.0.

Can the server specify if a portion of the page can be cached; for example HTML can be cached but not the cookies?

In HTTP 1.1 the server can state that some portions of the response, namely specific header fields, can or cannot be cached. For example, to disable caching of only the cookies in a response: Cache-Control: no-cache="Set-Cookie".
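
One conservative way for a cache to honour this directive is to leave the named header fields out of the copy it stores. The sketch below assumes headers held in a plain dict; the function names are hypothetical.

```python
import re


def fields_excluded_from_cache(cache_control):
    """Return the lower-cased header names listed in no-cache="..."."""
    excluded = set()
    pattern = r'no-cache\s*=\s*"([^"]*)"'
    for match in re.finditer(pattern, cache_control, re.IGNORECASE):
        for name in match.group(1).split(","):
            if name.strip():
                excluded.add(name.strip().lower())
    return excluded


def headers_to_store(headers):
    """Return the response headers minus any fields the server excluded."""
    excluded = fields_excluded_from_cache(headers.get("Cache-Control", ""))
    return {k: v for k, v in headers.items() if k.lower() not in excluded}


response_headers = {
    "Content-Type": "text/html",
    "Set-Cookie": "session=abc123",
    "Cache-Control": 'max-age=60, no-cache="set-cookie"',
}
print(sorted(headers_to_store(response_headers)))
```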

Can the client browser ask to bypass the local/proxy cache and fetch data directly from the server?

If the HTTP request contains Cache-Control: no-cache, Cache-Control: max-age=0 or Pragma: no-cache, the local and intermediate proxy caches will not serve the response from stored content on their own. The request goes through to the web server, which either supplies the content or confirms that the cached copy is still valid.
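
A client can attach these request headers itself. The sketch below uses Python's standard urllib only to build the request object; nothing is sent. The function name and the example.com URL are placeholders.

```python
from urllib.request import Request


def fresh_request(url):
    """Build a GET request that asks caches not to answer from stored content."""
    return Request(url, headers={
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",  # for HTTP 1.0 caches that ignore Cache-Control
    })


request = fresh_request("http://example.com/index.html")
print(request.header_items())
```

Passing the request to urllib.request.urlopen would send it with both headers attached.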

What should I specify to ensure that my web pages are not cached anywhere?

Cache-Control: no-store is the best option. It tells every cache, private or shared, not to keep a copy of the response.
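
A server-side helper for this could be as small as the sketch below. The function name is hypothetical, and the Pragma and Expires lines are an added assumption: a commonly used fallback for older HTTP 1.0 caches that do not understand Cache-Control.

```python
def add_no_store_headers(headers):
    """Return a copy of `headers` marked so that no cache should keep the response."""
    result = dict(headers)
    result["Cache-Control"] = "no-store"
    # Assumption: legacy fallbacks for caches that ignore Cache-Control
    result["Pragma"] = "no-cache"
    result["Expires"] = "0"
    return result


print(add_no_store_headers({"Content-Type": "text/html"}))
```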

Are browser History and Cache related?

History contains the list of URLs that you visited over a pre-defined period [Internet Explorer keeps the last 20 days by default]. It does not contain any text or images from the URLs visited. The cache contains the URL, text and images. The cache cannot be searched manually [unless you use specific tools], but the browser searches it when you issue a new request.

Can pages accessed through HTTPS be cached?

Pages accessed using HTTPS cannot be seen by intermediate proxies, since they are encrypted, so the proxies cannot cache them. At the local browser the HTTPS responses are decrypted and can be cached, depending on the cache control directives in the HTTP response header.

Where can I read the RFC related to HTTP Caching?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13

When was this article last updated?

December 5, 2006

