Michal Zalecki
software development, testing, JavaScript,
TypeScript, Node.js, React, and other stuff

Caching headers: A practical guide for frontend developers

There are multiple headers available for developers and ops people to manipulate cache behavior. Directives from the old spec mix with those from the new one, there are numerous settings to configure, and you can find many users reporting inconsistent behavior.

In this post, I focus on explaining how different headers influence the browser cache and how they relate to proxy servers. You're going to find an example configuration for Nginx and code for Node.js running Express. At the end, I look into how popular services built with React serve their web applications.

For a single-page application, I want to cache JavaScript, CSS, font, and image files indefinitely and prevent caching of HTML files and the Service Worker script, if you have one. This strategy is viable because my asset files have unique identifiers in their file names. You can achieve the same by configuring webpack to include a [hash] or, even better, a [chunkhash] in the file names of your assets. This technique is called long-term caching.
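To make the idea of unique identifiers in file names concrete, here is a minimal sketch of content fingerprinting. It is not how webpack computes its hashes. The hashedFileName helper, the SHA-256 digest, and the 8-character length are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: insert a short content hash before the file extension,
// e.g. "main.js" becomes "main.<8 hex chars>.js".
// SHA-256 and the 8-character length are arbitrary choices for this sketch.
function hashedFileName(fileName: string, content: string | Uint8Array): string {
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 8);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    // No extension (or a dotfile): append the hash at the end.
    return `${fileName}.${hash}`;
  }
  return `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}
```

Because the name changes whenever the content changes, a file served under a given name never needs to be revalidated. That is what makes caching it indefinitely safe.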

But if you prevent re-downloading, how do you then ship updates to your website? Maintaining the ability to update the website is why it's so important never to cache HTML files. Every time you visit my site, the browser fetches a fresh copy of the HTML file from the server, and only when it contains new script srcs or link hrefs does the browser download a new asset from the server.

Cache-Control

Cache-Control: no-store

When told to no-store, the browser should not store anything about the request or its response. You can use it for HTML and the Service Worker script.

Cache-Control: public, no-cache

or

Cache-Control: public, max-age=0, must-revalidate

These two are equivalent and, despite the no-cache name, allow serving cached responses, provided that the browser first validates with the server that the cached copy is still fresh. If you correctly set ETag or Last-Modified headers so that the browser can verify that it already has the most recent version cached, you and your users are going to save on bandwidth. You can use it for HTML and the Service Worker script.

Cache-Control: private, no-cache

or

Cache-Control: private, max-age=0, must-revalidate

By analogy, these two are also equivalent. The difference between public and private is that a shared cache (e.g., a CDN) can cache public responses but not private responses. The local cache (e.g., the browser) can still cache private responses. You use private when you render your HTML on the server and the rendered HTML contains user-specific or sensitive information. In framework terms, you don't need to set private for a typical Gatsby blog, but you should consider it with Next.js for pages that require authorized access.

Cache-Control: public, max-age=31536000, immutable

In this example, the browser is going to cache the response for a year according to the max-age directive (60 * 60 * 24 * 365 seconds). The immutable directive tells the browser that the content of this response (file) is not going to change, and the browser should not validate its cache by sending If-None-Match (ETag validation) or If-Modified-Since (Last-Modified validation). Use it for your static assets to support long-term caching strategies.
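The Cache-Control values discussed so far can be summarized in a small sketch. The ResourceKind names and the cacheControlFor helper are hypothetical and exist only for this illustration. The header values themselves are the ones shown above.

```typescript
// One year in seconds: 60 s * 60 min * 24 h * 365 days = 31536000.
const ONE_YEAR_IN_SECONDS = 60 * 60 * 24 * 365;

// Hypothetical resource kinds, named only for this sketch.
type ResourceKind = "html" | "private-html" | "hashed-asset";

function cacheControlFor(kind: ResourceKind): string {
  switch (kind) {
    case "html":
      // May be stored by shared caches, but must be revalidated before reuse.
      return "public, no-cache";
    case "private-html":
      // User-specific HTML: only the browser may store it, and it must revalidate.
      return "private, no-cache";
    case "hashed-asset":
      // Fingerprinted file: cache for a year and never revalidate.
      return `public, max-age=${ONE_YEAR_IN_SECONDS}, immutable`;
  }
}
```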

Pragma and Expires

Pragma: no-cache
Expires: <http-date>

Pragma is an old header defined in the HTTP/1.0 spec as a request header. The later HTTP/1.1 spec states that Pragma: no-cache should be handled as Cache-Control: no-cache, but it's not a reliable replacement in responses because it is still specified as a request header. I also keep using Pragma: no-cache because OWASP recommends it as a security measure. Including the Pragma: no-cache header is a precaution against legacy servers that don't support newer cache control mechanisms and could cache what you don't intend to be cached.

Some would argue that unless you have to support Internet Explorer 5 or Netscape, you don't need either Pragma or Expires. It comes down to supporting legacy software. Proxies universally understand the Expires header, which gives it a slight edge. For HTML files, I keep the Expires header disabled or set it to a past date. For static assets, I manage it together with Cache-Control's max-age via the Nginx expires directive.

ETags

ETag: W/"5e15153d-120f"

or

ETag: "5e15153d-120f"

ETags are one of several methods of cache validation. An ETag must uniquely identify the resource, and most often, the web server generates a fingerprint from the resource content. When the resource changes, it's going to have a different ETag value. There are two types of ETags. Weak ETag equality indicates that resources are semantically equivalent. Strong ETag equality indicates that resources are byte-for-byte identical. You can distinguish between them by the "W/" prefix set for weak ETags. Weak ETags are not suitable for byte-range requests but are easy to generate on the fly. In practice, you are not going to set ETags on your own but let your web server handle them.
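The difference between the two comparison modes can be sketched as follows. This is a simplified illustration of the weak and strong comparison rules from the HTTP conditional requests spec. The parseETag, weakMatch, and strongMatch names are mine, and the parsing does not validate malformed tags.

```typescript
interface ParsedETag {
  weak: boolean;
  // The quoted part of the tag, including the surrounding double quotes.
  opaqueTag: string;
}

function parseETag(value: string): ParsedETag {
  const trimmed = value.trim();
  const weak = trimmed.startsWith("W/");
  return { weak, opaqueTag: weak ? trimmed.slice(2) : trimmed };
}

// Weak comparison: the opaque tags must match, the W/ prefix is ignored.
function weakMatch(a: string, b: string): boolean {
  return parseETag(a).opaqueTag === parseETag(b).opaqueTag;
}

// Strong comparison: the opaque tags must match and neither tag may be weak.
function strongMatch(a: string, b: string): boolean {
  const x = parseETag(a);
  const y = parseETag(b);
  return !x.weak && !y.weak && x.opaqueTag === y.opaqueTag;
}
```

If-None-Match validation uses the weak comparison, which is why weak ETags are good enough for ordinary cache validation. Range requests need the strong one.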

curl -I <http-address>
curl -I -H "Accept-Encoding: gzip" <http-address>

You may see that when you request a static file from Nginx, it sets a strong ETag. When gzip compression is enabled but you didn't upload precompressed files, the on-the-fly compression results in weak ETags.

By sending the "If-None-Match" request header with the ETag of a cached resource, the browser expects either a 200 OK response with a new resource or an empty 304 Not Modified response, which indicates that the cached resource should be used instead of downloading a new one.

A less utilized but no less important fact for frontend developers is that the same optimization can apply to API GET responses and is not limited to static files. If your application receives large JSON payloads, you can configure your backend to calculate and set an ETag from the content of the payload (e.g., using md5) and, before sending it to the client, compare it with the "If-None-Match" request header. If there's a match, instead of sending the payload, send 304 Not Modified to save on bandwidth and improve web app performance.
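A minimal sketch of that flow, independent of any framework, could look like this. The respondWithETag helper and the ConditionalResponse shape are hypothetical. The sketch splits If-None-Match on commas, which is fine for hex digests but not for arbitrary tags, and it ignores the "*" value.

```typescript
import { createHash } from "node:crypto";

interface ConditionalResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper, not part of Express or any other library.
function respondWithETag(payload: unknown, ifNoneMatch?: string): ConditionalResponse {
  const body = JSON.stringify(payload);
  // Quoted md5 digest of the serialized payload, used as the ETag.
  const etag = `"${createHash("md5").update(body).digest("hex")}"`;
  const headers: Record<string, string> = { ETag: etag };

  // If-None-Match may list several tags; the comparison is weak, so drop any W/ prefix.
  const candidates = (ifNoneMatch ?? "")
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""));

  if (candidates.includes(etag)) {
    // The client already has this exact payload: send headers only.
    return { status: 304, headers, body: "" };
  }
  return { status: 200, headers, body };
}
```

In a request handler, you would pass the incoming If-None-Match header as the second argument and write the returned status, headers, and body to the response. As far as I know, Express already generates ETags for res.send and res.json by default, so check that before adding your own.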

Last-Modified

Last-Modified: Tue, 07 Jan 2020 23:33:17 GMT

The Last-Modified response header is another cache validation mechanism and uses the last modification date. The Last-Modified header is a fallback mechanism for the more accurate ETag.

By sending the "If-Modified-Since" request header with the last modification date of a cached resource, the browser expects either a 200 OK response with a newer resource or an empty 304 Not Modified response, which indicates that the cached resource should be used instead of downloading a new one.
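On the server side, the date check behind that decision can be sketched as below. The isNotModifiedSince name is an assumption, and in practice your web server performs this comparison for static files.

```typescript
// Returns true when the server may answer with 304 Not Modified.
function isNotModifiedSince(ifModifiedSince: string | undefined, lastModified: Date): boolean {
  if (!ifModifiedSince) return false;

  const since = Date.parse(ifModifiedSince);
  // An unparseable date means the header is ignored and the full response is sent.
  if (Number.isNaN(since)) return false;

  // HTTP dates have one-second resolution, so drop milliseconds before comparing.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  return modified <= since;
}
```

The one-second resolution is one reason Last-Modified is less accurate than an ETag: two changes within the same second are indistinguishable.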

Debugging

When you set headers and then test the configuration, make sure you're close to your server with regards to the network. What I mean by that is if you have your server Dockerized, then run the container and test it locally. If you configure a VM, then ssh to that VM and test headers there. If you have a Kubernetes cluster, spin up a pod and call your service from within the cluster. In a production setup, you're going to work with load balancers, proxies, CDNs. At each of those steps, your headers can get modified, so it's much easier to debug knowing your server sent correct headers in the first place.

An example of such unexpected behavior is Cloudflare removing the ETag header if you have Email Address Obfuscation or Automatic HTTPS Rewrites enabled. Good luck trying to debug it by changing your server configuration! In Cloudflare's defense, this behavior is very well documented and makes perfect sense, so it's on you to know your tools.

Cache-Control: max-age=31536000
Cache-Control: public, immutable

Earlier in this post, I've put "or" between headers in code snippets to indicate that those are two different examples. Sometimes you may notice the same header more than once in an HTTP response. It means that both headers apply. Some proxy servers can merge headers along the way. The above example is equal to:

Cache-Control: max-age=31536000, public, immutable
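The merge itself is just a comma-separated join of the individual values in their original order. A minimal sketch, with a hypothetical mergeCacheControl helper that does not handle quoted directive arguments containing commas:

```typescript
// Combine repeated Cache-Control header values into a single value.
function mergeCacheControl(values: string[]): string {
  return values
    .flatMap((value) => value.split(","))
    .map((directive) => directive.trim())
    .filter((directive) => directive !== "")
    .join(", ");
}
```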

Using curl is going to give you the most consistent results and ease of running in multiple environments. If you decide to use a web browser regardless, make sure to look at the Service Worker while debugging caching problems. Service Worker debugging is a complex topic for another post. To troubleshoot caching problems, make sure you enable bypassing service workers in the DevTools Application tab.

Nginx Configuration

Now that you understand what the different caching headers do, it's time to put your knowledge into practice. The following Nginx configuration serves a Single Page Application that was built to support long-term caching.

gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript;

First of all, I enabled gzip compression for content types that benefit a Single Page Application the most. For more details on each of the available gzip settings, head to the Nginx gzip module documentation.

location ~* (\.html|\/sw\.js)$ {
  expires -1y;
  add_header Pragma "no-cache";
  add_header Cache-Control "public";
}

I want to match all HTML files together with /sw.js, which is a Service Worker script. Neither should be cached. The Nginx expires directive set to a negative value sets the Expires header to a past date and adds a Cache-Control: no-cache header.

location ~* \.(js|css|png|jpg|jpeg|gif|ico|json)$ {
  expires 1y;
  add_header Cache-Control "public, immutable";
}

I want to maximize the caching of all my static assets, which are JavaScript files, CSS files, images, and static JSON files. If you host your font files, you can add them as well.

location / {
  try_files $uri $uri/ /index.html;
}


if ($host ~* ^www\.(.*)) {
  set $host_without_www $1;
  rewrite ^(.*) https://$host_without_www$1 permanent;
}

Those two blocks are not related to caching but are an essential part of the Nginx configuration. Since modern Single Page Applications support routing for pretty URLs, and my static server is not aware of them, I would like to serve a default index.html for every route that doesn't match a static file. I'm also interested in redirects from URLs with www. to URLs without www. You might not need the last one if your hosting provider already does that for you.

Express Configuration

Sometimes we are unable to serve static files using a reverse proxy server like Nginx. It might be the case that your serverless setup/service provider limits you to using one of the popular programming languages, and performance is not your primary concern. In such a case, you might want to use a server like Express to serve your static files.

import express, { Response } from "express";
import compression from "compression";
import path from "path";

const PORT = process.env.PORT || 3000;
const BUILD_PATH = "public";

const app = express();

function setNoCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() - 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Pragma", "no-cache");
  res.setHeader("Cache-Control", "public, no-cache");
}

function setLongTermCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() + 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
}

app.use(compression());
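app.use(
  express.static(BUILD_PATH, {
    extensions: ["html"],
    // The parameter is named filePath to avoid shadowing the imported path module.
    setHeaders(res, filePath) {
      // HTML and the Service Worker script must always be revalidated.
      // This check comes first because /sw.js also matches the asset pattern below.
      if (filePath.match(/(\.html|\/sw\.js)$/)) {
        setNoCache(res);
        return;
      }

      // Static assets get long-term caching.
      if (filePath.match(/\.(js|css|png|jpg|jpeg|gif|ico|json)$/)) {
        setLongTermCache(res);
      }
    },
  }),
);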

app.get("*", (req, res) => {
  setNoCache(res);
  res.sendFile(path.resolve(BUILD_PATH, "index.html"));
});

app.listen(PORT, () => {
  console.log(`Server is running http://localhost:${PORT}`);
});

This script mimics what our Nginx configuration is doing. It enables gzip using the compression middleware. The Express static middleware sets ETag and Last-Modified headers for you. We have to handle sending index.html on our own in case the request doesn't match any known static file.
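One detail worth checking in isolation is the order of the two path patterns, because /sw.js matches both of them. Here is a small sketch that reuses the same regular expressions. The Policy type and the policyForPath helper are hypothetical names for this illustration.

```typescript
type Policy = "no-cache" | "long-term" | "default";

// The same patterns as in the Express script above.
const NO_CACHE_PATTERN = /(\.html|\/sw\.js)$/;
const LONG_TERM_PATTERN = /\.(js|css|png|jpg|jpeg|gif|ico|json)$/;

function policyForPath(filePath: string): Policy {
  // Checked first on purpose: "/sw.js" would otherwise be cached as a regular script.
  if (NO_CACHE_PATTERN.test(filePath)) return "no-cache";
  if (LONG_TERM_PATTERN.test(filePath)) return "long-term";
  // Anything else keeps the server's default headers.
  return "default";
}
```

Swapping the two checks would give the Service Worker script a one-year cache lifetime, which is exactly what this setup is meant to avoid.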

Examples

Finally, I wanted to explore how popular services utilize caching headers. I checked headers separately for HTML and for CSS or JavaScript files. I also looked at the Server header (if any), as it might give us an interesting insight into the underlying infrastructure.

Twitter

Twitter tries very hard to keep its HTML files from ending up in your browser cache. It looks like Twitter is using Express to serve the <div id="react-root"> entry point for the React app. For whatever reason, Twitter uses an Expiry header, and the Expires header is missing. I've looked it up but didn't find anything interesting. Might it be a typo? If you know, please leave a comment.

cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
expiry: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Wed, 08 Jan 2020 22:16:19 GMT (current date)
pragma: no-cache
server: tsa_o
x-powered-by: Express

Twitter doesn't have CSS files and is probably using some CSS-in-JS solution. The ECS value in the Server header most likely identifies the Edgecast CDN rather than Amazon ECS, so it looks like static files are served from a CDN.

etag: "fXSAIt9bnXh6KGXnV0ABwQ=="
expires: Thu, 07 Jan 2021 22:19:54 GMT
last-modified: Sat, 07 Dec 2019 22:27:21 GMT
server: ECS (via/F339)

Instagram

Instagram doesn't want your browser to cache HTML either and uses a valid Expires header set to the beginning of the year 2000; any date in the past works equally well.

last-modified: Wed, 08 Jan 2020 21:45:45 GMT
cache-control: private, no-cache, no-store, must-revalidate
pragma: no-cache
expires: Sat, 01 Jan 2000 00:00:00 GMT

Both CSS and JavaScript files served by Instagram support long-term caching and also have an ETag.

etag: "3d0c27ff077a"
cache-control: public,max-age=31536000,immutable

New York Times

The New York Times is also using React and serves its articles as server-side rendered pages. The last modification date seems to be a real date that doesn't change with every request.

cache-control: no-cache
last-modified: Wed, 08 Jan 2020 21:54:09 GMT
server: nginx

New York Times assets are also cached for a long time, with both an ETag and a Last-Modified date provided.

cache-control: public,max-age=31536000
etag: "42db6c8821fec0e2b3837b2ea2ece8fe"
expires: Wed, 24 Jun 2020 23:27:22 GMT
last-modified: Tue, 25 Jun 2019 22:51:52 GMT
server: UploadServer

Wrap up

I've written this post partially to organize my knowledge, but I also intend to use it as a cheat sheet for configuring current and future projects. I hope you enjoyed reading it and found it useful!

If you have any questions or would like to suggest an improvement, please leave a comment below, and I'll be happy to answer it!

This article was originally posted on LogRocket's blog: Caching headers: A practical guide for frontend developers

Photo by JOSHUA COLEMAN on Unsplash.