S3 & CloudFront Hosting

Hosting this website on Amazon's S3 & CloudFront services

by Hadley Bradley

 Migrating to Amazon Web Services

At the start of 2020 I began the process of migrating all my web content to Amazon Web Services (AWS). For several years I’ve hosted my site on a small VPS with Digital Ocean. I’ve been happy with the service that Digital Ocean offer. However, I’ve been less happy with the chore of patching and upgrading the software on the server to make sure it wasn’t hacked and taken over.

At the beginning of the month I got an email saying that the client software I used to renew the SSL certificates with Let’s Encrypt would be discontinued in June 2020. Sure, I could have updated the client to the latest version, but I decided to take this opportunity to move the site to AWS and save myself from having to perform monthly maintenance tasks. It was time to go serverless and save some money.

This website is made up entirely of static web pages; no server-side interaction is needed. So I’ve kept the set-up very simple. I upload the website files to an S3 bucket, and a CloudFront distribution then caches those files at multiple edge locations around the world. Thanks to some Route 53 mapping magic, the domain name hadleybradley.com is served via CloudFront with a free SSL certificate issued by Amazon’s Certificate Manager.
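
The Route 53 “magic” is simply an alias A record pointing the apex domain at the distribution. A minimal sketch of the change batch you would pass to aws route53 change-resource-record-sets (the distribution domain d1234abcd.cloudfront.net is a placeholder; Z2FDTNDATAQYW2 is the fixed hosted-zone ID that Route 53 uses for all CloudFront aliases):

{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "hadleybradley.com.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d1234abcd.cloudfront.net.",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}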

The CloudFront distribution forwards its access logs to a separate bucket as compressed files. I periodically pull these access logs down to my machine and process them to get a sense of which content is being viewed the most.
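
Access logging is part of the distribution configuration itself. The relevant fragment looks roughly like this (a sketch, assuming the log bucket is the logs.hadleybradley.com bucket used later in this article):

"Logging": {
    "Enabled": true,
    "IncludeCookies": false,
    "Bucket": "logs.hadleybradley.com.s3.amazonaws.com",
    "Prefix": ""
}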

Below is the current architecture diagram:

[Image: S3 and CloudFront Hosting Diagram]

 Using the AWS CLI to push new files to the S3 bucket

I use the AWS command line interface to automate the task of updating and backing up the web site.

Below is the command I use to synchronise the files from my Linux machine up to the S3 bucket. As you can see from the command, I explicitly exclude some files and folders from being uploaded. I also set the access control list so that the uploaded files are publicly readable.

aws s3 sync . s3://hadleybradley.com \
    --acl     'public-read' \
    --exclude '.DS*' \
    --exclude 'makefile' \
    --exclude 'templates/*' \
    --exclude 'content/*'

 Invalidating CloudFront Caches

In most cases I’ll simply upload the new files and let CloudFront automatically clear its caches and fetch copies of the new pages. However, sometimes you’ll want to invalidate the caches straight away so that the new version of a page becomes available to the public immediately. CloudFront allows you to invalidate a thousand paths per month for free, after which you pay a small amount per path. A thousand invalidations a month is more than enough for the average website.

Below is the command I use to invalidate the CloudFront distribution by passing in the distribution ID and the files I want to invalidate. You can be specific and invalidate just one file, such as your style sheet. Or like in this example, invalidate all your files.

aws cloudfront create-invalidation \
    --distribution-id EWEQJ46P45QW \
    --paths "/*"
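
To invalidate a single file instead, you just change the path argument. For example (the style-sheet path here is purely illustrative):

aws cloudfront create-invalidation \
    --distribution-id EWEQJ46P45QW \
    --paths "/css/style.css"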

 CloudFront Configuration

Under the CloudFront distribution I have a default behavior which is used for all assets. Within this section I’ve switched on [Compress Objects Automatically], which allows CloudFront to automatically compress content when the viewer requests it (as indicated by the Accept-Encoding header in the viewer request). This is great as my users get compressed copies of the HTML, CSS and JavaScript, which makes the site load more quickly.

Under the [Error Pages] tab I’ve also configured a custom error page for a 404 file not found. This makes the distribution behave like a traditional web server.
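
If you manage the distribution as code rather than through the console, both settings live in the distribution config. A rough sketch of the relevant fragments (the /404.html page path and the caching TTL are illustrative):

"DefaultCacheBehavior": {
    "Compress": true
},
"CustomErrorResponses": {
    "Quantity": 1,
    "Items": [
        {
            "ErrorCode": 404,
            "ResponsePagePath": "/404.html",
            "ResponseCode": "404",
            "ErrorCachingMinTTL": 300
        }
    ]
}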

 Lambda@Edge Function For Custom Headers

By default, a website hosted using S3 and CloudFront would score poorly on a security audit site like Security Headers. This is because, out of the box, none of the typical HTTP security headers are sent with the response from the CloudFront edge locations. I fixed this by deploying a Lambda@Edge function which injects the necessary headers into every response.

While you have to pay for every invocation of a Lambda function, this type of function only gets invoked when CloudFront encounters a cache miss and needs to go back to the origin to retrieve the requested asset. The modified response is then cached at the edge location for subsequent requests.

'use strict';
exports.handler = (event, context, callback) => {

    // get contents of response
    const response = event.Records[0].cf.response;
    const headers = response.headers;

    // get the request object and see if the
    // requested file is a static asset and
    // then set an appropriate caching header
    const requestUri = event.Records[0].cf.request.uri;
    if (requestUri.endsWith('.js') ||
        requestUri.endsWith('.jpg') ||
        requestUri.endsWith('.png') ||
        requestUri.endsWith('.svg') ||
        requestUri.endsWith('.ico')) {
        headers['cache-control'] = [{ key: 'Cache-Control', value: 'public, max-age=63072000, immutable' }];
    }

    // set security headers (CloudFront expects each
    // header as an array of key/value objects)
    headers['strict-transport-security'] = [{ key: 'Strict-Transport-Security', value: 'max-age=63072000; includeSubdomains; preload' }];
    headers['content-security-policy'] = [{ key: 'Content-Security-Policy', value: "default-src 'self'; font-src 'self' https://fonts.gstatic.com; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; script-src https: 'unsafe-inline'" }];
    headers['x-content-type-options'] = [{ key: 'X-Content-Type-Options', value: 'nosniff' }];
    headers['x-frame-options'] = [{ key: 'X-Frame-Options', value: 'DENY' }];
    headers['x-xss-protection'] = [{ key: 'X-XSS-Protection', value: '1; mode=block' }];
    headers['referrer-policy'] = [{ key: 'Referrer-Policy', value: 'same-origin' }];

    // return modified response
    callback(null, response);
};

This function also conditionally injects a long-lived Cache-Control header for static assets such as images, JavaScript files and the site’s favicon.

 IAM Policy and Trust Relationship

This type of Lambda@Edge function isn’t run within the context of your account. It’s run on demand at the CloudFront edge locations. As such, the IAM role that this function runs with needs a [trust relationship] set up so that the function can assume the role from either lambda.amazonaws.com or edgelambda.amazonaws.com.

Below is a JSON policy document that can be used to define this trust relationship within the IAM role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
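
If you prefer the CLI to the console, you can create the role with this trust policy in one step. A sketch, assuming the document above is saved as trust-policy.json and using a hypothetical role name:

aws iam create-role \
    --role-name lambda-edge-security-headers \
    --assume-role-policy-document file://trust-policy.json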

 Pulling and Processing the CloudFront logs

To process the logs I use a bash script which automates the following steps.

Download new log files to my local laptop:

aws s3 sync s3://logs.hadleybradley.com/ .

As the logs fill up, you might just want to download a specific month:

aws s3 sync s3://logs.hadleybradley.com/ . \
    --exclude "*" \
    --include "EWEQJ46P45QW.2020-12*"

Decompress the files and join them together into a single access.log file.

gunzip -k --stdout EW*.gz > access.log

Strip out all the junk and noise entries in the log file. You know, like all the automated attacks probing to see if your site has a WordPress login page. As I don’t have any PHP files on my site, I strip out all such requests:

sed -i"+" '/\.php/d' access.log

Finally, the script processes the log file using GoAccess, a free, open-source web log analyzer. The processed output is written to an HTML report which is then automatically opened in my default web browser for review.

goaccess access.log \
    --log-format=CLOUDFRONT \
    --color-scheme=1 > report.html
open report.html

 Backing up the web site files

Finally, I have a backup command which syncs all my files to a separate S3 bucket. This command uploads all my templates and document source files written in Markdown. It also backs up copies of my page-generation code, which is written in Go.

aws s3 sync . s3://backup.hadleybradley.com

This backup bucket also has S3 file versioning switched on so that I can retain each version of a given file as it’s backed up.
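
Versioning is a one-off bucket setting. If you’re enabling it from the CLI rather than the console, it’s a single call:

aws s3api put-bucket-versioning \
    --bucket backup.hadleybradley.com \
    --versioning-configuration Status=Enabled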

Need Help or Advice?

If you need any help or advice in setting up and configuring a secure, high-performance static website using Amazon’s S3 and CloudFront products, then please get in touch; I’d be happy to help.


Table of Contents

  1. Migrating to Amazon Web Services
  2. Using the AWS CLI to push new files to the S3 bucket
  3. Invalidating CloudFront Caches
  4. CloudFront Configuration
  5. Lambda@Edge Function For Custom Headers
  6. IAM Policy and Trust Relationship
  7. Pulling and Processing the CloudFront logs
  8. Backing up the web site files