S3 & CloudFront Hosting

Hosting this website on Amazon's S3 & CloudFront services

by Hadley Bradley

 Migrating to Amazon Web Services

It’s January 2020 and I’m in the process of migrating all my web content to Amazon Web Services (AWS). For several years I’ve hosted my site on a small VPS with Digital Ocean. I’ve been happy with the service that Digital Ocean offer. However, I’ve been less happy with the chore of patching and upgrading the software on the server to make sure it wasn’t hacked and taken over.

At the beginning of the month I got an email saying that the client software I used to renew the SSL certificates with Let’s Encrypt would be discontinued in June 2020. Sure, I could have updated the client to the latest version, but I decided to take this opportunity to move the site to AWS and save myself from having to perform monthly maintenance tasks.

My website is made up entirely of static web pages; no server-side interaction is needed. So I’ve kept the set-up very simple. I upload the website files to an S3 bucket. A CloudFront distribution then mirrors those files to multiple edge locations around the world. Thanks to some Route53 mapping magic, the domain name hadleybradley.com is then served via CloudFront with a free SSL certificate signed by Amazon.
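For reference, the Route53 “magic” boils down to an alias A record pointing the apex domain at the distribution. Below is a sketch of the change batch; the distribution domain is the placeholder from the AWS documentation, while Z2FDTNDATAQYW2 is the fixed hosted zone ID that CloudFront aliases always use:

```json
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "hadleybradley.com.",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "Z2FDTNDATAQYW2",
        "DNSName": "d111111abcdef8.cloudfront.net.",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
```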

The CloudFront distribution forwards its access logs to a separate bucket as compressed files. I periodically pull these access logs down to my machine and process them to get a sense of what content is being viewed the most.

Below is the current architecture diagram:

S3 and CloudFront Hosting Diagram

 Using the AWS CLI to push new files to the S3 bucket

I use the AWS command line interface to automate the task of updating and backing up the web site.

Below is the command I use to synchronise the files from my Linux machine up to the S3 bucket. As you can see from the command, I explicitly exclude some files and folders from being uploaded. I also set the access control list so that the uploaded files are publicly readable.

aws s3 sync . s3://hadleybradley.com \
    --acl 'public-read' \
    --exclude '.DS*' \
    --exclude 'makefile' \
    --exclude 'templates/*' \
    --exclude 'content/*'
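When tweaking the exclude list, it’s worth previewing what would actually be transferred before running the sync for real. The CLI’s --dryrun flag prints the planned operations without touching the bucket; this is the same command as above with only that flag added:

```shell
# Preview the sync: prints each upload/copy that WOULD happen, changes nothing
aws s3 sync . s3://hadleybradley.com \
    --acl 'public-read' \
    --exclude '.DS*' \
    --exclude 'makefile' \
    --exclude 'templates/*' \
    --exclude 'content/*' \
    --dryrun
```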

 Invalidating the caches

In most cases I’ll simply upload the new files and let CloudFront automatically clear its caches and fetch a copy of the new pages. However, sometimes you want to invalidate the caches straight away so that the new version of a page becomes available to the public.

Below is the command I use to invalidate the CloudFront distribution by passing in the distribution ID and the files I want to invalidate. You can be specific and invalidate just one file, such as your style sheet. Or, as in this example, invalidate all your files.

aws cloudfront create-invalidation \
    --distribution-id EWEQJ46P45QW \
    --paths "/*"

 Pulling and Processing the CloudFront logs

To process the logs I use a bash script which automates the following steps.

Download new log files to my local laptop:

aws s3 sync s3://logs.hadleybradley.com/ .

Decompress the files and join them together into a single access.log file:

gunzip -k --stdout EW*.gz > access.log
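The -k flag keeps the original .gz files in place, and the shell expands EW*.gz in sorted order, so the concatenation behaviour can be checked locally with a couple of dummy files (made-up content, not real CloudFront logs):

```shell
# Build two small compressed logs (dummy data)
printf 'first entry\n'  | gzip > EW1.gz
printf 'second entry\n' | gzip > EW2.gz

# Decompress both to stdout and join them into one file;
# -k leaves EW1.gz and EW2.gz untouched on disk
gunzip -k --stdout EW*.gz > access.log

cat access.log
```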

Strip out the junk and noise entries in the log file: all the automated attacks probing to see if your site has a WordPress login page, for example. As I don’t have any PHP files on my site, I strip out all such requests:

sed -i"+" '/.php/d' access.log

Finally, the script processes the log file using GoAccess, a free, open-source web log analyzer. The processed output is written to an HTML report, which is then automatically opened in my default web browser for review.

goaccess access.log \
    --color-scheme=1 > report.html
open report.html
open report.html
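Put together, the whole log-processing job is a short script. Here’s a sketch under the assumptions above — the bucket name, the EW* file prefix, and the goaccess options as shown (open is the macOS opener; xdg-open is the Linux equivalent). It needs AWS credentials and goaccess installed, so it isn’t runnable standalone:

```shell
#!/bin/bash
set -euo pipefail

# 1. Pull down any new compressed logs from the logging bucket
aws s3 sync s3://logs.hadleybradley.com/ .

# 2. Decompress every log and join them into one plain-text file
gunzip -k --stdout EW*.gz > access.log

# 3. Drop the PHP probe noise, keeping a backup as access.log+
sed -i"+" '/\.php/d' access.log

# 4. Build the HTML report and open it for review
goaccess access.log --color-scheme=1 > report.html
open report.html
```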

 Backing up the web site files

Finally, I have a backup command which syncs all my files to a separate S3 bucket. This command uploads all my templates and document sources, which are written in Markdown. It also backs up my page-generation code, which is written in Go.

aws s3 sync . s3://backup.hadleybradley.com

This backup bucket also has file versioning switched on, so I can retain each version of a given file as it’s backed up.
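With versioning enabled, older copies stay retrievable through the lower-level s3api commands. A sketch, where index.md is a hypothetical key and the version ID is a placeholder that would come from the first command’s output:

```shell
# List every stored version of a given file in the backup bucket
aws s3api list-object-versions \
    --bucket backup.hadleybradley.com \
    --prefix index.md

# Fetch one specific older version by its ID (placeholder value)
aws s3api get-object \
    --bucket backup.hadleybradley.com \
    --key index.md \
    --version-id 3sL4kqtJlcpXroDTDmJ.rmSpXd3dIbrHY \
    index.md.restored
```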
