Hosting a Static Website on GCS with Cloudflare

My project Chunked has essentially three components:

  1. The API is an executable written in Haskell, containerised and deployed on Cloud Run so I don’t have to worry about infrastructure. It also uses Cloud SQL with PostgreSQL.

  2. Projections are also written in Haskell but deployed on GKE Autopilot since they have to run continuously. If you don’t know anything about CQRS / Event Sourcing, it doesn’t matter for the rest of this post.

  3. The front-end application uses Vue.js. It is a single-page application so it is completely static. It is deployed on Google Cloud Storage (GCS) and served with Cloudflare for TLS.

Resources are created with Terraform and anything that has to do with building the application, running tests or deployment is managed with Bazel. However, the following configuration should be useful even if you don’t use these tools.


The bucket

There are only two requirements when creating the bucket:

  1. Its name must match the domain it will be accessible at.

  2. It must be public.

With Terraform, you can create such a bucket with the following resource. This assumes the Google provider has been configured already.

resource "google_storage_bucket" "website" {
  name     = "yourwebsite.tld"
  location = "EUROPE-WEST1"

  versioning {
    enabled = false
  }

  website {
    main_page_suffix = "index.html"
    not_found_page   = "404.html"
  }
}
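If you are not using Terraform, an equivalent bucket can be created and configured with gsutil. A rough sketch, where the bucket name and location are placeholders to adapt:

gsutil mb -l europe-west1 gs://yourwebsite.tld
gsutil web set -m index.html -e 404.html gs://yourwebsite.tld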

Next, we need to make it publicly available.

resource "google_storage_bucket_iam_member" "website" {
  bucket = google_storage_bucket.website.name
  role   = "roles/storage.objectViewer"
  member = "allUsers"
}
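The gsutil equivalent should look something like this:

gsutil iam ch allUsers:objectViewer gs://yourwebsite.tld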

Unfortunately, pointing your domain name directly at GCS is not enough, since GCS doesn’t serve custom domains over TLS. This is where Cloudflare comes in. But first, we need to fix caching.


Caching on GCS

By default, GCS will cache static resources for an hour. I found that too long for my use case, especially since Cloudflare already takes care of caching. Disabling caching entirely would prevent Cloudflare from caching anything, so that is not a solution either.

Instead, I figured that caching for 5 minutes was a reasonable middle ground. It allows you to purge the cache on Cloudflare’s side and still have new content served fairly quickly.

In order to do this, we need to set the Cache-Control header to public, max-age=300 on the files in the bucket, so that GCS sends it when serving any of them. This can be done with the following command.

gsutil -m setmeta -h "Cache-Control: public, max-age=300" "gs://yourwebsite.tld/**"
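Note that setmeta only affects objects already in the bucket, so it has to be re-run after every upload; the deployment script below does exactly that. To check that the header has been applied, something like the following should do:

gsutil stat gs://yourwebsite.tld/index.html
# or, once the website is live behind Cloudflare:
curl -sI https://yourwebsite.tld/ | grep -i cache-control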

For the record, I use the following Bazel rules to do just this. You can skip to the next section if you don’t care about Bazel.

WORKSPACE
# ...

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_nixpkgs",
    sha256 = "7aee35c95251c1751e765f7da09c3bb096d41e6d6dca3c72544781a5573be4aa",
    strip_prefix = "rules_nixpkgs-0.8.0",
    urls = ["https://github.com/tweag/rules_nixpkgs/archive/v0.8.0.tar.gz"],
)

load("@rules_nixpkgs//nixpkgs:repositories.bzl", "rules_nixpkgs_dependencies")

rules_nixpkgs_dependencies()

load("@rules_nixpkgs//nixpkgs:nixpkgs.bzl", "nixpkgs_local_repository", "nixpkgs_package")

nixpkgs_local_repository(
    name = "nixpkgs",
    nix_file = "//:bazel-nixpkgs.nix",
    # ^ This might be the topic of another blog.
    nix_file_deps = [
        # ...
    ],
)

nixpkgs_package(
    name = "gcloud",
    attribute_path = "google-cloud-sdk",
    build_file = "//:BUILD.gcloud.bazel",
    repository = "@nixpkgs",
)

BUILD.gcloud.bazel
package(default_visibility = ["//visibility:public"])

# ...

alias(
    name = "gsutil",
    actual = "bin/gsutil",
)

bazel/gcloud.bzl
def _cloud_storage_sync_impl(ctx):
    output_file = ctx.actions.declare_file(ctx.label.name)

    script_path = ctx.attr.script[DefaultInfo].files.to_list()[0].short_path
    gsutil_path = ctx.attr.gsutil[DefaultInfo].files.to_list()[0].short_path

    ctx.actions.write(
        output = output_file,
        is_executable = True,
        content = " ".join([
            script_path,
            gsutil_path,
            ctx.attr.bucket,
            ctx.label.package,
        ] + [
            target.short_path
            for group in ctx.attr.files
            for target in group[DefaultInfo].files.to_list()
        ]),
    )

    runfiles = ctx.runfiles(files = ctx.files.files + ctx.files.script + ctx.files.gsutil)

    return [DefaultInfo(executable = output_file, runfiles = runfiles)]

cloud_storage_sync = rule(
    _cloud_storage_sync_impl,
    attrs = dict(
        bucket = attr.string(),
        files = attr.label_list(
            allow_files = True,
        ),
        script = attr.label(
            default = Label("//bazel:cloud_storage_sync.sh"),
            allow_files = True,
        ),
        gsutil = attr.label(
            default = Label("@gcloud//:gsutil"),
            allow_files = True,
        ),
    ),
    executable = True,
)

bazel/cloud_storage_sync.sh
#!/usr/bin/env bash

set -euo pipefail

prepare_directory() {
    local dir="$1"; shift
    local prefix="$1"; shift

    for path in "$@"; do
        stripped_path="$(sed -e "s@^${prefix}/@@" <<<"${path}")"
        subdir="$(dirname "${stripped_path}")"
        mkdir -p "${dir}/${subdir}"
        cp --dereference "${path}" "${dir}/${stripped_path}"
    done
}

# Only cache for 5 minutes on Google's edge servers. CloudFlare takes care of
# most of the caching, but we still need to be able to cache the files for a
# little while, otherwise CloudFlare is forced to hit Google on every request.
fix_cache_control() {
    local gsutil="$1"; shift
    local bucket="$1"; shift

    "${gsutil}" -m setmeta \
        -h "Cache-Control: public, max-age=300" \
        "gs://${bucket}/**"
}

upload() {
    local gsutil="$1"; shift
    local bucket="$1"; shift
    local prefix="$1"; shift

    dir="$(mktemp -d)"
    prepare_directory "${dir}" "${prefix}" "$@"

    "${gsutil}" -m rsync -d -r "${dir}" "gs://${bucket}/"

    fix_cache_control "${gsutil}" "${bucket}"

    rm -rf "${dir}"
}

upload "$@"

Finally, the rule can be used as follows.

frontend/BUILD.bazel
load("//bazel:gcloud.bzl", "cloud_storage_sync")

# ...

webpack(
    name = "frontend",
    outs = [
        "index.html",
        # ...
    ],
    # ...
)

cloud_storage_sync(
    name = "deploy",
    bucket = "yourwebsite.tld",
    files = [":frontend"],
)

Deployment is done by running bazel run //frontend:deploy.


Cloudflare

DNS zone

In order for Cloudflare to serve anything, traffic must actually go through its servers. This is why we need to let it manage the DNS zone, in this case yourwebsite.tld.

Cloudflare allows you to create "normal" DNS records, such as www of type A with value 1.2.3.4, so that when someone queries the DNS servers for the address of www.yourwebsite.tld, they get 1.2.3.4. It also allows you to proxy traffic through its servers. In that case, a client gets the IP address of one of Cloudflare’s edge servers near their location, and packets are proxied to 1.2.3.4.

In addition to proxying packets, Cloudflare also takes care of TLS, i.e. encryption of the traffic between the client and their servers. This is what we need since GCS won’t do it for us and we want the static website to be accessible with HTTPS.

With Terraform, we can create the DNS zone as follows.

backend.tf
terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "2.23.0"
    }
  }
}


provider "cloudflare" {
  api_token = something # I use data.google_kms_secret for this.
}
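Alternatively, if you would rather not wire the token through Terraform at all, the Cloudflare provider can also read it from the CLOUDFLARE_API_TOKEN environment variable, e.g.:

export CLOUDFLARE_API_TOKEN="<your token>"
terraform plan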

dns.tf

# The DNS zone itself. DNS records are added to the zone with cloudflare_record.
resource "cloudflare_zone" "yourwebsite_tld" {
  zone = "yourwebsite.tld"
}

# yourwebsite.tld is proxied to GCS.
resource "cloudflare_record" "yourwebsite_tld_cname" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  name    = "@"
  value   = "c.storage.googleapis.com"
  type    = "CNAME"
  ttl     = 1
  proxied = true
}

# www.yourwebsite.tld is equivalent to yourwebsite.tld.
resource "cloudflare_record" "www_yourwebsite_tld_cname" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  name    = "www"
  type    = "CNAME"
  value   = "yourwebsite.tld"
  ttl     = 1
  proxied = true
}
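Once the name servers have been switched over at your registrar, you can check that traffic really resolves to Cloudflare rather than to GCS; a couple of dig queries should return addresses belonging to Cloudflare’s edge network:

dig +short yourwebsite.tld A
dig +short www.yourwebsite.tld A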

Page rules

We are almost done. Visiting https://yourwebsite.tld/ serves the static website but there are two more things to take care of.

  1. Requests to www.yourwebsite.tld should be redirected to yourwebsite.tld so that the website is accessible at a unique URL. Some people prefer the other way around. I personally think that www. is redundant and kind of ugly to be honest.

  2. Requests to http:// should be redirected to their https:// equivalent.

Cloudflare has page rules that can be used to define what should happen for matching requests. These can be set in the Rules tab on the Cloudflare dashboard; the Terraform configuration is given below.

For the first rule, we need to match www.yourwebsite.tld/* and use Forwarding URL with 301 - Permanent Redirect to redirect to https://yourwebsite.tld/$1. Notice that the target already uses HTTPS, so we don’t get two redirections when somebody navigates to http://www.yourwebsite.tld.

For the second, we match yourwebsite.tld/* and use the setting Always use HTTPS.

frontend.tf
resource "cloudflare_page_rule" "no_www" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  target  = "www.yourwebsite.tld/*"
  priority = 2

  actions {
    forwarding_url {
      url = "https://yourwebsite.tld/$1"
      status_code = 301
    }
  }
}

resource "cloudflare_page_rule" "http_to_https" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  target  = "yourwebsite.tld/*"
  priority = 1

  actions {
    always_use_https = true
  }
}
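Once the rules are active, the redirects can be verified with a couple of requests; the Location headers in the comments are what we expect given the rules above:

curl -sI http://yourwebsite.tld/some/page | grep -i location
# expected: Location: https://yourwebsite.tld/some/page

curl -sI https://www.yourwebsite.tld/some/page | grep -i location
# expected: Location: https://yourwebsite.tld/some/page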

This time, we’re done. The website is accessible over HTTPS at https://yourwebsite.tld, it is cached and served by Cloudflare on edge servers near the users’ locations, and the redirections are handled properly.

Moreover, purging the cache on Cloudflare (Quick Actions > Purge Cache) actually works as expected.
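If you prefer to purge the cache from the command line or from CI instead of the dashboard, the Cloudflare API exposes a purge endpoint. A sketch, where ZONE_ID and CLOUDFLARE_API_TOKEN are placeholders you would have to provide:

curl -X POST "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache" \
    -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data '{"purge_everything":true}'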