Hosting a Static Website on GCS with Cloudflare
My project Chunked has essentially three components:

- The API is an executable written in Haskell, containerised and deployed on Cloud Run so I don’t have to worry about infrastructure. It also uses Cloud SQL with PostgreSQL.
- Projections are also written in Haskell but deployed on GKE Autopilot since they have to run continuously. If you don’t know anything about CQRS / Event Sourcing, it doesn’t matter for the rest.
- The front-end application uses Vue.js. It is a single-page application, so it is completely static. It is deployed on Google Cloud Storage (GCS) and served through Cloudflare for TLS.
Resources are created with Terraform and anything that has to do with building the application, running tests or deployment is managed with Bazel. However, the following configuration should be useful even if you don’t use these tools.
The bucket
There are only two conditions to create the bucket:

- Its name must match the domain it will be accessible at.
- It must be public.
With Terraform, you can create such a bucket with the following resource. This assumes the Google provider has been configured already.
```hcl
resource "google_storage_bucket" "website" {
  name     = "yourwebsite.tld"
  location = "EUROPE-WEST1"

  versioning {
    enabled = false
  }

  website {
    main_page_suffix = "index.html"
    not_found_page   = "404.html"
  }
}
```
Next, we need to make it publicly available.
```hcl
resource "google_storage_bucket_iam_member" "website" {
  bucket = google_storage_bucket.website.name
  role   = "roles/storage.objectViewer"
  member = "allUsers"
}
```
Unfortunately, pointing your domain name to GCS directly is not enough since it doesn’t support TLS. This is where Cloudflare comes in. But first, we need to fix caching.
Caching on GCS
By default, GCS will cache static resources for an hour. I found that too long for my use case, especially since Cloudflare already takes care of caching. Disabling caching entirely would prevent Cloudflare from caching anything, so that is not a solution either.
Instead, I figured a 5-minute cache was a reasonable middle ground. It lets you purge the cache on the Cloudflare side and get new content served fairly quickly.
In order to do this, we need to set the HTTP header `Cache-Control` to `public, max-age=300` on the files in the bucket, so that GCS serves them with that header. This can be done with the following command.

```shell
gsutil -m setmeta -h "Cache-Control: public, max-age=300" "gs://yourwebsite.tld/**"
```
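To verify the header took effect, you can inspect the response for any object. A quick check with `curl` against the GCS endpoint (bucket name as above; the response should include `public, max-age=300`):

```shell
# Public objects are reachable at storage.googleapis.com/<bucket>/<object>.
curl -sI "https://storage.googleapis.com/yourwebsite.tld/index.html" | grep -i cache-control
```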
For the record, I use the following Bazel rules to do just this. You can skip to the next section if you don’t care about Bazel.
WORKSPACE
```python
# ...

http_archive(
    name = "rules_nixpkgs",
    sha256 = "7aee35c95251c1751e765f7da09c3bb096d41e6d6dca3c72544781a5573be4aa",
    strip_prefix = "rules_nixpkgs-0.8.0",
    urls = ["https://github.com/tweag/rules_nixpkgs/archive/v0.8.0.tar.gz"],
)

load("@rules_nixpkgs//nixpkgs:repositories.bzl", "rules_nixpkgs_dependencies")

rules_nixpkgs_dependencies()

nixpkgs_local_repository(
    name = "nixpkgs",
    nix_file = "//:bazel-nixpkgs.nix",
    # ^ This might be the topic of another blog.
    nix_file_deps = [
        # ...
    ],
)

nixpkgs_package(
    name = "gcloud",
    attribute_path = "google-cloud-sdk",
    build_file = "//:BUILD.gcloud.bazel",
    repository = "@nixpkgs",
)
```
BUILD.gcloud.bazel
```python
package(default_visibility = ["//visibility:public"])

# ...

alias(
    name = "gsutil",
    actual = "bin/gsutil",
)
```
bazel/gcloud.bzl
```python
def _cloud_storage_sync_impl(ctx):
    output_file = ctx.actions.declare_file(ctx.label.name)
    script_path = ctx.attr.script[DefaultInfo].files.to_list()[0].short_path
    gsutil_path = ctx.attr.gsutil[DefaultInfo].files.to_list()[0].short_path

    ctx.actions.write(
        output = output_file,
        is_executable = True,
        content = " ".join([
            script_path,
            gsutil_path,
            ctx.attr.bucket,
            ctx.label.package,
        ] + [
            target.short_path
            for group in ctx.attr.files
            for target in group[DefaultInfo].files.to_list()
        ]),
    )

    runfiles = ctx.runfiles(files = ctx.files.files + ctx.files.script + ctx.files.gsutil)
    return [DefaultInfo(executable = output_file, runfiles = runfiles)]

cloud_storage_sync = rule(
    _cloud_storage_sync_impl,
    attrs = dict(
        bucket = attr.string(),
        files = attr.label_list(
            allow_files = True,
        ),
        script = attr.label(
            default = Label("//bazel:cloud_storage_sync.sh"),
            allow_files = True,
        ),
        gsutil = attr.label(
            default = Label("@gcloud//:gsutil"),
            allow_files = True,
        ),
    ),
    executable = True,
)
```
bazel/cloud_storage_sync.sh
```bash
#!/usr/bin/env bash
set -euo pipefail

prepare_directory() {
  local dir="$1"; shift
  local prefix="$1"; shift

  for path in "$@"; do
    stripped_path="$(sed -e "s@^${prefix}/@@" <<<"${path}")"
    subdir="$(dirname "${stripped_path}")"
    mkdir -p "${dir}/${subdir}"
    cp --dereference "${path}" "${dir}/${stripped_path}"
  done
}

# Only cache for 5 minutes on Google's edge servers. Cloudflare takes care of
# most of the caching, but we still need to be able to cache the files for a
# little while, otherwise Cloudflare is forced to hit Google on every request.
fix_cache_control() {
  local gsutil="$1"; shift
  local bucket="$1"; shift

  "${gsutil}" -m setmeta \
    -h "Cache-Control: public, max-age=300" \
    "gs://${bucket}/**"
}

upload() {
  local gsutil="$1"; shift
  local bucket="$1"; shift
  local prefix="$1"; shift

  dir="$(mktemp -d)"
  prepare_directory "${dir}" "${prefix}" "$@"
  "${gsutil}" -m rsync -d -r "${dir}" "gs://${bucket}/"
  fix_cache_control "${gsutil}" "${bucket}"
  rm -rf "${dir}"
}

upload "$@"
```
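The one subtle step in this script is the path rewriting in `prepare_directory`: Bazel passes runfile paths prefixed with the package name (the `prefix` argument), and that prefix must be stripped so files land at the root of the bucket. The `sed` trick can be exercised in isolation; a small sketch (the helper name and example paths are made up for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strip a leading "<prefix>/" from a path, exactly as prepare_directory does.
strip_prefix() {
  local prefix="$1" path="$2"
  sed -e "s@^${prefix}/@@" <<<"${path}"
}

strip_prefix "frontend" "frontend/assets/app.js"  # prints "assets/app.js"
strip_prefix "frontend" "lib/frontend/x.txt"      # prefix not leading: path unchanged
```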
Finally, this can be used like this.
frontend/BUILD.bazel
```python
load("//bazel:gcloud.bzl", "cloud_storage_sync")

# ...

webpack(
    name = "frontend",
    outs = [
        "index.html",
        # ...
    ],
    # ...
)

cloud_storage_sync(
    name = "deploy",
    bucket = "yourwebsite.tld",
    files = [":frontend"],
)
```
Deployment is done by running `bazel run //frontend:deploy`.
Cloudflare
DNS zone
In order for Cloudflare to be able to serve anything, traffic must actually go to their servers. This is why we need to make it manage the DNS zone, in this case `yourwebsite.tld`.

Cloudflare allows you to create "normal" DNS records, such as `www` of type `A` with value `1.2.3.4`, so that when someone queries DNS servers for the address of `www.yourwebsite.tld`, they get `1.2.3.4`. It also allows you to proxy traffic through its servers. In that case, a client gets the IP address of one of Cloudflare’s edge servers near their location, and packets are proxied to `1.2.3.4`.
In addition to proxying packets, Cloudflare also takes care of TLS, i.e. encryption of the traffic between the client and their servers. This is what we need since GCS won’t do it for us and we want the static website to be accessible with HTTPS.
With Terraform, we can create the DNS zone as follows.
backend.tf
```hcl
terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "2.23.0"
    }
  }
}

provider "cloudflare" {
  api_token = something # I use data.google_kms_secret for this.
}
```
dns.tf
```hcl
# The DNS zone itself. DNS records are added to the zone with cloudflare_record.
resource "cloudflare_zone" "yourwebsite_tld" {
  zone = "yourwebsite.tld"
}

# yourwebsite.tld is proxied to GCS.
resource "cloudflare_record" "yourwebsite_tld_cname" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  name    = "@"
  value   = "c.storage.googleapis.com"
  type    = "CNAME"
  ttl     = 1
  proxied = true
}

# www.yourwebsite.tld is equivalent to yourwebsite.tld.
resource "cloudflare_record" "www_yourwebsite_tld_cname" {
  zone_id = cloudflare_zone.yourwebsite_tld.id
  name    = "www"
  type    = "CNAME"
  value   = "yourwebsite.tld"
  ttl     = 1
  proxied = true
}
```
Page rules
We are almost done. Visiting `https://yourwebsite.tld/` serves the static website, but there are two more things to take care of.

- Requests to `www.yourwebsite.tld` should be redirected to `yourwebsite.tld` so that the website is accessible at a unique URL. Some people prefer the other way around; I personally think that `www.` is redundant and, to be honest, kind of ugly.
- Requests to `http://` should be redirected to their `https://` equivalent.
Cloudflare has page rules that can be used to define what must be done with matching requests. They can be set in the `Rules` tab on Cloudflare; the Terraform configuration is given below.

For the first rule, we need to match `www.yourwebsite.tld/*` and use `Forwarding URL` with `301 - Permanent Redirect` to redirect to `https://yourwebsite.tld/$1`. Notice that this rule also takes care of HTTPS, so we don’t get two redirections when somebody navigates to `http://www.yourwebsite.tld`.
For the second, we match `yourwebsite.tld/*` and use the setting `Always use HTTPS`.
frontend.tf
```hcl
resource "cloudflare_page_rule" "no_www" {
  zone_id  = cloudflare_zone.yourwebsite_tld.id
  target   = "www.yourwebsite.tld/*"
  priority = 2

  actions {
    forwarding_url {
      url         = "https://yourwebsite.tld/$1"
      status_code = 301
    }
  }
}

resource "cloudflare_page_rule" "http_to_https" {
  zone_id  = cloudflare_zone.yourwebsite_tld.id
  target   = "yourwebsite.tld/*"
  priority = 1

  actions {
    always_use_https = true
  }
}
```
This time, we’re done. The website is accessible over HTTPS at `https://yourwebsite.tld`, is cached and served by Cloudflare edge servers near the users’ locations, and redirections are handled properly. Moreover, purging the cache on Cloudflare (`Quick Actions` > `Purge Cache`) actually works as expected.
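The same purge can also be triggered from the command line, which is handy at the end of a deploy script. A sketch against Cloudflare’s v4 API, assuming an API token with the cache-purge permission (`ZONE_ID` and `CF_API_TOKEN` are placeholders):

```shell
# Purge everything in the zone's Cloudflare cache.
curl -X POST "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything":true}'
```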