After many years of my main site and my blog (you’re here right now) being hosted on infra I don’t own, I have migrated everything to infra I do own.

My main site was hosted on GitHub Pages, and my dev blog on a Google Cloud micro instance (the free tier one). I didn’t intentionally scatter them about; it kind of just happened over the years. I had a little extra time on my hands and was tired of the messy configuration, so I decided to unify them.

This was done for a couple of reasons:

Convenience

Everything is in one place now. Previously it was spread across a couple of places.

  • My main site repo on GitHub
  • My main site hosted with GitHub Pages
  • My blog site repo on GitLab
  • My blog site hosted with a free Google Micro instance with a custom deployer

Already this is a little annoying to manage and remember where everything is.

Stability

Google loves to tout its free f1-micro instance. On the page describing their free cloud offerings, they say you can:

Solve real business challenges on Google Cloud

On my micro instance I ran stock CentOS 8 with 30 GiB of disk space and just a few custom programs:

  • Nginx for hosting a few static sites
  • My own custom simple site updater that received a GitLab webhook and updated 1 (one) site
  • A cron job for renewing all of my TLS certificates
  • yum-cron for installing security updates automatically
  • ufw for blocking ports
  • fail2ban for blocking bad actors
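As a sketch, the certificate-renewal cron was the usual sort of entry. This assumes certbot and nginx reloading; the exact command on my box may have differed:

```shell
# /etc/crontab sketch (hypothetical): try renewal twice a day,
# reload nginx only when a certificate actually changes
0 3,15 * * * root certbot renew --quiet --deploy-hook "systemctl reload nginx"
```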

At best, all of these sites together probably got 1000 pageviews in a month, and only if I wrote a particularly spicy post and posted it to reddit. In short: a low-traffic set of sites with a pretty typical setup for hosting a simple site.

Now let’s answer the question: does the free tier “Solve real business challenges” for my mom-and-pop site?

TL;DR: No

Starting a month or two into using the Google micro instance, the instance would freeze, causing all traffic to the site to hang. Even SSH access stopped working, making maintenance from the CLI impossible. It wasn’t the end of the world: I’d reboot the server from the cloud console and it would work fine again.

Over time, though, this became more common, happening on a bi-weekly basis. I had all of the recommended settings on, even Automatic restart, which I figured would handle something out of my control like this.

The freezing kept getting worse, though. My poor micro instance would freeze every few days, sometimes even daily. I set up a cron to reboot the instance every few hours, hoping the good old turn-it-off-and-on-again would fix it. Still nothing. If anything, the periodic reboot seemed to make the freezing worse.

I double-checked that I hadn’t been compromised, and everything looked fine. So my conclusion: the Google free micro instance just freezes up for no good reason over time. It was up barely a year before it became unusable.

Finally: My data is no longer in the hands of Google or Microsoft

The final, indirect quality-of-life fix: my sites are no longer in the hands of Google and Microsoft.

When Microsoft bought GitHub, I pledged to move all of my new repos to GitLab. Despite all of the positive PR Microsoft has bought, it is still a company that sells data for a living, and I did not want to contribute further to their massive data collection. My main site was still hosted on GitHub Pages, though.

Likewise, if Microsoft is bad for privacy, Google is much worse. While I’m sure their cloud instances have a special privacy policy for user data, it’s very hard for me to trust them. After reading through much of The Age of Surveillance Capitalism I pledged to move away from my reliance on Google and their services.

Finally, after moving everything to self-hosting, I no longer depend on either of them to keep my sites up and running.

So how does it work now?

Everything is now nicely self-hosted at my house. I will spare you the hardware setup and most of the software setup since, like on the cloud instance, it is quite boring. The more exciting part is the blog updater, which I reworked a few times until it was just right. If publishing a new blog post required manually deploying the changes each time, I’d consider that a failure.

I wanted to set it up so that:

  • Updating the site would be as easy as a git push
  • It would scale to as many sites as I’d like to make
  • It could handle pure static sites as well as ones that need a little processing

With that in mind, I decided to update my old site updater to meet the new specs.

How it used to work

My old site updater (used for keeping my blog up to date) worked like so:

  • Pass in some CLI args saying where the single hugo site is, and update it when we get a GitLab webhook

That’s really it. The principle is good, but scaling this to a large number of sites is difficult. Should I update the CLI to take multiple sites at once? Should I add one CLI option for simple sites and another for hugo sites? Should I have to restart the service every time I add a new site, just to pass in a new set of CLI args?

These were all questions I didn’t properly ask myself when I started working on an update to v1 of my site updater. Instead, I made it so:

  • An instance of the site updater could only update one site (same as before)
  • A new CLI option for handling a simple static site that didn’t need any hugo processing

This worked, but wasn’t great. I thought a bit about what adding more sites would entail: every site would need a corresponding site updater running alongside it. That’s annoying to set up for each new site, and quite a waste of resources.

How it works now

After wasting my time once, I didn’t want to waste it a second time. I had goals in mind:

  • One instance of a site updater
  • The updater shouldn’t require restarts to add new sites
  • It should work seamlessly with sites that just need an update, and sites that need hugo (or any other postprocessing steps)

With that in mind, I decided to meet my requirements by encoding which site to update in the path the webhook hits, and putting everything else in a config file stored alongside the site.

So when you configure your GitLab webhook to hit your updater, you fill out the path:

my-webhook.example.com/site-webhook/opt-my_site

Afterwards, when the webhook hits the site updater, it will check the directory /opt/my_site.

Inside that directory, it expects to find a config.toml whose instructions it will follow.

In the case your site is a simple site (just a git repo and nothing else), it will look for a config like so:

[simple_site]
site_path = "/opt/main_site/site_git_dir"

and pull the latest changes.
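For the simple case, the update amounts to a git pull in the configured directory. A minimal sketch, with function names of my own invention (the --ff-only flag is my choice, not necessarily the updater’s):

```python
import subprocess


def simple_site_command(config: dict) -> list[str]:
    # Build the git invocation from the [simple_site] table
    return ["git", "-C", config["simple_site"]["site_path"], "pull", "--ff-only"]


def update_simple_site(config: dict) -> None:
    # Pull the latest changes; raise if git fails
    subprocess.run(simple_site_command(config), check=True)
```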

In the case you have a hugo site, it will look for a config like so:

[hugo_site]
unprocessed_site_path = "/opt/dev_blog/raw"
processed_site_path = "/opt/dev_blog/processed"

In this case, it will do a git pull in unprocessed_site_path, run hugo there, and copy the resulting /public folder to processed_site_path. (It also does a few other things in accordance with hugo best practices, but those aren’t too important.)
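The hugo branch can be sketched as a three-step pipeline. Again, this is my own illustration under the assumptions above; the real updater also does its extra hugo housekeeping:

```python
import shutil
import subprocess
from pathlib import Path


def hugo_site_commands(config: dict) -> list[list[str]]:
    # The two external commands the [hugo_site] branch runs, in order
    raw = config["hugo_site"]["unprocessed_site_path"]
    return [
        ["git", "-C", raw, "pull", "--ff-only"],  # 1. pull the latest sources
        ["hugo"],                                 # 2. build; output lands in raw/public
    ]


def update_hugo_site(config: dict) -> None:
    raw = Path(config["hugo_site"]["unprocessed_site_path"])
    out = Path(config["hugo_site"]["processed_site_path"])
    for cmd in hugo_site_commands(config):
        subprocess.run(cmd, cwd=raw, check=True)
    # 3. replace the served copy with the freshly built /public folder
    shutil.rmtree(out, ignore_errors=True)
    shutil.copytree(raw / "public", out)
```

Building into a scratch directory and copying to the served path means the web server never sees a half-built site.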

With this setup, you can point your Nginx or Apache instance at the folders you want to serve, and they will magically be updated to the latest version without any work.
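For example, a hypothetical Nginx server block for the dev blog would just point at the processed directory from the config above (the server name is made up):

```nginx
server {
    listen 80;
    server_name blog.example.com;   # made-up name for illustration
    root /opt/dev_blog/processed;   # the site updater keeps this current
    index index.html;
}
```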

The benefits

This setup is particularly good in that:

  • I am not locked into a particular git provider (anything that sends a webhook can be supported with minimal effort)
  • I am not locked into a particular “Actions” provider (e.g. GitLab CI, GitHub Actions)
  • I am not locked into a particular hosting provider (Google Cloud)

This ends up being faster and more reliable for my use case than the managed setups, and it’s open source too!

The end

This is what is running in production right now, and if you’re interested in running it yourself, you can check it out here. Currently it’s only set up to work with GitLab, but it would be trivial to extend to any other provider that uses webhooks.

I hope someone else finds this as useful as I do!