In my previous blog posts (found here and here) I gave a brief introduction to containers, as well as some tips on how to make containers behave like physical computers on your LAN. I also briefly mentioned that we host our own git server. Why would we do that, when there are plenty of cloud services offering hassle-free access to git?
In order to understand why we made this decision, you need to rewind a few months. After deciding to found our own studio, one of the first questions we had to answer was: do we spend time and effort setting up our own IT environment, or should we try to leverage as much as we can from already-running services in the cloud, which are updated regularly with the latest security patches and maintained by someone else?
If I'd looked at my options a few years ago, before containerization was a thing, the cloud services would've looked more appealing. You essentially had two options: the 'one server to rule all services' approach (a.k.a. SPOF, a single point of failure – we're indie devs, so multiple servers come at a significant cost to us) – or host things in the cloud. If those two were my only choices, then hosting our services in the cloud would probably have been the option I'd have chosen.
But now, with containerization, everything changes. We can have a single physical machine host a multitude of services (each isolated and independent from one another), the setup of each service is a breeze (or rather, as complicated as you want it to be), the startup speed is counted in seconds and not hours and setup files are easy to maintain and place in version control. The decision is not as clear-cut anymore. Let’s compare the price of a minimal set of services hosted in the cloud compared to hosting your own containers:
- git hosting – free up to five users, then about $20-$25 per month for the first ten users (in the long run about $5-$10 per user).
- Jenkins hosting – CloudBees doesn’t even name a price.
- Phabricator hosting – $20 per user/month.
So, depending on the number of devs in our team we are looking at about $20-$80 per month ($240-$960 per year), give or take. It's not a huge sum of money, mind you, but still something you notice each month when you've got no income. If you would also like to integrate these services with others, prices increase, because that will probably require some kind of 'Enterprise' level subscription. Now, one could also choose to take the middle road here and host these services using Amazon EC2/Microsoft Azure/Google Compute Engine, but each virtual server costs about $15-$25 per month, so there's not much money to be saved by doing so.
On the other hand, self-hosting these services doesn't cost you anything more than some initial hardware (which you can adapt to your budget) and an increase in your electricity bill – but you are, in return, responsible for everything, which now includes server failures and data loss. Oh no 🙁
So how do we deal with this problem then? Well, by using cloud services of course! No, I'm serious. We do solve it by using cloud services, just not service-specific ones. Instead, we utilize Google Cloud Storage. Google Cloud Storage is very aggressively priced compared to hosted services – you pay $0.01 per GB/month. Compared to our earlier estimated monthly cost of $20-$80, that means the break-even point comes when we store around 2TB-8TB of data! In comparison with Bitbucket, for example, $2 gives you a new user with 5GB of file storage. Using Google Cloud Storage, $2 instead gives you 4000% more space (200GB) – and unlimited users to boot!
So cloud storage is cheap, but how do we make sure that everything we do and all service settings are properly backed up to the cloud? First of all, we have a NAS which is configured to use RAID-10. On that NAS, we have a folder called "backup". In this folder, we put everything we want to back up. Art assets, Jenkins configurations, Phabricator configurations, everything. We also download backups from our webpage and put them here. But more importantly, this is where our git repositories reside. Maybe even more important is the fact that backing up a git repository is the same thing as copying the folder. There are some caveats to this of course – no one can be writing to the git repository during the copy, for instance – but if we manage to copy the folder without errors, the repository is backed up.
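One way to sidestep the folder-copy caveat is to keep a mirror clone in the backup folder instead of copying the live repository directly. A minimal sketch – the paths and the helper function are hypothetical, not our actual setup:

```shell
#!/bin/sh
# Sketch only -- paths and helper name are illustrative examples.
# "git clone --mirror" copies every ref into a bare repository, and the
# mirror can be refreshed incrementally on later runs, so the backup
# folder always contains a complete, consistent repository.
backup_repo() {
    src="$1"    # live repository (path or URL)
    dst="$2"    # mirror inside the backup folder
    if [ -d "$dst" ]; then
        git -C "$dst" remote update --prune    # incremental refresh
    else
        git clone --quiet --mirror "$src" "$dst"
    fi
}

# e.g. backup_repo /srv/git/ourgame.git /nasdevice/backup/git/ourgame.git
```

Each nightly run then only fetches what changed since the previous night.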
In order to make the actual backup to Google Cloud Storage we utilize gsutil, a tool developed by Google for communicating with your cloud storage. It's a nice tool with lots of subcommands, and gsutil rsync is the hero we're looking for. rsync (in the *nix world) and robocopy (on Windows) are programs written to make an exact copy of a folder somewhere else, and both support incremental updates. gsutil rsync does the same thing, but with support for cloud storage solutions like Google Cloud Storage and Amazon S3. The actual command we use to sync our NAS with Google Cloud Storage looks like this:
gsutil -m -o GSUtil:parallel_composite_upload_threshold=100M rsync -P -d -r
/nasdevice/backup gs://my-cool-bucket-storage/backup 2>&1 | tee -a /var/log/cron.log
The options we use are the following:
- -m Process files with multiple threads instead of a single one.
- -o GSUtil:parallel_composite_upload_threshold=100M Upload files larger than 100MB as parallel composite uploads, i.e. split into chunks that are uploaded concurrently.
- rsync Well, we want to use rsync, so this shouldn’t come as a surprise.
- -P Preserve POSIX file attributes (such as who owns each file, permissions and timestamps).
- -d Delete files in the bucket if they aren’t available on our NAS.
- -r Recurse through subdirectories.
- /nasdevice/backup Where to copy from.
- gs://my-cool-bucket-storage/backup Where to copy to.
- 2>&1 Redirect standard error to standard out, so we get everything into the same logfile.
- | tee -a /var/log/cron.log Pipe standard out to tee, which is a program that both prints output in the console and saves it in /var/log/cron.log at the same time.
As hinted by /var/log/cron.log, we run this command as a cron job, which simply means that it is executed every night at a specific time. One thing that might catch your attention is that we let gsutil rsync delete files. This might seem like a double-edged sword, because if you need to restore data from your backup after a hard drive failure, you probably want it to look exactly like it did the previous day. On the other hand, you might want to access your backup because you effed something up, and then you maybe don't want the latest data, but rather the data from a few days or weeks ago.
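The nightly schedule might look like this in a crontab (the 03:00 time is an assumption; the command is the one above):

```shell
# Example crontab entry: run the sync every night at 03:00 and append
# the output to the cron log.
0 3 * * * gsutil -m -o GSUtil:parallel_composite_upload_threshold=100M rsync -P -d -r /nasdevice/backup gs://my-cool-bucket-storage/backup 2>&1 | tee -a /var/log/cron.log
```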
As it turns out, there's a way to both eat the cookie and save it for later, because you can enable versioning on your bucket. Simply run gsutil versioning set on gs://my-cool-bucket-storage and files will never truly be deleted; instead, each write or delete of a specific file is registered as a separate version, retrievable at any time. You'll pay for the extra storage this consumes of course, but that is probably a small price to pay if it saves you hours or days of work. If the costs for deleted objects become an issue, you can always change the object lifecycle settings. In case you're wondering, we use a container to run this service.
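For reference, the versioning and lifecycle commands look like this (bucket name from above; the object name and generation number are made-up examples):

```shell
# Enable versioning so deletes and overwrites keep the old generation:
gsutil versioning set on gs://my-cool-bucket-storage

# List every generation of an object, including deleted ones:
gsutil ls -a gs://my-cool-bucket-storage/backup/somefile

# Restore a particular generation by copying it over the live object:
gsutil cp gs://my-cool-bucket-storage/backup/somefile#1360887759327000 \
    gs://my-cool-bucket-storage/backup/somefile

# If storage costs grow, a lifecycle rule can cap the number of
# noncurrent versions kept per object:
cat > lifecycle.json <<'EOF'
{"rule": [{"action": {"type": "Delete"},
           "condition": {"numNewerVersions": 5}}]}
EOF
gsutil lifecycle set lifecycle.json gs://my-cool-bucket-storage
```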
And there you have it, our backup solution for our hosted services. So let’s look at what we do if things go wrong:
- A disk fails in our NAS: We run RAID 10, so we simply replace it.
- Multiple disks fail in our NAS/NAS fails: Replace the disks or NAS, sync latest state from Google Cloud Storage. Data which isn’t placed in the “backup” folder might be lost.
- Internal server fails: Replace broken parts or the entire server, re-install CoreOS with cloud-config.yaml settings, restore and reboot containers (since we’re using Dockerfiles, we can also host containers in the cloud should we choose to).
- gsutil fails to upload data: This is a known flaw in our setup, because we don't trigger an alarm if this tool fails. This has happened, but we've fixed the errors that made it fail. Should it happen again we'll probably put some effort into making it better, but for now, I look at the log file every now and then to see if it has copied files successfully.
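The improvement we have in mind could be sketched like this – a wrapper that logs a clearly greppable failure line when the sync exits non-zero (the function, paths and log format are hypothetical, not something we run today):

```shell
#!/bin/sh
# Sketch of an alarm wrapper around the nightly sync. On failure it
# appends a "BACKUP FAILED" line to the log and returns non-zero, so a
# monitor (or a human with grep) can spot it.
run_backup() {
    log="${1:-/var/log/cron.log}"
    if ! gsutil -m -o GSUtil:parallel_composite_upload_threshold=100M \
            rsync -P -d -r /nasdevice/backup \
            gs://my-cool-bucket-storage/backup >> "$log" 2>&1; then
        echo "BACKUP FAILED at $(date)" >> "$log"
        # Hook in real alerting here (mail, webhook, ...) -- this
        # sketch only logs.
        return 1
    fi
}
```

cron would then call run_backup instead of the raw command, and `grep "BACKUP FAILED" /var/log/cron.log` becomes the health check.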
And for feature or security updates:
- Container services: Rebuild images from Dockerfiles, mount old data from NAS if compatible, otherwise migrate to new format.
- CoreOS/NAS/Workstations: Receive regular updates from CoreOS/the NAS vendor/Microsoft.
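For the container services, the update cycle above boils down to something like this – the image name and mount paths are illustrative, not our actual configuration:

```shell
# Rebuild the image from its Dockerfile (picking up base-image and
# package updates):
docker build -t phabricator ./phabricator

# Replace the running container, re-mounting the old data from the NAS:
docker rm -f phabricator
docker run -d --name phabricator \
    -v /nasdevice/backup/phabricator:/var/data \
    phabricator
```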
We feel quite confident with our setup, so hosting internal services (which support unlimited users and which we can integrate with each other any way we'd like) is not a big deal. We still use G Suite for email and document storage, and a traditional web hosting service for this webpage, though, as those services should be available to external users.