Cold Storage

tech, storage, backup, archiving

Let’s say you are a filmmaker and, connected to your computer, you have a hard drive holding all the media files and edits for every film you ever made. How do you protect that data?

The easiest thing to do would be to copy the data to another hard drive, but hard drives are like cars: you have to take them out for a spin every once in a while or they go bad. And if that hard drive is in your house, a natural disaster such as an earthquake or fire will destroy both copies. You could make multiple copies and keep rotating them, but that is a fair amount of work.

Another option is to use a medium designed for long-term archiving. One of these is M-DISC, an archival-quality storage solution that claims to preserve “photos, videos, music, and documents for 1,000 years or more.” You can burn to M-DISC with many standard Blu-ray drives, but the discs hold little compared to a hard drive, so you would have to use special software to split the files across numerous discs. You would then need software to catalog which files are on which disc, since you can’t easily search them all at once.
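To make the splitting problem concrete, here is a minimal sketch (file names and sizes are hypothetical) of first-fit packing of files into disc-sized batches, plus a catalog recording which disc each file landed on:

```python
# A sketch, assuming 100 GB BDXL M-DISC media; capacity is in bytes.
DISC_CAPACITY = 100 * 10**9

def pack_into_discs(files, capacity=DISC_CAPACITY):
    """files: list of (path, size_bytes).
    Returns a list of discs, each a list of (path, size_bytes)."""
    discs = []  # each entry: [remaining_bytes, [files...]]
    # Place largest files first; put each into the first disc it fits on.
    for path, size in sorted(files, key=lambda f: -f[1]):
        for disc in discs:
            if disc[0] >= size:
                disc[0] -= size
                disc[1].append((path, size))
                break
        else:
            discs.append([capacity - size, [(path, size)]])
    return [d[1] for d in discs]

# Hypothetical usage: build a catalog mapping each file to a disc number.
files = [("film1/raw.mov", 80 * 10**9),
         ("film1/edit.prproj", 2 * 10**9),
         ("film2/raw.mov", 90 * 10**9)]
catalog = {path: i + 1
           for i, disc in enumerate(pack_into_discs(files))
           for path, _ in disc}
```

First-fit is not optimal packing, but it is simple and predictable, which matters more than squeezing out the last gigabyte when you are burning write-once media.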

Backing up to the cloud is the obvious solution here, but if you are talking about more than 10 TB of data, it can get quite expensive. Still, there are several cloud approaches worth considering.

One is to create your own cloud by setting up a network-attached storage (NAS) device at home or at work. The advantage of a NAS over a simple external hard drive is that it can be configured as a RAID, which duplicates data across drives and thus offers greater data security. It can also be placed at a second location, away from your house, giving you a kind of personal cloud. If you are technically savvy and have a second location with fast, reliable internet where you can install the NAS, this might be the most cost-effective solution.
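If you go this route, the usual workhorse for pushing archives to a remote NAS is rsync over SSH. A minimal sketch (hostnames and paths are hypothetical) that builds the command, defaulting to a dry run so you can preview changes before mirroring for real:

```python
# A sketch: mirror an archive folder to a NAS at a second location over SSH.
# -a preserves permissions/timestamps; --delete makes the NAS an exact mirror,
# so be sure the source side is the one you trust before enabling a real sync.
def sync_to_nas(source, nas_host, nas_path, dry_run=True):
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # preview what would change
    cmd += [source, f"{nas_host}:{nas_path}"]
    return cmd  # pass to subprocess.run(cmd, check=True) for a real sync

cmd = sync_to_nas("/Volumes/Archive/", "user@nas.example.com", "/volume1/archive/")
```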

Another option is a cloud backup service. My two favorite ones are Arq and Backblaze. It can be difficult to figure out all the various options and pricing for such services. Backblaze’s “Computer Backup” option is one of the simplest and most cost-effective: for $99 a year, it will back up your entire laptop, as well as one USB-attached drive. In an emergency, they will even mail you a USB drive with all your data, so you don’t necessarily need to download the entire backup over the web. This is perfect for a small independent filmmaker, and it is the solution we are currently using. (We use Arq, which can be purchased for a one-time fee, for smaller backup tasks where we save projects to Google Drive or Dropbox.)

But when you are dealing with terabytes of data, using Backblaze can be a bit cumbersome. You need everything on a single external drive, which can be limiting. Also, that drive needs to stay regularly connected to your computer and the internet, or its backup eventually expires. (Though you can pay an extra fee to keep the data longer in such situations.) Thus, to be a little more flexible, we were looking for a way to offload some of our data to a longer-term cloud backup solution.

This would just be for peace of mind. We would have a local hard drive backup that would be the first line of defense, but if that failed, we could always get our stuff out of the cloud. It would be for archived projects that are not currently being accessed or updated, so there is no need for Backblaze’s constant updating. And the hope is that we will never need to use it, so fast restore isn’t a big concern either. The biggest concerns are just reliability and cost.

After some research, we found what seems to be the perfect solution. (Though, as of the time of writing, we haven’t actually implemented this yet.) It is using the above-mentioned Arq backup app together with Amazon’s S3 Glacier Deep Archive storage class, which is designed for long-term storage that is rarely accessed. (Instructions for setting up Arq with Glacier can be found here.) It has the cheapest rates we could find, though Google Cloud Storage with its “Archive” storage class is roughly competitive. Both are about $1.20/TB per month for storage, plus some additional fees to upload your data. The catch is that it can cost hundreds of dollars per TB to retrieve that data if you ever need to restore from the cloud. It is also slow, as it can take a few hours before Arq has access to the backup to restore it. Despite this, it does seem like the best solution for those who want the peace of mind of knowing their data is safely stored in the cloud.
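To get a feel for the trade-off, here is a rough cost sketch using the figures from this post. The rates are assumptions that change over time (and the retrieval figure is deliberately ballpark), so check current pricing before relying on them:

```python
# Assumed rates, taken from the discussion above; not authoritative.
STORAGE_PER_TB_MONTH = 1.20   # Deep Archive / GCS Archive, approximate
RETRIEVAL_PER_TB = 200.0      # hypothetical "hundreds of dollars per TB"

def archive_cost(tb, years, restores=0):
    """Total cost of storing `tb` terabytes for `years` years,
    with `restores` full retrievals of the archive."""
    storage = tb * STORAGE_PER_TB_MONTH * 12 * years
    retrieval = restores * tb * RETRIEVAL_PER_TB
    return storage + retrieval
```

For example, 10 TB kept for five years with no restore costs $720 under these assumptions, while a single full restore of those 10 TB would dwarf the storage bill: the pricing model only makes sense if the archive is truly a last resort.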

(Thanks to Arq support and users on r/DataHoarder for helping with this research.)

UPDATE: Comment from my friend Lukhnos:

It may still be good to review and audit every 3–5 years what you have stored in those Big Cloud cold storages. Their offerings may change. AWS Glacier is the old offering and uses a totally inhumane API, whereas the confusingly-named S3 Glacier is just a storage class in S3. Google Cloud’s Archive Storage was introduced quite long (IIRC 2–3 years) after Coldline. There will be price changes down the road. When that happens there may be an incentive for one to migrate (though obviously the Big Clouds will have no incentives to help you with that).

I’m betting that Big Clouds have enough big customers that they don’t mess up the data stored with them, but I’m going to retrieve 1–2 GB of data from my long-term archives every few years if only to ensure that I still have the knowledge, the right version of tools, and the right and up-to-date incantations to summon the past bits to reconstitute in front of me, especially since it’s so easy to upload but often so non-trivial to retrieve.
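Lukhnos’s periodic retrieval drill can be made mechanical by keeping a checksum manifest from backup time and verifying the sample you pull back. A minimal sketch (paths and manifest format are hypothetical):

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large media files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(manifest, restored_dir):
    """manifest: {relative_path: expected_sha256}, recorded at backup time.
    Returns the list of files whose restored copy does not match."""
    return [rel for rel, expected in manifest.items()
            if sha256_of(Path(restored_dir) / rel) != expected]
```

Run this against the 1–2 GB sample you retrieve every few years; an empty mismatch list confirms both that the bits survived and that your restore tooling still works.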