The UniSuper/Google Lesson: Cloud is Not a Backup!
It has been reported that Google Cloud deleted the entire account of UniSuper, a $135 billion Australian pension fund, due to a technical error on Google’s part. UniSuper stated that it had lost everything it stored with Google, including its backups, which caused roughly two weeks of downtime for its 647,000 members. Recovery was possible only because its multi-cloud strategy included backing data up off Google Cloud to another service [1, 2].
Key Lessons
3. Data can Vanish from a Cloud Provider
Previously on this blog, I have written about the critical role of backups in dealing with ransomware risk, specifically the implementation of a 3-2-1 backup strategy. When Google vaporized UniSuper’s data, the only reason recovery was possible was their off-Google-Cloud backup! Let’s dig deeper.
If this near catastrophe can happen to a BILLION (with a B) dollar fund, one whose calls are answered by Google Cloud’s leadership rather than ordinary customer service, it can happen to any of us, and for more mundane reasons. And do not forget that not even this level of customer service was able to restore the vanished data!
For example, your cloud data could vanish for many reasons, including:
- Your cloud provider could make the same error that Google did
- You could accidentally delete your resources when logged into your account
- An unauthorized person could gain access to your root account and delete the data
- A ransomware attack could encrypt or destroy your data
- You could neglect to pay your bill long enough that your data is deleted
2. Cloud is Redundancy (like RAID), Not a Reliable Backup
The reason the data is deleted is not the important part. Just as the old adage holds that RAID (Redundant Array of Independent Disks) is not a backup, keeping all of your files and backups with a single cloud provider under one account is not a robust backup.
As an aside, on this blog I also previously wrote about my own - much more mundane - complete data loss on my MacBook Pro, where the SSD failed totally and without warning, and not a single byte that was not already backed up was recoverable.
1. Back Up to Other Locations via a 3-2-1 Strategy
The 3-2-1 backup strategy is not new, but it is often forgotten by firms that rely on cloud services. Under the shared responsibility model, we like to think that our providers are taking care of data redundancy and backup, but that is not assured.
3-2-1 is a timeless foundational guideline that can inform how to handle backups effectively. The strategy as formulated by Krogh [3] is:
- You should keep 3 copies of any important file (a primary and two backups).
- You should have the files on 2 different media types (such as a hard drive and optical media), to protect against different types of hazards.
- 1 copy should be stored offsite (or at least offline).
Each of these steps is important for protecting the availability of your data. The primary should be on fast and reliable media, such as the active storage provided by your cloud provider. The first backup protects against data loss or deletion on the primary storage. The second backup protects against loss of your primary backup (as happened to UniSuper when Google deleted the data and its backups!).
What might this look like for a cloud provider?
Suppose you have data stored in a Google Cloud Storage bucket, Google’s object storage service for files. You could set up a nightly process on a Linux-based server that uses the command line tools provided by Google and AWS to sync the files (a sketch follows the list) so that:
- The 1st copy is the primary bucket itself, along with the redundancy the provider builds in
- The 2nd copy could be a clone of the bucket to a different region within Google Cloud
- The 3rd, off-site copy could be a clone of the bucket’s files to an AWS S3 bucket on the Amazon cloud
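Here is a minimal sketch of what that nightly job might look like, assuming gsutil and the AWS CLI are already installed and authenticated on the server; every bucket name and path below is a hypothetical placeholder.

```bash
#!/usr/bin/env bash
# Nightly 3-2-1 sync sketch for a Google Cloud Storage bucket.
# Assumes gsutil and the AWS CLI are installed and already authenticated;
# all bucket names and paths are hypothetical placeholders.
set -euo pipefail

PRIMARY="gs://example-primary-bucket"       # 1st copy: the primary bucket
REPLICA="gs://example-replica-otherregion"  # 2nd copy: a bucket created in a different region
OFFSITE="s3://example-offsite-backup"       # 3rd copy: a bucket on another cloud
STAGING="/var/backups/bucket-mirror"        # local staging area for the cross-cloud copy

# 2nd copy: mirror the primary bucket into the bucket in another region.
gsutil -m rsync -r "$PRIMARY" "$REPLICA"

# 3rd copy: stage the files locally, then push them to AWS S3.
mkdir -p "$STAGING"
gsutil -m rsync -r "$PRIMARY" "$STAGING"
aws s3 sync "$STAGING" "$OFFSITE"
```

A job like this could be triggered by cron, and enabling object versioning on the destination buckets adds further protection against a bad sync overwriting good data.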
Or suppose you have a SQL database hosted on the Amazon Web Services (AWS) cloud. In that case (a sketch follows the list):
- The 1st copy is the AWS-provided automated backup of the RDS database
- The 2nd copy can be a SQL backup (a SQL dump) produced by a nightly process you set up and stored in an AWS S3 bucket
- The 3rd, off-site copy could be a clone of the AWS S3 hosted files to a Google Cloud Storage bucket
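A minimal sketch of such a nightly dump follows, assuming the RDS instance runs PostgreSQL and that credentials are handled out of band (for example via a .pgpass file); the endpoint, user, and bucket names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Nightly SQL dump sketch for an RDS database, assuming PostgreSQL.
# Endpoint, user, and bucket names are hypothetical placeholders;
# credentials are expected to come from a .pgpass file or similar.
# (The 1st copy, the automated RDS backup, is configured in AWS itself.)
set -euo pipefail

DB_HOST="example-db.abc123.us-east-1.rds.amazonaws.com"  # hypothetical RDS endpoint
DB_NAME="appdb"
STAMP="$(date +%Y-%m-%d)"
DUMP="/var/backups/${DB_NAME}-${STAMP}.sql.gz"

# 2nd copy: a compressed SQL dump stored in an AWS S3 bucket.
pg_dump --host "$DB_HOST" --username backup_user "$DB_NAME" | gzip > "$DUMP"
aws s3 cp "$DUMP" "s3://example-sql-backups/${DB_NAME}/"

# 3rd, off-site copy: replicate the dump to a Google Cloud Storage bucket.
gsutil cp "$DUMP" "gs://example-offsite-sql-backups/${DB_NAME}/"
```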
Non-cloud Off-Site Backup with a USB Hard Drive and a Safe
The complexity you need really depends on the size of the company and of your data. For the vast majority of small and medium size businesses, the data is not large, it is just mission critical. In addition to cloud-to-cloud backup strategies, it is reasonable to consider your own off-cloud backup as part of your disaster recovery planning.
A basic approach is to buy two large USB external hard drives; commercially available drives that hold terabytes are affordable and easy to find. Make sure these drives are formatted with software-based encryption so that a complex passphrase is required to read them.
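On a Linux machine, one way to do this is with LUKS via cryptsetup, sketched below; macOS (encrypted APFS in Disk Utility) and Windows (BitLocker To Go) offer comparable options. The device path is a hypothetical placeholder, so verify it carefully before formatting.

```bash
# One-time setup of an encrypted external drive on Linux using LUKS.
# /dev/sdX1 is a hypothetical placeholder; double-check the device name first,
# because luksFormat destroys whatever is currently on that partition.
sudo cryptsetup luksFormat /dev/sdX1            # prompts for a strong passphrase
sudo cryptsetup open /dev/sdX1 offsite_backup   # unlock the encrypted volume
sudo mkfs.ext4 /dev/mapper/offsite_backup       # create a filesystem on it
sudo mkdir -p /mnt/offsite_backup
sudo mount /dev/mapper/offsite_backup /mnt/offsite_backup
sudo chown "$USER" /mnt/offsite_backup          # let your user write to the mount
```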
Have a periodic manual process where you use the cloud provider command line tools (aws s3 sync for Amazon, gsutil cp for Google Cloud, etc.) to sync your bucket data to your local hard drive, as sketched below.
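A hedged sketch of that manual sync, assuming the encrypted drive from the previous step is unlocked and mounted at /mnt/offsite_backup, and reusing the hypothetical bucket names from earlier; gsutil rsync is used rather than gsutil cp so repeat runs only transfer what has changed.

```bash
# Manual sync of cloud bucket data to the encrypted external drive,
# assumed here to be unlocked and mounted at /mnt/offsite_backup.
# Bucket names are the same hypothetical placeholders used above.
mkdir -p /mnt/offsite_backup/aws-sql-backups /mnt/offsite_backup/gcs-primary
aws s3 sync "s3://example-sql-backups" /mnt/offsite_backup/aws-sql-backups
gsutil -m rsync -r "gs://example-primary-bucket" /mnt/offsite_backup/gcs-primary

# When the copy finishes, unmount and lock the drive before taking it off site.
sudo umount /mnt/offsite_backup
sudo cryptsetup close offsite_backup
```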
Then you can keep this protected drive in a locked safe in your office, or obtain a very affordable, possibly free, safe deposit box at your bank to store the offline backup. With two drives, you can make a new backup and then physically swap it with the unit stored off site in the bank vault. Do remember that humidity can damage drives, so storing them in an air-conditioned space is ideal.
Do not forget that periodic should mean genuinely regular. These are your last-ditch backups. Model what the loss of a week or a month of activity would do to your business; that model informs how often the backups should be taken. If your data rarely changes, the time between backups can be longer. Or if you know a critical process runs on the 5th of each month, you may decide to back up right after it. This should all be part of your comprehensive disaster recovery planning.
In summary
All of us who have business-critical systems in the cloud can learn from the UniSuper/Google incident. It is sobering to think about what would have happened had they been unable to restore the trading data of such a massive retirement fund.
While most of us do not face a $135 billion loss, we should all be diligent in our disaster recovery planning and ensure that, wherever possible, our backups do not depend on a single cloud provider or a single master credential.
And finally, please test your backups periodically and document the process. The worst time to discover that your backup is unavailable or corrupted is when disaster strikes and it is your last hope of recovery. Data does not do you any good if you cannot get it where you need it, and if you are down to your third-level backup, having a written guide will reduce stress and improve the odds of a successful restoration.
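As one illustration, a restore test for the SQL dump scenario above might look like the following sketch, which assumes Docker is available and reuses the hypothetical bucket and database names from earlier.

```bash
#!/usr/bin/env bash
# Periodic restore-test sketch: pull the most recent SQL dump from S3 and load it
# into a throwaway PostgreSQL container. Requires Docker; bucket and database
# names are the same hypothetical placeholders used earlier.
set -euo pipefail

# Find and download the latest dump for the database.
LATEST="$(aws s3 ls s3://example-sql-backups/appdb/ | sort | tail -n 1 | awk '{print $4}')"
aws s3 cp "s3://example-sql-backups/appdb/${LATEST}" /tmp/restore-test.sql.gz

# Start a disposable database and restore the dump into it.
docker run -d --name restore-test -e POSTGRES_PASSWORD=test -p 5433:5432 postgres:16
sleep 15   # crude wait for the database to start accepting connections
gunzip -c /tmp/restore-test.sql.gz | docker exec -i restore-test psql -U postgres -d postgres

# Spot-check that the expected tables exist, then clean up.
docker exec restore-test psql -U postgres -d postgres -c '\dt'
docker rm -f restore-test
```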
References
- [1] UniSuper and Google Cloud, “A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian,” unisuper.com.au. [Retrieved 2024-06-03].
- [2] “Google Cloud explains how it accidentally deleted a customer account,” arstechnica.com. [Retrieved 2024-06-03].
- [3] Krogh, Peter, The DAM Book, 2nd Edition, p. 207. [Citation retrieved from Google Books copy; 2024-06-03].