The BBC reports that one of Google’s data centers suffered data loss after a nearby power facility was struck by lightning four times in a row. Only about 0.000001% of total disk space was permanently affected, it is said.
A thing called “backup” immediately comes to mind. This was something I had to deal with in pretty much every company I worked for as a sysadmin. Back up your data or lose it, right?
Well, maybe. For most of those companies, some dedicated storage or a couple of tape drives could easily solve the problem. But Google is often special in one way or another.
A quick Google search (hehe, yup) for how much data Google stores brings up this article from last year, linking to this estimation approach. There are no officially published numbers, so an estimate is the best we can do: 10-15 exabytes. (10-15 exabytes, Carl!) And that’s as of last year.
Using this method, they determined that Google holds somewhere around 10-15 exabytes of data. If you are in the majority of the population that doesn’t know what an exabyte is, no worries. An exabyte equals 1 million terabytes, a figure that may be a bit easier to relate to.
Holy moly, that’s a lot of data! To back this up, you’ll need at least double the storage. And some really lightning-fast (pun intended) technology. Just to give you an idea, some of the fastest tape drives have a throughput of about 1 TB / hour and a native capacity of about 10 TB (have a look here, for example). At that rate, the backup process will take about … forever to complete.
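To see just how long “forever” is, here is a rough back-of-the-envelope sketch. It assumes the lower estimate of 10 exabytes, the 1 TB / hour and 10 TB per tape figures mentioned above, and drives that run non-stop with no tape swaps or verification. The function name and the 1,000-drive scenario are made up for illustration.

```python
import math

TB_PER_EB = 1_000_000        # 1 exabyte = 1 million terabytes
HOURS_PER_YEAR = 24 * 365


def tape_backup_estimate(exabytes, drives=1, tb_per_hour=1.0, tb_per_tape=10.0):
    """Return (years_needed, tapes_needed) for one full backup.

    Assumes every drive streams at full speed around the clock,
    which is optimistic for real tape hardware.
    """
    total_tb = exabytes * TB_PER_EB
    hours = total_tb / (tb_per_hour * drives)
    tapes = math.ceil(total_tb / tb_per_tape)
    return hours / HOURS_PER_YEAR, tapes


# A single drive versus a (hypothetical) farm of 1,000 drives.
years_one, tapes_one = tape_backup_estimate(10)
years_many, _ = tape_backup_estimate(10, drives=1000)

print(f"1 drive:      {years_one:,.0f} years, {tapes_one:,} tapes")
print(f"1,000 drives: {years_many:.1f} years")
```

With these assumptions, a single drive needs over a thousand years and about a million tapes, and even a thousand drives working in parallel need more than a year for one full pass.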
So if tapes are out, then we are backing up onto other storage. Having that storage in the same data center sort of defeats the purpose (see above regarding “lightning”). Having it in another data center (or centers) means you’ll need some super fast networks.
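To put a number on “super fast”, here is a rough sketch of how long it would take to push 10 exabytes over a network link. The link speeds are hypothetical examples, not anything Google has published, and the math assumes decimal units and a link that stays fully saturated the whole time.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(exabytes, link_gbps):
    """Days needed to move `exabytes` of data over a `link_gbps` link.

    Assumes decimal units (1 EB = 10**18 bytes, 1 Gbps = 10**9 bits/s)
    and 100% sustained utilization with no protocol overhead.
    """
    total_bits = exabytes * 10**18 * 8
    seconds = total_bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


# Hypothetical link speeds, from a single 100 Gbps link up to 10 Tbps.
for gbps in (100, 1_000, 10_000):
    print(f"{gbps:>6} Gbps: {transfer_days(10, gbps):,.0f} days")
```

Under these assumptions a single 100 Gbps link needs roughly 25 years, and even 10 Tbps of dedicated bandwidth takes about three months for one full copy.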
You could probably do quite a bit of optimization with incremental and differential backups, but you’d still need quite a substantial infrastructure.
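For anyone who mixes the two up: an incremental backup copies what changed since the last backup of any kind, while a differential copies what changed since the last full backup. Here is a minimal sketch of the difference, selecting files by modification time. Real backup tools usually track changes more robustly (checksums, catalogs, block-level deduplication), and the function names and timestamp arguments here are made up for illustration.

```python
import os


def changed_since(root, since_ts):
    """Yield paths under `root` modified after the Unix timestamp `since_ts`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_ts:
                yield path


def files_to_back_up(root, mode, last_full_ts, last_backup_ts):
    """Pick the files a backup run of the given `mode` would copy.

    full          everything
    differential  changes since the last FULL backup
    incremental   changes since the last backup of ANY kind
    """
    if mode == "full":
        return changed_since(root, float("-inf"))
    if mode == "differential":
        return changed_since(root, last_full_ts)
    if mode == "incremental":
        return changed_since(root, last_backup_ts)
    raise ValueError(f"unknown backup mode: {mode!r}")
```

Incrementals are the smallest to take but need the whole chain to restore; differentials grow until the next full backup but restore from just two sets. Either way, the very first full copy still has to be made.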
It’s simpler, I guess, to just spread your data across many data centers, with several copies all over the place, and hope for the best.
But that’s for Google. For the rest of us, backup is still an option. (Read some of these horror stories if you are not convinced yet.)
And since we are on the subject of backups, let me ask you this: how are you doing backups? Are you still with tapes, or local NAS, or, maybe, something cloud-based? Which software do you use? What’s your strategy?
For me, dealing with mostly small setups, Amazon S3 with HashBackup is sufficient. I don’t even need to rotate the backups anymore. I just do a full backup daily.