Back up your data! Unless you are Google

BBC reports that one of Google's data centers experienced data loss after a nearby power facility was struck by lightning four times in a row.  Reportedly, only about 0.000001% of the total disk space was permanently affected.

A thing called “backup” immediately comes to mind.  It was something I had to deal with at pretty much every company I worked for as a sysadmin.  Back up your data or lose it, right?

Well, maybe.  For most of those companies, a dedicated storage system or a couple of tape drives could easily solve the problem.  But Google is often special in one way or another.

A quick Google search (hehe, yup) for how much data Google stores brings up this article from last year, which links to this estimation approach.  There are no officially published numbers, so an estimate is all we have: 10-15 exabytes.  (10-15 exabytes, Carl!)  And that was last year.

Using this method, they determined that Google holds somewhere around 10-15 exabytes of data. If you are in the majority of the population that doesn’t know what an exabyte is, no worries. An exabyte equals 1 million terabytes, a figure that may be a bit easier to relate to.

Holy moly, that’s a lot of data!  To back it up, you’d need at least double the storage, plus some really lightning-fast (pun intended) technology.  Just to give you an idea, some of the fastest tape drives have a throughput of about 1 TB/hour and a native capacity of about 10 TB (have a look here, for example).  The backup process would take about … forever to complete.
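To put “forever” into numbers, here is a back-of-the-envelope sketch.  It assumes decimal units (1 EB = 1,000,000 TB), the rough drive figures quoted above (1 TB/hour, 10 TB per cartridge), no compression, no time lost swapping cartridges, and drives running perfectly in parallel.  The function name is just for illustration.

```python
# Back-of-the-envelope estimate for one full tape backup of Google-scale data.
# Assumptions (not official figures): decimal units, 1 TB/hour per drive,
# 10 TB native capacity per cartridge, no compression, no swap overhead.

TB_PER_EB = 1_000_000
HOURS_PER_YEAR = 24 * 365


def tape_estimate(data_eb: float, tb_per_hour: float = 1.0,
                  tb_per_tape: float = 10.0, drives: int = 1) -> dict:
    """Return the cartridge count and wall-clock years for one full backup."""
    data_tb = data_eb * TB_PER_EB
    cartridges = data_tb / tb_per_tape
    # Assume all drives stream in parallel at full speed.
    hours = data_tb / (tb_per_hour * drives)
    return {"cartridges": cartridges, "years": hours / HOURS_PER_YEAR}


for eb in (10, 15):
    est = tape_estimate(eb)
    print(f"{eb} EB: {est['cartridges']:,.0f} cartridges, "
          f"{est['years']:,.0f} years with one drive")
```

With a single drive that comes out to about 1,142 years for 10 EB (and a million cartridges).  Even a thousand drives running in parallel (`drives=1000`) would need a bit over a year for one full pass.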

So if tapes are out, then we are backing up onto another storage system.  Putting that storage in the same data center sort of defeats the purpose (see above regarding “lightning”).  Putting it in another data center (or several) means you’ll need some very fast network links.

You could probably optimize quite a bit with incremental and differential backups, but you’d still need substantial infrastructure.
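The difference between the two is simply which reference point you compare against.  The sketch below assumes change detection by modification time only (real tools may also use checksums or block-level change tracking), and the file names and dates are hypothetical.

```python
from datetime import datetime


def files_to_copy(mtimes: dict[str, datetime], since: datetime) -> list[str]:
    """Return paths modified after `since`, sorted for stable output."""
    return sorted(path for path, mtime in mtimes.items() if mtime > since)


def incremental(mtimes: dict[str, datetime], last_backup: datetime) -> list[str]:
    # Incremental: only what changed since the most recent backup of any kind.
    return files_to_copy(mtimes, last_backup)


def differential(mtimes: dict[str, datetime], last_full: datetime) -> list[str]:
    # Differential: everything that changed since the last *full* backup.
    return files_to_copy(mtimes, last_full)


# Hypothetical inventory: a full backup ran on day 1, an incremental on day 2.
last_full = datetime(2024, 1, 1)
last_backup = datetime(2024, 1, 2)
mtimes = {
    "a.txt": datetime(2023, 12, 31),     # unchanged since the full backup
    "b.txt": datetime(2024, 1, 1, 12),   # changed before the last incremental
    "c.txt": datetime(2024, 1, 2, 12),   # changed after the last incremental
}
print(incremental(mtimes, last_backup))  # ['c.txt']
print(differential(mtimes, last_full))   # ['b.txt', 'c.txt']
```

The trade-off shows up at restore time: with incrementals you need the full backup plus every incremental since, while with differentials you need only the full backup plus the latest differential.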

It’s simpler, I guess, to just spread your data across many data centers, with several copies all over the place, and hope for the best.

But that’s for Google.  For the rest of us, backup is still an option.  (Read some of these horror stories if you are not convinced yet.)

And since we are on the subject of backups, let me ask you this: how are you doing backups?  Are you still on tapes, a local NAS, or maybe something cloud-based?  Which software do you use? What’s your strategy?

For me, dealing mostly with small setups, Amazon S3 with HashBackup is sufficient.  I don’t even need to rotate the backups anymore; I just run a full backup daily.

Rank of top languages on GitHub.com over time

The GitHub blog shares some trends regarding programming languages, covering both public and private repositories:

[Chart: GitHub programming languages]

Interesting.  I haven’t seen many Java and C# projects myself, but I’m in a very different bubble.  PHP has stayed at #4 for years.  VimL, the language in which most plugins for the Vim editor are written, made it to #10 in 2010, which suggests there are way more plugins than I’d ever thought.  The drop in Perl is also quite notable, but not very surprising.

Using Graphviz dot for ERDs, network diagrams and more

I’ve mentioned Graphviz many a time on this blog.  It’s simple to use, yet very powerful.  The dot language can be jotted down by hand in the simplest of text editors, or generated programmatically.

The official website features a gallery demonstrating a wide range of graphs.  Still, I wanted to blog a few examples from my recent work.
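As a taste of the “generated programmatically” part, here is a minimal sketch that turns a small schema description into dot for an ERD using record-shaped nodes.  The tables, columns, foreign key, and the `erd_to_dot` name are all hypothetical; this is not one of the examples from the full post.

```python
def erd_to_dot(tables: dict[str, list[str]],
               foreign_keys: list[tuple[str, str, str, str]]) -> str:
    """Build a Graphviz dot digraph for a simple ERD."""
    lines = ["digraph erd {", "  node [shape=record];"]
    for name, columns in tables.items():
        # In the default top-to-bottom layout, the braces stack the table
        # name above its columns; \l left-justifies each column line.
        cols = "".join(f"{col}\\l" for col in columns)
        lines.append(f'  {name} [label="{{{name}|{cols}}}"];')
    for src, src_col, dst, dst_col in foreign_keys:
        # One edge per foreign key, labelled with the join condition.
        lines.append(f'  {src} -> {dst} [label="{src_col} = {dst}.{dst_col}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema: orders.user_id references users.id.
tables = {"users": ["id", "email"], "orders": ["id", "user_id", "total"]}
foreign_keys = [("orders", "user_id", "users", "id")]
print(erd_to_dot(tables, foreign_keys))
```

Save the output to a file, say `erd.dot`, and render it with `dot -Tpng erd.dot -o erd.png`.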


Bootstrap 4 alpha release

Bootstrap 4 alpha has been released.  After a few more alphas and a couple of betas, we’ll have a new and much improved Twitter Bootstrap.  It seems like just yesterday I was looking forward to the release of Bootstrap 3, though.

Can you imagine that Bootstrap is only 4 years old?  It feels like I’ve been using it forever.  And the rest of the Internet seems to agree…

Custom Single Sign-On with Nginx and Auth Request Module

In a recent project I hit a wall.  For a couple of days, at least.  The requirement was to integrate a Request Tracker (aka RT) installation, running on a CentOS 7 server with Nginx, with the client company’s single sign-on solution.  Which wasn’t LDAP.  Or Active Directory.  Or anything standard at all: a completely homegrown system.
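For context, the auth request module lets Nginx ask a separate endpoint whether to let a request through: the protected location uses `auth_request` pointing at an internal location that proxies to that endpoint.  A 2xx response allows the request, while 401 or 403 denies it.  Below is a minimal sketch of such an endpoint.  It is not the solution from the full post: the cookie name `sso_session`, the hard-coded token set, and the port are hypothetical stand-ins for whatever the homegrown SSO actually provides.

```python
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical: tokens the homegrown SSO considers valid.  A real setup
# would ask the SSO system instead of checking a hard-coded set.
VALID_SESSIONS = {"example-token"}


def auth_status(cookie_header: Optional[str]) -> int:
    """Return 200 if the request carries a valid SSO cookie, else 401."""
    if not cookie_header:
        return 401
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sso_session")  # hypothetical cookie name
    if morsel is not None and morsel.value in VALID_SESSIONS:
        return 200
    return 401


class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Nginx only looks at the status code of the auth subrequest.
        self.send_response(auth_status(self.headers.get("Cookie")))
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 9000) -> None:
    """Run the auth endpoint; the internal Nginx location proxies here."""
    HTTPServer((host, port), AuthHandler).serve_forever()
```

Calling `serve()` starts the endpoint.  Keeping the decision in a plain function (`auth_status`) makes it easy to swap in a call to the real SSO backend and to test without Nginx in the loop.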
