Software Engineering Radio : CAP Theorem

On the way to work today I enjoyed an excellent episode of Software Engineering Radio which featured an interview with Eric Brewer, a VP of Infrastructure at Google,  probably more famous for his CAP Theorem.

In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it succeeded or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

The discussion around “2 out of 3” was very thought-provoking and, at first, a little bit counter-intuitive.  If you don’t want to listen to the show, read though this page, which covers the important bits.

The easiest way to understand CAP is to think of two nodes on opposite sides of a partition. Allowing at least one node to update state will cause the nodes to become inconsistent, thus forfeiting C. Likewise, if the choice is to preserve consistency, one side of the partition must act as if it is unavailable, thus forfeiting A. Only when nodes communicate is it possible to preserve both consistency and availability, thereby forfeiting P. The general belief is that for wide-area systems, designers cannot forfeit P and therefore have a difficult choice between C and A. In some sense, the NoSQL movement is about creating choices that focus on availability first and consistency second; databases that adhere to ACID properties (atomicity, consistency, isolation, and durability) do the opposite.

This puts some of the current trends into perspective.

The Cost Of Loss

SingleHop – a cloud-based hosting company – created this infographic on the cost of loss for when your backups aren’t up to the par.  This should work well as a reminder, especially if printed out and hung on the wall in front of a sysadmin (but also somewhere, where the management can occasionally see it too).

SingleHop BaaS Infographic

 

Do Not Use Amazon Linux

I came across “Do Not Use Amazon Linux” opinion on Ex Ratione.  I have to say that I mostly agree with it.  When I initially started using Amazon Web Services, I assumed (due to time constraints mostly) that Amazon Linux was a close derivative of CentOs and I opted for that.  For the majority of things that affect applications in my environment that holds true, however it’s not all as simple as it sounds.

There are in fact differences that have to be taken into account.  Some of the configuration issues can be abstracted with the tools like Puppet (which I do use).  But not all of it.   I’ve been bitten by package names and version differences (hello PHP 5.3, 5.4, and 5.5; and MySQL and MariaDB) between Amazon AMI and CentOS distribution.  It’s an absolute worst when trying to push an application from our testing and development environments into the client’s production environment.  Especially when tight deadlines are involved.

One of the best reasons for CentOS is that developers can easily have their local environments (Vagrant anyone?) setup in an exactly the same way as test and production servers.

Amazon EFS preview

Amazon EFS

Amazon Elastic File System, or EFS for short, is the missing piece of the cloud puzzle.  With all those EC2 instances, elastic load balances and IAM roles, one would often need a shared file system.  Until now, you’d either be using either an S3-based solution, which scales well in terms of price and storage, but lacks in common tools support and sometimes in real-time synchronization; or an EBS-based solution, which performs way better (especially with SSD-backed storage) and works like a regular file system, but is a bit more pricey and lacking, being a block-level solution, the sharing option – so you’d have to build something like a GlusterFS solution or an NFS server, both of which have their own issues.

So, the arrival of the EFS, even as a preview for now, will bring joy to many.

Amazon EFS is a new fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. Amazon EFS supports NFSv4, and is designed to be highly available and durable. Amazon EFS can support thousands of concurrent EC2 client connections with consistent performance, making it ideal for a wide range of use cases, including content repositories, development environments, and home directories, as well as big data applications that require on-demand scaling of file system capacity and performance.

(Quote from the webinar pitch)

In terms of integration, it looks easy for the Linux crowd – NFSv4 option is there.  What’s happening in the Windows world, I’m not that aware though.  Gladly, that’s not my problem to worry.

In terms of pricing, this looks a bit expensive.  The calculations are in GB-Months, with the current price being $0.30 per GB-Month.  An example for 150 GB used over the first two weeks of the month and 250 GB sued over the second half of the month, yields a 177 GB-Month average at a cost of $53.10 USD.  Even knowing that EFS is riding on SSD-based hardware and should be quite fast, the price is high.  Amazon is known however for its regular price reductions.

So for now, I’d wait.  It’s good to know that the option is there (or almost there, preview still pending).  But for the masses to jump onto it, it’ll need to calm down its dollar hunger a bit.