NAS Performance: NFS vs Samba vs GlusterFS

I came across this question and also found the results of the benchmarks somewhat surprising.

  • GlusterFS replicated 2: 32-35 seconds, high CPU load
  • GlusterFS single: 14-16 seconds, high CPU load
  • GlusterFS + NFS client: 16-19 seconds, high CPU load
  • NFS kernel server + NFS client (sync): 32-36 seconds, very low CPU load
  • NFS kernel server + NFS client (async): 3-4 seconds, very low CPU load
  • Samba: 4-7 seconds, medium CPU load
  • Direct disk: < 1 second

The post is from 2012, so I’m curious whether this is still accurate. Has anybody tried this recently? Can you confirm or refute these numbers?
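For anyone who wants to re-check this on their own mounts, here is a minimal sketch of a timing script. The workload (many small files, each fsynced) is my assumption and not necessarily what the original benchmark did, and the function name and default sizes are made up for illustration.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, file_count=1000, file_size=4096):
    """Write file_count files of file_size bytes into directory and return elapsed seconds.

    Each file is fsynced so that client-side caching does not hide the cost
    of the round trip to the server. The files are deleted afterwards, and
    the cleanup is not included in the timing.
    """
    payload = os.urandom(file_size)
    paths = [os.path.join(directory, f"bench_{i:06d}.dat") for i in range(file_count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    elapsed = time.perf_counter() - start

    for path in paths:
        os.remove(path)
    return elapsed


if __name__ == "__main__":
    # A local temporary directory stands in for a real mount point here;
    # point this at your NFS, Samba or GlusterFS mounts to compare them.
    with tempfile.TemporaryDirectory() as scratch:
        seconds = time_small_file_writes(scratch, file_count=100)
        print(f"{scratch}: {seconds:.2f} s")
```

Run it a few times per mount and compare the ranges rather than single runs, since the numbers above were also reported as ranges.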

Also, an interesting note from the answer to the above:

From what I’ve seen after a couple of packet captures, the SMB protocol can be chatty, but the latest version of Samba implements SMB2 which can both issue multiple commands with one packet, and issue multiple commands while waiting for an ACK from the last command to come back. This has vastly improved its speed, at least in my experience, and I know I was shocked the first time I saw the speed difference too – Troubleshooting Network Speeds — The Age Old Inquiry

 

Amazon EFS preview

Amazon EFS

Amazon Elastic File System, or EFS for short, is the missing piece of the cloud puzzle.  With all those EC2 instances, elastic load balancers and IAM roles, one often needs a shared file system.  Until now, you’d be using either an S3-based solution, which scales well in terms of price and storage but lacks support in common tools and sometimes real-time synchronization; or an EBS-based solution, which performs much better (especially with SSD-backed storage) and works like a regular file system, but is a bit pricier and, being a block-level solution, lacks a sharing option.  So you’d have to build something like a GlusterFS setup or an NFS server on top of it, both of which have their own issues.

So the arrival of EFS, even as a preview for now, will bring joy to many.

Amazon EFS is a new fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. Amazon EFS supports NFSv4, and is designed to be highly available and durable. Amazon EFS can support thousands of concurrent EC2 client connections with consistent performance, making it ideal for a wide range of use cases, including content repositories, development environments, and home directories, as well as big data applications that require on-demand scaling of file system capacity and performance.

(Quote from the webinar pitch)

In terms of integration, it looks easy for the Linux crowd – the NFSv4 option is there.  I’m less aware of what’s happening in the Windows world, though.  Thankfully, that’s not my problem to worry about.

In terms of pricing, this looks a bit expensive.  Usage is calculated in GB-Months, with the current price being $0.30 per GB-Month.  An example of 150 GB used over the first two weeks of the month and 250 GB used over the second half of the month yields a 177 GB-Month average, at a cost of $53.10.  Even knowing that EFS is riding on SSD-based hardware and should be quite fast, the price is high.  Amazon is known, however, for its regular price reductions.
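To make the GB-Month arithmetic concrete, here is a rough sketch in Python. The function names and the 15/16-day split of a 31-day month are my assumptions; only the $0.30 price and the 177 GB-Month figure come from the example above. With that assumed split, 150 GB and 250 GB average out to about 202 GB-Months rather than 177, so the quoted example presumably used a different split or different figures.

```python
def gb_month_average(usage_periods):
    """Time-weighted average storage (in GB) over a billing month.

    usage_periods is a list of (gigabytes, days) tuples covering the month.
    """
    total_days = sum(days for _, days in usage_periods)
    if total_days <= 0:
        raise ValueError("usage periods must cover at least one day")
    gb_days = sum(gb * days for gb, days in usage_periods)
    return gb_days / total_days


def monthly_cost(gb_months, price_per_gb_month=0.30):
    """Cost in USD for a GB-Month figure, at the preview price by default."""
    return round(gb_months * price_per_gb_month, 2)


# The figure quoted above: 177 GB-Months at $0.30 each.
print(monthly_cost(177))  # 53.1

# Assumed split: 150 GB for 15 days, then 250 GB for 16 days.
average = gb_month_average([(150, 15), (250, 16)])
print(f"{average:.1f} GB-Months, ${monthly_cost(average):.2f}")  # 201.6 GB-Months, $60.48
```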

So for now, I’d wait.  It’s good to know that the option is there (or almost there, preview still pending).  But for the masses to jump onto it, it’ll need to calm down its dollar hunger a bit.

πfs – the data-free filesystem!

πfs is a revolutionary new file system that, instead of wasting space storing your data on your hard drive, stores your data in π! You’ll never run out of space again – π holds every file that could possibly exist! They said 100% compression was impossible? You’re looking at it!

Scaling the Facebook data warehouse to 300 PB


At Facebook, we have unique storage scalability challenges when it comes to our data warehouse. Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure.