NAS Performance: NFS vs Samba vs GlusterFS

I came across this question and also found the benchmark results somewhat surprising.

  • GlusterFS replicated 2: 32-35 seconds, high CPU load
  • GlusterFS single: 14-16 seconds, high CPU load
  • GlusterFS + NFS client: 16-19 seconds, high CPU load
  • NFS kernel server + NFS client (sync): 32-36 seconds, very low CPU load
  • NFS kernel server + NFS client (async): 3-4 seconds, very low CPU load
  • Samba: 4-7 seconds, medium CPU load
  • Direct disk: < 1 second

The post is from 2012, so I’m curious whether it’s still accurate. Has anybody tried this recently? Can anyone confirm or refute?
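
If anyone wants to re-test this on modern versions, below is a rough sketch of a comparable small-file write test.  The mount point, file count, and file size are my assumptions – the original post’s methodology may well differ:

```python
#!/usr/bin/env python3
"""Rough small-file write benchmark, in the spirit of the results above.
Run it against each mount point (GlusterFS, NFS, Samba, local disk)."""
import os
import sys
import time

def small_file_bench(root, count=1000, size=16 * 1024):
    payload = os.urandom(size)
    start = time.monotonic()
    for i in range(count):
        with open(os.path.join(root, f"bench_{i}.dat"), "wb") as f:
            f.write(payload)
    os.sync()  # flush caches, so async mounts don't get a free pass
    return time.monotonic() - start

if __name__ == "__main__":
    mount = sys.argv[1] if len(sys.argv) > 1 else "/mnt/test"
    print(f"{mount}: {small_file_bench(mount):.1f}s for 1000 x 16 KiB files")
```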

Also, an interesting note from the answer to the above:

From what I’ve seen after a couple of packet captures, the SMB protocol can be chatty, but the latest version of Samba implements SMB2 which can both issue multiple commands with one packet, and issue multiple commands while waiting for an ACK from the last command to come back. This has vastly improved its speed, at least in my experience, and I know I was shocked the first time I saw the speed difference too – Troubleshooting Network Speeds — The Age Old Inquiry
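
For an intuition of why that matters, here is a back-of-the-envelope sketch; the round-trip time, command count, and per-command overhead below are made-up numbers for illustration, not measurements:

```python
# Made-up numbers illustrating serial vs. pipelined command handling.
rtt_ms = 0.5        # assumed network round-trip time
commands = 1000     # assumed number of small metadata commands
per_send_ms = 0.01  # assumed local cost of putting one command on the wire

serial = commands * rtt_ms                   # wait for each reply before sending the next
pipelined = commands * per_send_ms + rtt_ms  # keep sending while replies are in flight

print(f"serial: {serial:.0f} ms, pipelined: ~{pipelined:.0f} ms")
# serial: 500 ms, pipelined: ~10 ms
```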


Amazon EFS preview

Amazon EFS

Amazon Elastic File System, or EFS for short, is the missing piece of the cloud puzzle.  With all those EC2 instances, elastic load balancers and IAM roles, one often needs a shared file system.  Until now, you’d be using either an S3-based solution, which scales well in terms of price and storage but lacks common tools support and sometimes real-time synchronization; or an EBS-based solution, which performs much better (especially with SSD-backed storage) and works like a regular file system, but is a bit pricier and, being a block-level solution, cannot be shared – so you’d have to build something like a GlusterFS setup or an NFS server on top, both of which have their own issues.

So the arrival of EFS, even if only as a preview for now, will bring joy to many.

Amazon EFS is a new fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. Amazon EFS supports NFSv4, and is designed to be highly available and durable. Amazon EFS can support thousands of concurrent EC2 client connections with consistent performance, making it ideal for a wide range of use cases, including content repositories, development environments, and home directories, as well as big data applications that require on-demand scaling of file system capacity and performance.

(Quote from the webinar pitch)

In terms of integration, it looks easy for the Linux crowd – the NFSv4 option is there.  What’s happening in the Windows world, I’m not that aware of, though.  Gladly, that’s not my problem to worry about.
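
For the Linux side, mounting should look like any other NFSv4 mount.  A hypothetical sketch, with the file system ID, region and mount point as placeholders (the DNS name follows AWS’s documented naming scheme):

```python
# Hypothetical sketch: mounting an EFS file system from an EC2 instance.
# The file system ID, region, and mount point are placeholders.
import subprocess

source = "fs-12345678.efs.us-east-1.amazonaws.com:/"
subprocess.run(["mount", "-t", "nfs4", source, "/mnt/efs"], check=True)
```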

In terms of pricing, this looks a bit expensive.  The calculations are in GB-Months, with the current price being $0.30 per GB-Month.  An example of 100 GB used over the first 15 days of the month and 250 GB used over the remaining 16 days yields a 177 GB-Month average, at a cost of $53.10 USD.  Even knowing that EFS is riding on SSD-based hardware and should be quite fast, the price is high.  Amazon is known, however, for its regular price reductions.
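
For the curious, the GB-Month figure is a time-weighted average.  A minimal sketch of the arithmetic, assuming a 31-day month (which is what makes the 177 figure come out):

```python
# Time-weighted GB-Month arithmetic for the pricing example above.
days_in_month = 31
usage = [(100, 15), (250, 16)]  # (GB stored, days at that level)

gb_months = sum(gb * days for gb, days in usage) / days_in_month
cost = round(gb_months) * 0.30  # $0.30 per GB-Month

print(f"{gb_months:.0f} GB-Months -> ${cost:.2f}")  # 177 GB-Months -> $53.10
```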

So for now, I’d wait.  It’s good to know that the option is there (or almost there, with the preview still pending).  But for the masses to jump onto it, it’ll need to calm down its dollar hunger a bit.

πfs – the data-free filesystem!

πfs – the data-free filesystem!

πfs is a revolutionary new file system that, instead of wasting space storing your data on your hard drive, stores your data in π! You’ll never run out of space again – π holds every file that could possibly exist! They said 100% compression was impossible? You’re looking at it!

Scaling the Facebook data warehouse to 300 PB

Scaling the Facebook data warehouse to 300 PB

At Facebook, we have unique storage scalability challenges when it comes to our data warehouse. Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure.

36+ Terabytes of free cloud storage

A Chinese cloud service offers 36+ TB of free storage (!!!).  The biggest disadvantage here is that the whole website is in Chinese, but apparently there are several translations and guides in other languages available online.  Immediately after registration you get 7 GB.  Once the desktop client is installed, you get another 10 TB.  If you install a mobile client, you get an additional 26 TB.  And then you can increase it even further by clicking through ads, promotions, and so on.

Via Yuri Timofeev.

Trying out HashBackup with Amazon S3

These days I am once again improving my backup routines.  After I ran out of all reasonable space on my Dropbox account last year, I moved to homemade rsync scripts and offsite backup downloads between my server and my laptop.  Obviously, with my laptop being limited on disk space and not always online, the situation was less than ideal.  And finally I grew tired of keeping it all running.

A fresh look around at backup software turned up an application I hadn’t seen before – HashBackup.  It’s free, it has the simplest installation ever (a statically compiled binary), it runs on every platform I care about and more, and it supports remote storage over pretty much any protocol.  It also features nice backup rotation plans and an interesting way of pushing backups to remote storage with sensible security.

Once I settled on the software, I had to sort out my disk space issue.  A full server backup takes about 15 GB, and I want to keep a few of them around (daily, weekly, monthly, yearly, etc.).  And I want to keep them off the server.  Not being too enthusiastic about having a home server on all the time, and not having enough space and uptime on my laptop, I decided to check out some of those cloud storage solutions.  Yeah, I know…

My choice fell upon Amazon S3 – not for any particular reason, really.  It seems cheap, fast, reliable and quite popular.  And HashBackup supports it too.  So I spent a couple of days (nights, actually) configuring everything to my liking, and now the backups are running smoothly without any intervention on my end.

Before I finalize my decision, I want to see the actual Amazon charge.  Their prices seem to be well within my budget, but there are many variables that I might be misinterpreting.  If they charge what they say they will, I might free up much more space across all my computers.

As far as tips go, I have two, if you decide to follow this path:

  1. When configuring HashBackup, you’ll find that the documentation on the site is awesome.  However, it keeps referring to a dest.conf file that you’d use to configure remote destinations.  Example files are not part of the online documentation; you’ll find them (one for each type of remote destination) in the software tarball, in the doc/ folder.
  2. When configuring Amazon S3, you’ll probably be tempted to use a more restrictive access policy than those offered by Amazon.  For instance, you’d probably want to limit access by folder rather than by bucket.  Word of advice: start with Amazon’s stock policy first and make sure everything works.  Only then switch to your own custom policy – otherwise, you might spend too much time troubleshooting the wrong issue.  A sketch of what such a policy might look like follows below.
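
For illustration only, here is roughly what such a folder-scoped (prefix-scoped) policy could look like, expressed as a small Python snippet that emits the policy JSON.  The bucket name and prefix are placeholders, not anything prescribed by HashBackup or Amazon:

```python
# For illustration only: a folder-scoped (prefix-scoped) S3 access policy.
# The bucket name and "hashbackup/" prefix are placeholders.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # listing is granted on the bucket itself, limited to the prefix
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-backup-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["hashbackup/*"]}},
        },
        {   # object operations are granted only on keys under the prefix
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::my-backup-bucket/hashbackup/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```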