Getting the best performance out of Amazon EFS

Jeff Geerling shares his tips for “Getting the best performance out of Amazon EFS”. Given how new Amazon EFS still is and how limited the documentation of best practices is, this stuff is golden.

tl;dr: EFS is NFS. Networked file systems have inherent tradeoffs over local filesystem access—EFS doesn’t change that. Don’t expect the moon, benchmark and monitor it, and you’ll do fine.
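To make the "benchmark it" part concrete, here is a minimal sketch of the kind of check I'd start with: time a batch of small file writes in a local directory and again on the EFS mount, then compare. The function name, the file count and size, and the `EFS_BENCH_DIR` environment variable are my own assumptions for illustration; nothing here comes from Jeff's article, and a real benchmark should mirror your actual workload.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, count=200, size=4096, fsync=False):
    """Write `count` files of `size` bytes into `directory`; return elapsed seconds."""
    payload = b"x" * size
    names = [os.path.join(directory, f"bench_{i:05d}.tmp") for i in range(count)]

    start = time.perf_counter()
    for path in names:
        with open(path, "wb") as handle:
            handle.write(payload)
            if fsync:
                # Force the data out to storage before moving on to the next file.
                handle.flush()
                os.fsync(handle.fileno())
    elapsed = time.perf_counter() - start

    # Clean up outside the timed region so repeated runs start from the same state.
    for path in names:
        os.remove(path)
    return elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local_dir:
        print(f"local: {time_small_file_writes(local_dir):.3f}s")

    # Assumption: EFS_BENCH_DIR points at a directory on the mounted EFS file system.
    efs_dir = os.environ.get("EFS_BENCH_DIR")
    if efs_dir:
        print(f"efs:   {time_small_file_writes(efs_dir):.3f}s")
```

Lots of small files is where a networked file system tends to hurt the most, since every create and close costs at least one round trip. Running with `fsync=True` as well shows how much of the gap comes from waiting on durable writes.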

Amazon Elastic File System

Here is some great news from the Amazon AWS blog – the announcement of the Elastic File System (EFS):

EFS lets you create POSIX-compliant file systems and attach them to one or more of your EC2 instances via NFS. The file system grows and shrinks as necessary (there’s no fixed upper limit and you can grow to petabyte scale) and you don’t pre-provision storage space or bandwidth. You pay only for the storage that you use.

EFS protects your data by storing copies of your files, directories, links, and metadata in multiple Availability Zones.

In order to provide the performance needed to support large file systems accessed by multiple clients simultaneously, Elastic File System performance scales with storage (I’ll say more about this later).

I think this might have been the most requested feature/service from Amazon AWS since the launch of EC2. Sure, one could have built an NFS file server before, but with the variety of storage options, availability zones, and the dynamic nature of the cloud setup itself, that was quite a challenge. Now you get all that and more in just a few clicks.

Thank you Amazon!

 

NAS Performance: NFS vs Samba vs GlusterFS

I came across this question and found the benchmark results somewhat surprising.

  • GlusterFS replicated 2: 32-35 seconds, high CPU load
  • GlusterFS single: 14-16 seconds, high CPU load
  • GlusterFS + NFS client: 16-19 seconds, high CPU load
  • NFS kernel server + NFS client (sync): 32-36 seconds, very low CPU load
  • NFS kernel server + NFS client (async): 3-4 seconds, very low CPU load
  • Samba: 4-7 seconds, medium CPU load
  • Direct disk: < 1 second
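To put those ranges side by side, here is a small sketch that takes the midpoint of each range and expresses it as a multiple of a chosen baseline. The numbers are copied from the list above; using the midpoint is my own simplification, and "Direct disk" is left out because the list only gives an upper bound for it.

```python
# Timings in seconds as (low, high) ranges, copied from the list above.
RESULTS = {
    "GlusterFS replicated 2": (32, 35),
    "GlusterFS single": (14, 16),
    "GlusterFS + NFS client": (16, 19),
    "NFS sync": (32, 36),
    "NFS async": (3, 4),
    "Samba": (4, 7),
}


def midpoint(low, high):
    """Middle of a (low, high) timing range, in seconds."""
    return (low + high) / 2


def slowdown_vs(baseline, results=RESULTS):
    """Return each setup's midpoint time as a multiple of the baseline's midpoint."""
    base = midpoint(*results[baseline])
    return {name: midpoint(*rng) / base for name, rng in results.items()}


if __name__ == "__main__":
    ratios = slowdown_vs("NFS async")
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name:24s} {ratio:5.1f}x")
```

Measured this way, synchronous NFS and replicated GlusterFS both land at a bit under ten times the async NFS time, while Samba is less than twice as slow.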

The post is from 2012, so I’m curious whether this is still accurate. Has anybody tried this? Can anyone confirm or refute these numbers?

Also, an interesting note from the answer to the above:

From what I’ve seen after a couple of packet captures, the SMB protocol can be chatty, but the latest version of Samba implements SMB2 which can both issue multiple commands with one packet, and issue multiple commands while waiting for an ACK from the last command to come back. This has vastly improved its speed, at least in my experience, and I know I was shocked the first time I saw the speed difference too – Troubleshooting Network Speeds — The Age Old Inquiry