Mcrouter: a memcached protocol router

Mcrouter is an open source tool developed by Facebook for scaling memcached deployments:

Mcrouter is a memcached protocol router for scaling memcached deployments. It’s a core component of cache infrastructure at Facebook and Instagram where mcrouter handles almost 5 billion requests per second at peak.

Here is a good overview of some of the scenarios where Mcrouter is useful.  There’s more than one.  Here are some of the features to get you started:

  • Memcached ASCII protocol
  • Connection pooling
  • Multiple hashing schemes
  • Prefix routing
  • Replicated pools
  • Production traffic shadowing
  • Online reconfiguration
  • Flexible routing
  • Destination health monitoring/automatic failover
  • Cold cache warm up
  • Broadcast operations
  • Reliable delete stream
  • Multi-cluster support
  • Rich stats and debug commands
  • Quality of service
  • Large values
  • Multi-level caches
  • IPv6 support
  • SSL support
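
To give a feel for how pools and routes fit together, here is a minimal sketch that builds a one-pool mcrouter configuration in Python.  The `pools` / `route` layout and the `PoolRoute|A` shorthand follow the example in the mcrouter README; the pool name `A`, the server addresses, and the helper name `build_config` are placeholders made up for this sketch.

```python
import json


def build_config(pool_name, servers):
    """Return a minimal mcrouter config: one pool, every request routed to it."""
    return {
        "pools": {pool_name: {"servers": list(servers)}},
        # "PoolRoute|<name>" routes requests to the named pool.
        "route": "PoolRoute|" + pool_name,
    }


# Hypothetical memcached instances listening on localhost.
config = build_config("A", ["127.0.0.1:11211", "127.0.0.1:11212"])
config_json = json.dumps(config, indent=2)
print(config_json)
```

The resulting JSON is what mcrouter would be given as its configuration; memcached clients then connect to mcrouter instead of to the individual servers.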

Latency numbers by year

Last year I came across a nice chart of latency numbers every programmer should know.  Today, I saw this page, which shows you the same latency numbers, but also provides a timeline from 1990 to 2020.

For some operations, latency is constant, because it’s based on things of nature – speed of light, distance between continents, etc.  For other operations, latency can be decreased through better technology and algorithms.
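
As a rough worked example of such a physical floor, the sketch below estimates the minimum round-trip time between California and the Netherlands.  The distance (about 8,800 km) and the "light in fiber travels at roughly two thirds of c" factor are assumptions picked for illustration, not figures from the chart.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FRACTION = 2 / 3             # assumption: light in glass fiber is ~2/3 as fast
DISTANCE_KM = 8_800                # assumption: rough California-Netherlands distance


def min_round_trip_ms(distance_km, fraction_of_c=1.0):
    """Lower bound on round-trip time over a straight path of the given length."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fraction_of_c)
    return 2 * one_way_s * 1000


vacuum_ms = min_round_trip_ms(DISTANCE_KM)
fiber_ms = min_round_trip_ms(DISTANCE_KM, FIBER_FRACTION)
print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
```

Under these assumptions the floor is somewhere under 100 ms, so faster hardware can only shave off the routing and queuing overhead that sits on top of it.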

The timeline clearly shows the mind-blowing advance we’ve experienced in technology over the last three decades.

CPU Steal Time. Now on Amazon EC2

Yesterday I wrote a blog post, trying to figure out what CPU steal time is and why it occurs.  The problem with that post was that I didn’t go deep enough.

I was looking at this issue from the point of view of a generic virtual machine.  The case that I had to deal with wasn’t exactly like that.  I saw the CPU steal time on an Amazon EC2 instance.  Assuming that it was just my neighbors acting up, or Amazon having a temporary hardware issue, was the wrong conclusion.

That’s because I didn’t know enough about Amazon EC2.  Well, I’ve learned a bunch since then, so here’s what I found.

Continue reading “CPU Steal Time. Now on Amazon EC2”

NAS Performance: NFS vs Samba vs GlusterFS

I came across this question and also found the results of the benchmarks somewhat surprising.

  • GlusterFS replicated 2: 32-35 seconds, high CPU load
  • GlusterFS single: 14-16 seconds, high CPU load
  • GlusterFS + NFS client: 16-19 seconds, high CPU load
  • NFS kernel server + NFS client (sync): 32-36 seconds, very low CPU load
  • NFS kernel server + NFS client (async): 3-4 seconds, very low CPU load
  • Samba: 4-7 seconds, medium CPU load
  • Direct disk: < 1 second

The post is from 2012, so I’m curious if this is still accurate. Has anybody tried this? Can anybody confirm or refute it?

Also, an interesting note from the answer to the above:

From what I’ve seen after a couple of packet captures, the SMB protocol can be chatty, but the latest version of Samba implements SMB2 which can both issue multiple commands with one packet, and issue multiple commands while waiting for an ACK from the last command to come back. This has vastly improved its speed, at least in my experience, and I know I was shocked the first time I saw the speed difference too – Troubleshooting Network Speeds — The Age Old Inquiry


How Far Can You Go With HAProxy and a t2.micro

Here’s an interesting set of experiments trying to answer the question of how far you can go with an HAProxy setup on the smallest of the Amazon EC2 instances – t2.micro (1 virtual CPU, 1 GB of RAM).  Here’s the summary.

460 requests/second

At 460 req/second response times are mostly a flat ~300 ms, except for two spikes. I attribute this to TCP congestion avoidance as the traffic approaches the limit and packets start to get dropped. After dropped packets are detected the clients reduce their transmission rate, but eventually the transmission rate stabilizes again just under the limit. Only 1739 requests timeout and 134918 succeed.


It seems that the limit of the t2.micro is around 500 req/second even for small responses.
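
To put the quoted counts in proportion, here is a small back-of-the-envelope calculation.  The request counts come from the summary above; the run duration is only an estimate that assumes a steady 460 req/second for the whole test.

```python
SUCCEEDED = 134_918      # from the quoted summary
TIMED_OUT = 1_739        # from the quoted summary
RATE_PER_SECOND = 460    # request rate of that run

total = SUCCEEDED + TIMED_OUT
timeout_share = TIMED_OUT / total

# Assumption: requests were sent at a constant rate for the whole run.
approx_duration_s = total / RATE_PER_SECOND

print(f"total requests: {total}")
print(f"timed out: {timeout_share:.2%}")
print(f"approximate run length: {approx_duration_s / 60:.1f} minutes")
```

So even right under the limit, only a little over one request in a hundred timed out.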

CPU Steal Time

Here’s something that happens once in a blue moon – you get a server that seems overloaded while doing nothing.  There are several reasons why that can happen, but today I’m only going to look at one of them, as it happened to me very recently.

Firstly, if you have any kind of important infrastructure, make sure you have the monitoring tools in place.  Not just the notification kind, like Nagios, but also graphing ones like Zabbix and Munin.  This will help you plenty in times like this.


When you have an issue to solve, you don’t want to be installing monitoring tools, and starting to gather your data.  You want the data to be there already.

Now, for the real thing.  What happened here?  Well, obviously the CPU steal time seems way off.  But what the hell is the CPU steal time?  Here’s a handy article – Understanding the CPU steal time.  And here is my favorite part of it:

There are two possible causes:

  1. You need a larger VM with more CPU resources (you are the problem).
  2. The physical server is over-sold and the virtual machines are aggressively competing for resources (you are not the problem).

The catch: you can’t tell which case your situation falls under by just watching the impacted instance’s CPU metrics.

In our case, it was a physical server issue, which we had no control over.  But it was super helpful to be able to say what was going on.  We prepared a “plan B”, which was to move to another server, but in the end the issue disappeared and we didn’t have to do that this time.

Oh, and if you don’t have those handy monitoring tools, you can use top:
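
In top’s summary area, the `st` value at the end of the CPU line is the steal percentage.  As a rough sketch of where that number comes from, the snippet below samples the aggregate `cpu` line of `/proc/stat` twice and computes the share of ticks spent in the steal column.  It assumes a Linux system; the function names are made up for this example.

```python
import os
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Share of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so skip them.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total if total else 0.0


def read_cpu_counters(path="/proc/stat"):
    """Read the current counters from the first line of /proc/stat."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_counters()
    time.sleep(1)
    second = read_cpu_counters()
    print(f"steal: {steal_percent(first, second):.1f}%")
```

A one-second sample is noisy, so this is only good for a quick look; graphs collected over days are what actually show the pattern.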


P.S.: If you are on Amazon EC2, you might find this article useful as well.

APC is dead, long live OPcache

Since this is probably common knowledge by now, this blog post is more a note to my future self.  APC is dead.  Don’t use it.  Use OPcache instead.  APCu is something else.

In the last few years I’ve had so many issues with APC that I eventually stopped installing it on my servers by default.  Now that I need to squeeze every bit of performance for one of the projects, I looked back at it.  And tried it.  And once again it kicked me in the balls.  Then I remembered that I’ve seen APCu somewhere.  Maybe it’s a newer fork or something.

Luckily, after a quick Google search for the difference, I came across this discussion, which clarified a few things.

So out of those you named:

  • APC is opcode cache and data store
  • APCu is only data store
  • OPcache is only opcode cache

Since APC is older, at the moment you likely want OPcache as well as some data store, not necessarily APCu (although it is perfectly fine choice).

My interest was in the opcode cache, since I already had a data store.  Installing and configuring OPcache took just a few seconds, and it hasn’t caused any issues so far.

And if you want more information about it, here is a useful article, which, among other things, lists the helpful tools for monitoring and tweaking OPcache configuration.

3. How to check if OpCache is actually caching my files?

If you have already installed and configured OpCache, you may find it important to control which PHP files are actually being cached. The whole cache engine works in the background and is transparent to a visitor or a web developer. In order to check its status, you may use one of the two functions that provide such information: opcache_get_configuration() and opcache_get_status(). Fortunately, there are a couple of prepared scripts that fetch all the OpCache configuration and status data and display it in a friendly way. You don’t need to write any code yourself, just pick one of the tools below:

  • Opcache Control Panel
  • opcache-status by Rasmus Lerdorf
  • OpCacheGUI by Pieter Hordijk
  • opcache-gui by Andrew Collington

May the Cache be with you.

Linux Performance Analysis in 60,000 Milliseconds

Netflix shares the process they use to quickly assess performance issues on a Linux machine.  With these commands you can get a good idea of what’s going on within about 60 seconds:

dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1

Read the rest of the article for details and explanations.

Latency Numbers Every Programmer Should Know


I’m saving this here for current and future generations of programmers:

Latency Comparison Numbers
L1 cache reference                            0.5 ns
Branch mispredict                             5   ns
L2 cache reference                            7   ns             14x L1 cache
Mutex lock/unlock                            25   ns
Main memory reference                       100   ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy              3,000   ns
Send 1K bytes over 1 Gbps network        10,000   ns    0.01 ms
Read 4K randomly from SSD*              150,000   ns    0.15 ms
Read 1 MB sequentially from memory      250,000   ns    0.25 ms
Round trip within same datacenter       500,000   ns    0.5  ms
Read 1 MB sequentially from SSD*      1,000,000   ns    1    ms  4X memory
Disk seek                            10,000,000   ns   10    ms  20x datacenter roundtrip
Read 1 MB sequentially from disk     20,000,000   ns   20    ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA     150,000,000   ns  150    ms

1 ns = 10^-9 seconds
1 ms = 10^-3 seconds
* Assuming ~1GB/sec SSD
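
The multipliers in the right-hand column are easy to recompute.  The sketch below copies a few rows from the table and derives the ratios between them; the helper names `times_slower` and `humanize` are made up for this example.

```python
# Values in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
}


def times_slower(slow, fast):
    """How many times slower the first operation is than the second."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def humanize(ns):
    """Format nanoseconds using the largest unit that keeps the value >= 1."""
    for unit, size in (("ms", 1_000_000), ("us", 1_000)):
        if ns >= size:
            return f"{ns / size:g} {unit}"
    return f"{ns:g} ns"


print(times_slower("L2 cache reference", "L1 cache reference"))
print(times_slower("Disk seek", "Round trip within same datacenter"))
print(humanize(LATENCY_NS["Read 1 MB sequentially from disk"]))
```

The annotations in the table are rounded by hand, so a recomputed ratio may differ a little from the one printed next to a row.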

By Jeff Dean; originally by Peter Norvig.


This is a copy-paste of this gist, referenced from this blog post. Read and share both, for a better world.

WordPress Benchmark of MySQL server on Amazon EC2

I have a friend who is a newcomer to the world of WordPress.  Until recently, he was mostly working with custom-built systems and a PostgreSQL database engine, so there are many topics to cover.

One of the topics that came up today was the performance of the database engine.  A quick Google search brought up the Benchmark plugin, which we used to compare results from several servers.  (NOTE: you’ll need php-bcmath installed on your server for this plugin to work.)

My friend’s test server showed a rather poor 48 requests / second result.  And that’s on an Intel Core2 Duo E4500 machine with 4 GB of RAM and 160 GB 7200 RPM SATA HDD, running Ubuntu 12.04 x86-64.

So, I tried it on my setup.  My setup is all on Amazon EC2, using the smallest possible t2.micro servers (that’s Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, with 1 GB of RAM and god knows what kind of hard disk, running Amazon AMI).

First, I ran the benchmark on the test server, which hosts about 20 sites with low traffic (I didn’t want to bring up a separate instance for just a single benchmark run).  MySQL runs on the same instance as the web server.  And here are the results:

                               Your System         Industry Average
CPU Speed:                     38,825 BogoWips     24,896 BogoWips
Network Transfer Speed:        97.81 Mbps          11.11 Mbps
Database Queries per Second:   425 Queries/Sec     1,279 Queries/Sec

Secondly, I ran the benchmark on one of the live servers, which also hosts about 20 sites with low traffic. Here though, Nginx web server runs on one instance and the MySQL database on another. Here are the results:

                               Your System         Industry Average
CPU Speed:                     37,712 BogoWips     24,901 BogoWips
Network Transfer Speed:        133.91 Mbps         11.15 Mbps
Database Queries per Second:   1,338 Queries/Sec   1,279 Queries/Sec

In both cases, MySQL is v5.5.42, running on the /usr/share/doc/mysql55-server-5.5.42/my-huge.cnf configuration file. (I find it ironically pleasing that the tiniest of Amazon EC2 servers fits perfectly for the huge configuration shipped with documentation.)

The benchmark plugin explains how the numbers are calculated. Here’s what it says about the database queries:

To benchmark your database I use your wp_options table which uses the longtext column type which is the same type used by wp_posts. I do 1000 inserts of 50 paragraphs of text, then 1000 selects, 1000 updates and 1000 deletes. I use the time taken to calculate queries per second based on 4000 queries. This is a good indication of how fast your overall DB performance is in a worst case scenario when nothing is cached.
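
The method described above can be sketched in a few lines.  This is not the plugin’s code: it uses an in-memory SQLite database as a stand-in for MySQL, and the table name `bench_options`, its columns, and the filler text are all made up for this example.

```python
import sqlite3
import time

ROUNDS = 1000
PARAGRAPH = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 10
PAYLOAD = "\n\n".join([PARAGRAPH] * 50)  # 50 paragraphs of filler text


def run_benchmark(connection, rounds=ROUNDS):
    """Run inserts, selects, updates and deletes; return queries per second."""
    cursor = connection.cursor()
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS bench_options "
        "(option_id INTEGER PRIMARY KEY, option_value TEXT)"
    )
    started = time.perf_counter()
    for i in range(rounds):
        cursor.execute("INSERT INTO bench_options VALUES (?, ?)", (i, PAYLOAD))
    for i in range(rounds):
        cursor.execute(
            "SELECT option_value FROM bench_options WHERE option_id = ?", (i,)
        ).fetchone()
    for i in range(rounds):
        cursor.execute(
            "UPDATE bench_options SET option_value = ? WHERE option_id = ?",
            ("updated", i),
        )
    for i in range(rounds):
        cursor.execute("DELETE FROM bench_options WHERE option_id = ?", (i,))
    connection.commit()
    elapsed = time.perf_counter() - started
    # Four batches of `rounds` statements each, as in the quoted description.
    return (4 * rounds) / elapsed


connection = sqlite3.connect(":memory:")
qps = run_benchmark(connection)
print(f"{qps:,.0f} queries/sec")
```

An in-memory SQLite run will report far higher numbers than a networked MySQL server, so the output is only useful for seeing how the single figure is derived.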

So, it’s a good number to throw around, but it’s far from realistic site performance, as your WordPress site will mostly get SELECTs, not INSERTs or UPDATEs or DELETEs. And then, you’ll obviously need to see how many SQL queries you need per page. And then you’ll need to examine all the caching in play – from browser, web server, WordPress, MySQL, and the operating system. And then, and then, and then.

But for a quick measure, I think, this is a good benchmark. It’s obvious that my friend can get a lot more out of his server without digging too deep. It’s obvious that separating web and database server into two Amazon instances gives you quite a boost. And it’s obvious that I don’t know much about performance measuring.