Upgrading Amazon EC2 instance type

By now, everybody knows that one of the major benefits of using cloud services rather than hosting on your own hardware is the ease of scaling quickly.  Many web applications and large companies benefit from this, but what about smaller customers?  How about a single server?

Well, today one of our web servers was experiencing some peak loads.  It hosts a whole array of small websites built with WordPress, CakePHP, and other popular tools.  There was no time to update all these projects to work with multiple web servers.  And even redeploying them to multiple individual servers would have taken a few hours.  Instead, we decided to upgrade the server hardware.

Pause for a second and imagine the situation with your own server.  Or a dedicated hosting account for that matter.  So much to configure.  So much to backup and restore.  So much to test.

Here’s how to do it if your projects are on an Amazon EC2 instance (ours was also inside a virtual private cloud (VPC), but even if it weren’t, the difference would be insignificant; there is also a scripted equivalent after the steps):

  1. Log in to the Amazon AWS console.
  2. Navigate to the Amazon EC2 section.
  3. Click on Instances in the left sidebar.
  4. Click on the instance that you want to upgrade in the list of your instances.
  5. Click Actions -> Instance State -> Stop.
  6. Wait a few seconds for the instance to stop.  You can use the Refresh button to update the list.
  7. With your instance still selected in the list of instances, click Actions -> Instance Settings -> Change Instance Type.
  8. In the popup window that appears, select the Instance Type that you want.
  9. Click Apply.
  10. Click Actions -> Instance State -> Start.
  11. Wait a few seconds for the instance to start.
  12. Enjoy!
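
If you would rather script this than click through the console, here is a minimal sketch of the same stop/resize/start sequence using the boto3 SDK.  It assumes your AWS credentials are already configured; the instance ID and target type below are hypothetical placeholders.

```python
# Minimal sketch: change an EC2 instance type with boto3.
# INSTANCE_ID and NEW_TYPE are hypothetical placeholders.
import boto3

INSTANCE_ID = "i-0123456789abcdef0"
NEW_TYPE = "m4.large"

ec2 = boto3.client("ec2")

# 1. Stop the instance and wait until it is fully stopped.
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

# 2. Change the instance type (only possible while the instance is stopped).
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID,
    InstanceType={"Value": NEW_TYPE},
)

# 3. Start the instance again and wait for it to come up.
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
```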

The whole process literally takes under two minutes.  You get exactly the same configuration – hostname, IP addresses (both internal and external; the external one stays as long as it’s an Elastic IP), mounted EBS volumes, all your OS configuration, etc.  It’s practically a reboot of your machine, just into a different hardware configuration (CPU/RAM).

Coincidentally, earlier this morning I had to pack up a rack-mountable server – screws, cables, dusty boxes, the whole shebang.  It had been a while since I last did that.

[Instagram photo by Leonid Mamchenkov (@mamchenkov): “Another day in the office” #work #cables #sysadmin]

But I can tell you that I much prefer clicking a few buttons and moving on with my day.  Maybe I’m just not the hardware type.

[Instagram photo by Leonid Mamchenkov (@mamchenkov): “Hard working” #selfie #me #work #office]

WTF with Amazon and TCP

Here goes the story of me learning a few new swear words and pulling out nearly all my hair.  Grab a cup of coffee, this will take a while to tell…

First of all, here is a diagram to make things a little bit more visual.

[diagram: the office network, the Amazon VPC, and the rest of the Internet]

As you can see, we have an office network with NAT on the gateway.  We have an Amazon VPC with NAT on the bastion host.  And then there’s the rest of the Internet.

The setup is pretty straightforward.  There are no outgoing firewalls anywhere, no VLANs, no dedicated network equipment – all of the machines involved are a variety of Linux boxes.  The whole thing had been working fine for a while.

A couple of weeks ago, we had an issue with our ISP at the office.  The Internet connection was alive, but we were getting extremely high packet loss – around 80%.  A technician came by, changed the cables, and rebooted the ADSL modem, and we also rebooted the gateway.  The problem was fixed, except for one annoying bit: we could access all of the Internet just fine, except for our Amazon VPC bastion host.  Here’s where it gets interesting.


Amazon Makes It Almost Impossible To Calculate Their “Virtual CPU” Equivalent

So, it looks like I’m not the only one trying to figure out Amazon EC2 virtual CPU allocation.  Slashdot runs the story (and a heated debate, as usual) on the subject of Amazon’s non-definitive virtual CPUs:

ECU’s were not the simplest approach to describing a virtual CPU, but they at least had a definition attached to them. Operations managers and those responsible for calculating server pricing could use that measure for comparison shopping. But ECUs were dropped as a visible and useful definition without announcement two years ago in favor of a descriptor — virtual CPU — that means, mainly, whatever AWS wants it to mean within a given instance family.

A precise number of ECUs in an instance has become simply a “virtual CPU.”

Amazon EC2 t2.nano instances

If you thought t2.micro was a tiny machine, I have news for you – Amazon has announced the t2.nano instance type.  It features 512 MB of RAM, 1 vCPU, and up to two Elastic Network Interfaces.  The on-demand price is $0.0065 per hour, which works out to under $5 per month.

This instance type is perfect for small websites, development and testing environments, and other tasks that don’t require a lot of resources.

CPU Steal Time. Now on Amazon EC2

Yesterday I wrote a blog post trying to figure out what CPU steal time is and why it occurs.  The problem with that post was that I didn’t dig deep enough.

I was looking at the issue from the point of view of a generic virtual machine.  The case I actually had to deal with wasn’t quite like that: I saw the CPU steal time on an Amazon EC2 instance.  Assuming that it was just my neighbors acting up, or Amazon having a temporary hardware issue, was the wrong conclusion.

That’s because I didn’t know enough about Amazon EC2.  Well, I’ve learned a bunch since then, so here’s what I found.


How Far Can You Go With HAProxy and a t2.micro

Here’s an interesting set of experiments trying to answer the question of how far you can go with an HAProxy setup on the smallest of the Amazon EC2 instances – t2.micro (1 virtual CPU, 1 GB of RAM).  Here’s the summary:

460 requests/second

At 460 req/second response times are mostly a flat ~300 ms, except for two spikes. I attribute this to TCP congestion avoidance as the traffic approaches the limit and packets start to get dropped. After dropped packets are detected the clients reduce their transmission rate, but eventually the transmission rate stabilizes again just under the limit. Only 1739 requests timeout and 134918 succeed.

[…]

It seems that the limit of the t2.micro is around 500 req/second even for small responses.
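
If you want to try a similar measurement yourself, here is a rough sketch of the idea in Python (the original experiments used proper load-testing tools, which you should too; this only illustrates counting successes, failures, and the effective request rate – the URL and request count are placeholders):

```python
# Rough sketch of a tiny load test: fire TOTAL requests at a URL
# from a thread pool and report successes, failures, and the rate.
# URL is a hypothetical placeholder; a real test would use a tool
# like ab, wrk, or vegeta.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://haproxy.example.com/"  # hypothetical endpoint
TOTAL = 10000                        # number of requests to send

def hit(url):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            resp.read()
        return True
    except Exception:
        return False

start = time.time()
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(hit, [URL] * TOTAL))
elapsed = time.time() - start

print("succeeded: %d, failed/timed out: %d"
      % (results.count(True), results.count(False)))
print("effective rate: %.0f req/sec" % (TOTAL / elapsed))
```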

CPU Steal Time

Here’s something that happens once in a blue moon – you get a server that seems overloaded while doing nothing.  There are several reasons why that can happen, but today I’m only going to look at one of them, as it happened to me very recently.

Firstly, if you have any kind of important infrastructure, make sure you have monitoring tools in place.  Not just the notification kind, like Nagios, but also graphing ones, like Zabbix and Munin.  They will help you plenty in times like this.

[graph: web1 CPU utilization, with steal time clearly elevated]

When you have an issue to solve, you don’t want to be installing monitoring tools and only then starting to gather your data.  You want the data to be there already.

Now, for the real thing.  What happened here?  Well, obviously, the CPU steal time seems way off.  But what the hell is CPU steal time?  Here’s a handy article – Understanding the CPU steal time.  And here is my favorite part of it:

There are two possible causes:

  1. You need a larger VM with more CPU resources (you are the problem).
  2. The physical server is over-sold and the virtual machines are aggressively competing for resources (you are not the problem).

The catch: you can’t tell which case your situation falls under by just watching the impacted instance’s CPU metrics.

In our case, it was a physical server issue, which we had no control over.  But it was super helpful to be able to say what was going on.  We prepared a “plan B” – moving to another server – but in the end the issue disappeared and we didn’t have to do that this time.

Oh, and if you don’t have those handy monitoring tools, you can use top:

[screenshot: top output, with the steal time shown in the %st column]
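
And if you’d rather script it, here is a small sketch that computes the steal percentage directly from /proc/stat (the steal counter is the 8th field on the aggregate cpu line; the 5-second sampling interval is an arbitrary choice):

```python
# Small sketch: sample /proc/stat twice and compute the share of
# CPU time stolen by the hypervisor between the two samples.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq steal ..."
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_times()
time.sleep(5)  # arbitrary sampling interval
after = cpu_times()

deltas = [b - a for a, b in zip(before, after)]
steal = deltas[7] if len(deltas) > 7 else 0  # 8th field is steal
print("steal: %.1f%%" % (100.0 * steal / sum(deltas)))
```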

P.S. : If you are on Amazon EC2, you might find this article useful as well.

WordPress Benchmark of MySQL server on Amazon EC2

I have a friend who is a newcomer to the world of WordPress.  Until recently, he was mostly working with custom-built systems and a PostgreSQL database engine, so there are many topics to cover.

One of the topics that came up today was the performance of the database engine.  A quick Google search brought up the Benchmark plugin, which we used to compare results from several servers.  (NOTE: you’ll need php-bcmath installed on your server for this plugin to work.)

My friend’s test server showed a rather poor result of 48 requests per second.  And that’s on an Intel Core2 Duo E4500 machine with 4 GB of RAM and a 160 GB 7200 RPM SATA HDD, running Ubuntu 12.04 x86-64.

So, I tried it on my setup, which is all on Amazon EC2, using the smallest possible t2.micro servers (that’s an Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, with 1 GB of RAM and god knows what kind of hard disk, running the Amazon AMI).

First, I ran the benchmark on the test server, which hosts about 20 sites with low traffic (I didn’t want to bring up a separate instance just for a single benchmark run).  MySQL runs on the same instance as the web server.  Here are the results:

                              Your System        Industry Average
CPU Speed:                    38,825 BogoWips    24,896 BogoWips
Network Transfer Speed:       97.81 Mbps         11.11 Mbps
Database Queries per Second:  425 Queries/Sec    1,279 Queries/Sec

Second, I ran the benchmark on one of the live servers, which also hosts about 20 sites with low traffic.  Here, though, the Nginx web server runs on one instance and the MySQL database on another.  Here are the results:

                              Your System        Industry Average
CPU Speed:                    37,712 BogoWips    24,901 BogoWips
Network Transfer Speed:       133.91 Mbps        11.15 Mbps
Database Queries per Second:  1,338 Queries/Sec  1,279 Queries/Sec

In both cases, MySQL is v5.5.42, running with the /usr/share/doc/mysql55-server-5.5.42/my-huge.cnf configuration file.  (I find it ironically pleasing that the tiniest of Amazon EC2 servers fits the “huge” configuration shipped with the documentation perfectly.)

The benchmark plugin explains how the numbers are calculated. Here’s what it says about the database queries:

To benchmark your database I use your wp_options table which uses the longtext column type which is the same type used by wp_posts. I do 1000 inserts of 50 paragraphs of text, then 1000 selects, 1000 updates and 1000 deletes. I use the time taken to calculate queries per second based on 4000 queries. This is a good indication of how fast your overall DB performance is in a worst case scenario when nothing is cached.
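
For illustration, here is a rough sketch of that same insert/select/update/delete loop in Python.  This is not the plugin’s actual code; the table name, credentials, and filler text are made up, and PyMySQL is assumed to be installed.

```python
# Rough sketch of the benchmark described above: 1000 INSERTs,
# 1000 SELECTs, 1000 UPDATEs, and 1000 DELETEs against a LONGTEXT
# column, timed as 4000 queries total.  All names are hypothetical.
import time
import pymysql

conn = pymysql.connect(host="localhost", user="bench",
                       password="secret", database="bench")
text = "Lorem ipsum dolor sit amet. " * 50  # stand-in for 50 paragraphs

with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS bench_options")
    cur.execute("CREATE TABLE bench_options "
                "(id INT AUTO_INCREMENT PRIMARY KEY, value LONGTEXT)")

    start = time.time()
    for i in range(1000):  # 1000 INSERTs
        cur.execute("INSERT INTO bench_options (value) VALUES (%s)", (text,))
    for i in range(1000):  # 1000 SELECTs
        cur.execute("SELECT value FROM bench_options WHERE id = %s", (i + 1,))
        cur.fetchone()
    for i in range(1000):  # 1000 UPDATEs
        cur.execute("UPDATE bench_options SET value = %s WHERE id = %s",
                    (text, i + 1))
    for i in range(1000):  # 1000 DELETEs
        cur.execute("DELETE FROM bench_options WHERE id = %s", (i + 1,))
    conn.commit()
    elapsed = time.time() - start

print("Queries per second: %.0f" % (4000 / elapsed))
```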

So, it’s a good number to throw around, but it’s far from realistic site performance, as your WordPress site will mostly issue SELECTs, not INSERTs, UPDATEs, or DELETEs.  And then, you’ll obviously need to see how many SQL queries you need per page.  And then you’ll need to examine all the caching in play – from the browser, the web server, WordPress, MySQL, and the operating system.  And then, and then, and then.

But as a quick measure, I think this is a good benchmark.  It’s obvious that my friend can get a lot more out of his server without digging too deep.  It’s obvious that separating the web and database servers into two Amazon instances gives you quite a boost.  And it’s obvious that I don’t know much about performance measuring.

On Amazon EC2 instances

I am staring at a t2.micro (the smallest available instance type) server running MySQL 5.5.40 (using the my-huge.cnf example configuration shipped with MySQL, which ironically matches the t2.micro specs).  Here’s why (as reported by Nagios for the last few hours):

Queries per second avg: 12888.839

The number is fluctuating between about 12,500 and 13,500.  Server load is moving between 0.05 and 0.08.
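
If you want to check that average yourself, here is a quick sketch of how it can be derived from MySQL’s own counters – roughly what mysqladmin’s “Queries per second avg” does: total questions divided by server uptime.  Credentials are placeholders, and PyMySQL is assumed.

```python
# Quick sketch: derive the average queries-per-second figure from
# MySQL status counters.  Connection details are hypothetical.
import pymysql

conn = pymysql.connect(host="localhost", user="monitor", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW GLOBAL STATUS LIKE 'Questions'")
    questions = int(cur.fetchone()[1])
    cur.execute("SHOW GLOBAL STATUS LIKE 'Uptime'")
    uptime = int(cur.fetchone()[1])

print("Queries per second avg: %.3f" % (questions / uptime))
```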

I think this answers the question of whether or not I am happy with the Amazon EC2 instance performance with a “hell yeah!” bang.