Amazon

Amazon Snowmobile – a truck with up to 100 Petabytes of storage

Back in my college days, I had a professor who frequently used Andrew Tanenbaum’s quote in the networking class:

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

I guess he wasn’t the only one, as during the re:Invent 2016 conference Amazon announced, among other things, the AWS Snowmobile:

Moving large amounts of on-premises data to the cloud as part of a migration effort is still more challenging than it should be! Even with high-end connections, moving petabytes or exabytes of film vaults, financial records, satellite imagery, or scientific data across the Internet can take years or decades. On the business side, adding new networking or better connectivity to data centers that are scheduled to be decommissioned after a migration is expensive and hard to justify.

[…]

In order to meet the needs of these customers, we are launching Snowmobile today. This secure data truck stores up to 100 PB of data and can help you to move exabytes to AWS in a matter of weeks (you can get more than one if necessary). Designed to meet the needs of our customers in the financial services, media & entertainment, scientific, and other industries, Snowmobile attaches to your network and appears as a local, NFS-mounted volume. You can use your existing backup and archiving tools to fill it up with data destined for Amazon Simple Storage Service (S3) or Amazon Glacier.
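To put Tanenbaum’s point in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes decimal petabytes (10^15 bytes) and a link that stays fully saturated the whole time, which real links rarely manage. The link speeds are my own illustrative picks, not figures from the announcement.

```python
PB = 10**15  # assumption: decimal petabyte, in bytes


def transfer_days(data_bytes, link_bits_per_second, utilization=1.0):
    """Days needed to push data_bytes through a link at the given utilization."""
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 86400


# Illustrative link speeds only (1, 10 and 100 Gbps)
for gbps in (1, 10, 100):
    days = transfer_days(100 * PB, gbps * 10**9)
    print(f"{gbps:>3} Gbps: {days:,.0f} days (~{days / 365:.1f} years)")
```

Under those assumptions, one truckload (100 PB) over a dedicated 10 Gbps link works out to roughly 926 days, or about two and a half years. A truck suddenly sounds reasonable.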

Thanks to this VentureBeat page, we even have a picture of the monster:

aws-snowmobile

100 Petabytes on wheels!

I know, I know, it looks like a regular truck with a shipping container on it. But I’m pretty sure it’s VERY different on the inside. With all that storage, networking, power, and cooling needed, it would be awesome to take a peek inside this thing.

Top 29 books on Amazon from Hacker News comments

hacker-news-books

I came across this nice visualization of “Top 29 books ranked by unique users linking to Amazon in Hacker News comments”.

Amazon product links were extracted and counted from 8.3M comments posted on Hacker News from Oct 2006 to Oct 2015.

Most of these are, not surprisingly, on programming and design. A few are on startups and business. Some are on how to have a good life. Which is a bit weird.
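The visualization doesn’t spell out how the links were extracted, so the following is only my guess at the general idea: pull the product ID out of each Amazon URL and count distinct users per product. The URL pattern (the common /dp/ and /gp/product/ forms followed by a 10-character ID), the (user, text) input shape, and the sample comments are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: product links look like amazon.com/[optional-title/]dp/<ID>
# or amazon.com/gp/product/<ID>, where <ID> is 10 uppercase letters or digits.
PRODUCT_RE = re.compile(
    r"amazon\.com/(?:[^/\s]+/)?(?:dp|gp/product)/([A-Z0-9]{10})"
)


def rank_products(comments):
    """Rank product IDs by the number of distinct users who linked to them.

    comments is an iterable of (user, text) pairs (a hypothetical format).
    """
    users_by_product = defaultdict(set)
    for user, text in comments:
        for product_id in PRODUCT_RE.findall(text):
            users_by_product[product_id].add(user)
    counts = ((pid, len(users)) for pid, users in users_by_product.items())
    # Most-linked first; ties broken by product ID for a stable order
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample data
sample = [
    ("alice", "Read https://www.amazon.com/dp/0201835959 first."),
    ("bob", "Agreed: http://www.amazon.com/Mythical-Man-Month/dp/0201835959/"),
    ("alice", "Again, https://www.amazon.com/dp/0201835959"),
    ("carol", "I prefer https://www.amazon.com/gp/product/0132350882"),
]
print(rank_products(sample))
```

Counting users in a set rather than counting raw links means one enthusiastic commenter posting the same book ten times still counts once, which matches the “unique users” wording of the ranking.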

Support lesson to learn from Amazon AWS

I’ve said a million times how happy I am with Amazon AWS. Today I also want to share a positive lesson to learn from their technical support. It’s the second time I’ve contacted them over the last year and a half, and it’s the second time I am amazed at how well it works.

In my experience, technical support departments usually rely on one primary communication channel – be that a telephone, an email, a ticketing system, or a live chat. The other channels are often just routed or converted into the main one, or even completely ignored. But each one of those has its benefits and side effects.

Telephone provides the most immediate connectivity, and the much-valued option of human interaction. But the communication is verbal, often without a paper trail. That makes it difficult to carbon copy (CC) people on the conversation or review exactly what has been said. It is also very free-form and unstructured.

Live chat is also free form and unstructured, but it’s written, so transcripts are easily available. It also helps with the carbon copy, but only on the receiving end – supervisors or field experts can often be included in the conversation, but adding somebody from the requesting side is rarely supported.

Email makes it easy to carbon copy people on both ends. It provides the paper trail, but often lacks the immediate response factor. And it’s still unstructured, making it difficult to figure out what was requested, what has been discussed and whether or not there was any resolution. (Have you ever been a part of a lengthy multi-lingual conversation about what turned out to be multiple issues in the same thread?)

Ticketing/support systems help to structure the conversation and make it follow a certain workflow. But they often lack humanity and, much like emails, the immediate response.

Now, what Amazon AWS support has done is a beautiful combination of a ticketing system and a phone. You start off with the ticketing system – log in, create a new support case, provide all the necessary information, and optionally CC other people, all from a single short form. The moment you submit it, the web page asks for your phone number. Once entered, a phone call is placed immediately by the system, connecting you to the support engineer. The engineer confirms a few case details and lets you know that the case is in progress and what the expected resolution time is (I was asking to raise the limit of the Elastic IP addresses on the Virtual Private Cloud, and I was told it would be done in the next 15 to 30 minutes. And it was done in 10!). I also received two emails – one confirming the opening of the case, with all the requested details, and another one notifying me that the work had been done and providing quick information on how to follow up, in case I needed to.

Overall experience was very smooth, fast, to the point, and very effective. I never got lost. I never had to figure anything out. And my problem was attended to and resolved immediately.

I only wish more companies provided this level of support. I’ll sure try too – but it’s a bar set high.

Top level domain nonsense and how it can break your stuff

Call me old school, but I really (I mean REALLY) don’t like the recent explosion of the top level domains. I understand that most good names are taken in .com, .org, and .net zones, but do we really need all those .blue, .parts, and .yoga TLDs?

Why am I whining about all this all of a sudden? I’ll tell you why. Because a new top level domain – .aws – is about to be introduced, and it already broke something for me in a non-obvious manner.

I manage a few Virtual Private Clouds on Amazon AWS. Many of these use and rely on some hostname naming convention (yeah, I’m familiar with the pets vs. cattle idea). Imagine you have a few servers, which are separated into generic infrastructure and client segments, like so:

bastion.aws.example.com
firewall.aws.example.com
lb.aws.example.com
web.client1.example.com
db.client1.example.com
web.client2.example.com
db.client2.example.com
… and so on.

Working with such long FQDNs (fully qualified domain names) isn’t very convenient. So add “search example.com” to your /etc/resolv.conf file and now you can use short hostnames like firewall.aws and web.client1. And life is beautiful …

… until one day, when you see the following:

$ ssh firewall.aws
Permission denied (publickey).

And that’s when your heart misses a beat, the world freezes, and you go: “WTF?”. All kinds of thoughts are rushing through your head. Is it a typo? Am I in the right place? Did the server get compromised? How’s that for a little panic …

Trying a few things here and there, you manage to get into the server from somewhere else. You are very careful. You are looking around for any traces of a break-in, but you see nothing. You dig through the logs both on the server and off it. Still nothing. You dive into all those logwatch and cron messages in your Trash, the ones you were automatically deleting because things had been working fine for so long. There! You find that cron was complaining that the backup script couldn’t get into this machine. Uh-oh. This has been happening for a few days now. A black cloud of combined worry for the compromised machine and the outdated backup kills the sunlight in your life. Dammit!

Take a break to calm down. Try to think clearly. Don’t panic. Stop assuming things, and start troubleshooting.

A few minutes later, you establish that the problem is not limited to that particular machine. All your .aws hosts share this headache. A few more minutes later, you learn that ‘ssh firewall.aws.example.com’ works fine, while ‘ssh firewall.aws’ still doesn’t.

That points toward the hostname resolution issue. With that, it takes only a few more moments to see the following:

$ host firewall.aws
firewall.aws has address 127.0.53.53
firewall.aws mail is handled by 10 your-dns-needs-immediate-attention.aws.

Say what? That’s not at all what I expected. And what is it that I need to fix with my DNS? A Google search brings up this beauty:

This is problably because the .dev and .local are now valid top level extensions.

Really? Who’s the genius behind that? I thought people chose those specifically to make them internal. So is there an .aws top level extension now too? You bet there is!

Solution? Well, as far as I am concerned, from this day onward, I don’t trust the brief hostnames anymore. It’s FQDN or nothing.
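For the curious, here is why the search line didn’t save me. As far as I understand the stock resolver (see the ndots option in the resolv.conf man page), a name that already contains at least ndots dots (1 by default) is first tried as-is, and the search domains are only appended if that fails. Once .aws became a real TLD, the as-is lookup of firewall.aws started succeeding, answering with 127.0.53.53, the address ICANN uses to flag name collisions. The function below is a simplified model of that lookup order that I wrote for illustration, not the resolver’s actual code.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search list
        return [absolute] + searched
    # Too few dots: go through the search list first
    return searched + [absolute]


print(candidate_names("firewall.aws", ["example.com"]))
print(candidate_names("firewall", ["example.com"]))
print(candidate_names("firewall.aws", ["example.com"], ndots=2))
```

With the default ndots of 1, firewall.aws is asked for as a top-level name before firewall.aws.example.com ever gets a chance. Raising ndots to 2 in resolv.conf would flip that order for two-label names, but it only moves the problem one dot further, so I’m sticking with FQDNs.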

CPU Steal Time. Now on Amazon EC2

Yesterday I wrote a blog post, trying to figure out what CPU steal time is and why it occurs. The problem with that post was that I didn’t go deep enough.

I was looking at this issue from the point of view of a generic virtual machine. The case that I had to deal with wasn’t exactly like that. I saw the CPU steal time on an Amazon EC2 instance. Assuming that these were just my neighbors acting up or Amazon having a temporary hardware issue was the wrong conclusion.

That’s because I didn’t know enough about Amazon EC2. Well, I’ve learned a bunch since then, so here’s what I found.
