Top 13 Amazon Virtual Private Cloud (VPC) Best Practices

Cloud Academy Blog goes over the top 13 Amazon VPC best practices – particularly good for those just starting out with the platform.  The article discusses the following:

  1. Choosing the Proper VPC Configuration for Your Organization’s Needs
  2. Choosing a CIDR Block for Your VPC Implementation
  3. Isolating Your VPC Environments
  4. Securing Your Amazon VPC Implementation
  5. Creating Your Disaster Recovery Plan
  6. Traffic Control and Security
  7. Keep your Data Close
  8. VPC Peering
  9. EIP – Just In Case
  10. NAT Instances
  11. Determining the NAT Instance Type
  12. IAM for Your Amazon VPC Infrastructure
  13. ELB on Amazon VPC

Overall, it’s a very handy quick list.

“AWS Week in Review” goes open

I’ve been a big fan of Amazon AWS for over two years now.  One thing that absolutely blows me away is how much activity there is in Amazon AWS development.  Every day there is an announcement of a new service or an update to existing ones.  In order to help people keep up with all the updates, Jeff Barr of Amazon was blogging “AWS Week in Review” for a few years.

Now, imagine this – there is so much new stuff going on that it takes hours to prepare each of those blog posts:

Unfortunately, finding, saving, and filtering links, and then generating these posts grew to take a substantial amount of time. I reluctantly stopped writing new posts early this year after spending about 4 hours on the post for the week of April 25th.

This is insane!  So he almost gave up on the idea, as it was too time-consuming.  But people want it.  What’s the solution?  Go Open Source!

The AWS Week in Review is now a GitHub project (https://github.com/aws/aws-week-in-review). I am inviting contributors (AWS fans, users, bloggers, and partners) to contribute.

Every Monday morning I will review and accept pull requests for the previous week, aiming to publish the Week in Review by 10 AM PT. In order to keep the posts focused and highly valuable, I will approve pull requests only if they meet our guidelines for style and content.

At that time I will also create a file for the week to come, so that you can populate it as you discover new and relevant content.

I think that’s a brilliant move.  Those weekly review posts are super useful for anyone involved with Amazon AWS.  They should keep coming.  But the time cost involved is understandable.  So crowd-sourcing this is a smart way to go about it.

I hope this will not only continue the blog post series, but also take it to the next level, with more sections, content, and insight.

Well done!

Setting up NAT on Amazon AWS

When it comes to Amazon AWS, there are a few options for configuring Network Address Translation (NAT).  Here is a brief overview.

NAT Gateway

NAT Gateway is a configuration very similar to Internet Gateway.  My understanding is that the only major difference between the NAT Gateway and the Internet Gateway is that you have control over the external public IP address of the NAT Gateway.  That’ll be one of your allocated Elastic IPs (EIPs).  This option is the simplest of the three that I considered.  If you need plain and simple NAT – then that’s a good one to go for.
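
If you prefer the AWS CLI, creating one is, roughly, a single call (a sketch; the subnet and Elastic IP allocation IDs are placeholders for your own):

# Create a NAT Gateway in a public subnet, using one of your Elastic IPs
aws ec2 create-nat-gateway --subnet-id subnet-1a2b3c4d --allocation-id eipalloc-1a2b3c4d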

NAT Instance

NAT Instance is a special purpose EC2 instance, which is configured to do NAT out of the box.  If you need anything on top of plain NAT (like load balancing, or detailed traffic monitoring, or firewalls), but don’t have enough confidence in your network and system administration skills, this is a good option to choose.

Custom Setup

If you are the Do It Yourself type, this option is for you.  But it can get tricky.  Here are a few things that I went through, learned, and suffered through, so that you don’t have to (or future me, for that matter).

Let’s start from the beginning.  You’ve created your own Virtual Private Cloud (VPC).  In that cloud, you’ve created two subnets – Public and Private (I’ll use this as the example, and will come back to what happens with more subnets).  Both of these subnets use the same routing table with the Internet Gateway.  Now you’ve launched an EC2 instance into your Public subnet and assigned it a private IP address.  This will be your NAT instance.  You’ve also launched another instance into the Private subnet, which will be your test client.  So far so good.

This instance will be used for translating internal IP addresses from the Private subnet to the external public IP address.  So, we, obviously, need an external IP address.  Let’s allocate an Elastic IP and associate it with the EC2 instance.  Easy peasy.
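
If you are doing this from the AWS CLI, it’s roughly the following (a sketch; i-0123abcd and eipalloc-1a2b3c4d are placeholder IDs):

# Allocate a new Elastic IP in the VPC scope
aws ec2 allocate-address --domain vpc
# Associate it with the NAT instance
aws ec2 associate-address --instance-id i-0123abcd --allocation-id eipalloc-1a2b3c4d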

Now, we’ll need to create another routing table, using our NAT instance as the default gateway.  Once created, this routing table should be associated with our Private subnet.  This will cause all the machines on that network to use the NAT instance for any external communications.
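
In CLI terms, that’s approximately this (a sketch; all IDs are placeholders):

# Create a new routing table in the VPC
aws ec2 create-route-table --vpc-id vpc-1a2b3c4d
# Point its default route at the NAT instance
aws ec2 create-route --route-table-id rtb-1a2b3c4d --destination-cidr-block 0.0.0.0/0 --instance-id i-0123abcd
# Associate the table with the Private subnet
aws ec2 associate-route-table --route-table-id rtb-1a2b3c4d --subnet-id subnet-5e6f7a8b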

Let’s do a quick side track here – security.  There are three levels that you should keep in mind:

  • Network ACLs.  These are Amazon AWS access control lists, which control the traffic allowed in and out of the networks (such as our Public and Private subnets).  If the Network ACL prevents certain traffic, you won’t be able to reach the host, regardless of the host security configuration.  So, for the sake of the example, let’s allow all traffic in and out of both the Public and Private networks.  You can adjust it once your NAT is working.
  • Security Groups.  These are Amazon AWS permissions which control what type of traffic is allowed in or out of a network interface.  This is slightly confusing for hosts with a single interface, but super useful for machines with multiple network interfaces, especially if those interfaces are transferred between instances.  Create a single Security Group (for now, you can adjust this later), which will allow any traffic in from your VPC range of IPs, and any outgoing traffic.  Assign this Security Group to both EC2 instances (a CLI sketch follows this list).
  • Host firewall.  Chances are, you are using a modern Linux distribution for your NAT host.  This means that there is probably an iptables service running with some default configuration, which might prevent certain access.  I’m not going to suggest disabling it, especially on the machine facing the public Internet.  But just keep it in mind, and at the very least allow the ICMP protocol, if not from everywhere, then at least from your VPC IP range.
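
Here is, roughly, what creating such a Security Group could look like from the CLI (a sketch; the VPC and group IDs are placeholders, and 10.0.0.0/16 should be your own VPC range):

# Create a temporary wide-open group for the NAT setup
aws ec2 create-security-group --group-name nat-setup --description "Wide-open group for NAT setup" --vpc-id vpc-1a2b3c4d
# Allow all traffic in from the VPC range (-1 means all protocols);
# outgoing traffic is allowed by default
aws ec2 authorize-security-group-ingress --group-id sg-1a2b3c4d --protocol -1 --cidr 10.0.0.0/16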

Now, on to the actual NAT.  It is technically possible to set up and use NAT on a machine with a single network interface, but you’d probably be frowned upon by other system and network administrators.  Furthermore, it doesn’t seem to be possible on the Amazon AWS infrastructure.  I’m not 100% sure about that, but I’ve spent more time than I had trying to figure it out, and failed miserably.

The rest of the steps would greatly benefit from a bunch of screenshots and step-by-step click-through guides, which I am too lazy to do.  You can use this manual as a base, even though it covers a slightly different, more advanced setup.  Also, you might want to have a look at the CentOS 7 instructions for NAT configuration, and the discussion on the differences between SNAT and MASQUERADE.

We’ll need a second network interface.  You can create a new Network Interface with an IP in your Private subnet and attach it to the NAT instance.  Here comes a word of caution:  there is a limit on how many network interfaces can be attached to an EC2 instance.  This limit is based on the type of the instance.  So, if you want to use a t2.nano or t2.micro instance, for example, you’d be limited to only two interfaces.  That’s why I’ve used the example with two networks – to have a third interface added, you’d need a much bigger instance, like t2.medium.  (Which is a total overkill for my purposes.)
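
The CLI version looks approximately like this (a sketch; the subnet, interface, and instance IDs are placeholders):

# Create a new interface with an IP in the Private subnet
aws ec2 create-network-interface --subnet-id subnet-5e6f7a8b --description "NAT internal interface"
# Attach it to the NAT instance as the second device (device index 1)
aws ec2 attach-network-interface --network-interface-id eni-1a2b3c4d --instance-id i-0123abcd --device-index 1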

Now that you’ve attached the second interface to your EC2 instance, we have a few things to do.  First, you need to disable “Source/Destination Check” on the second network interface.  You can do it in your AWS Console, or maybe even through the API (I haven’t gone that deep yet).
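
(If you do want the CLI version, it appears to be something along these lines, with a placeholder interface ID:)

# Disable the Source/Destination check on the second interface
aws ec2 modify-network-interface-attribute --network-interface-id eni-1a2b3c4d --no-source-dest-check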

It is time to adjust the configuration of our EC2 instance.  I’ll assume the CentOS 7 Linux distribution, but it’d be very easy to adjust to whatever other Linux you are running.

Firstly, we need to configure the second network interface.  The easiest way to do this is to copy the /etc/sysconfig/network-scripts/ifcfg-eth0 file into /etc/sysconfig/network-scripts/ifcfg-eth1, and then edit the eth1 file, changing the DEVICE variable to “eth1”.  Before you restart your network service, also edit the /etc/sysconfig/network file and add the following: GATEWAYDEV=eth0.  This will tell the operating system to use the first network interface (eth0) as the gateway device.  Otherwise, it’ll be sending things into the Private network and things won’t work as you expect.  Now, restart the network service and make sure that both network interfaces are there, with correct IPs, and that your routes are fine.
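
For illustration, the two files could end up looking something like this (assuming DHCP configuration, which is the AWS default; treat this as a sketch rather than a complete listing):

# /etc/sysconfig/network-scripts/ifcfg-eth1 (a copy of ifcfg-eth0 with DEVICE changed)
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet

# /etc/sysconfig/network
GATEWAYDEV=eth0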

Secondly, we need to tweak the kernel for the NAT job (sounds funny, doesn’t it?).  Edit your /etc/sysctl.conf file and make sure it has the following lines in it:

# Enable IP forwarding
net.ipv4.ip_forward=1
# Disable ICMP redirects
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.eth0.accept_redirects=0
net.ipv4.conf.eth0.send_redirects=0
net.ipv4.conf.eth1.accept_redirects=0
net.ipv4.conf.eth1.send_redirects=0

Apply the changes with sysctl -p.

Thirdly, and lastly, configure iptables to perform the network address translation.  Edit /etc/sysconfig/iptables and make sure you have the following:

*nat
:PREROUTING ACCEPT [48509:2829006]
:INPUT ACCEPT [33058:1879130]
:OUTPUT ACCEPT [57243:3567265]
:POSTROUTING ACCEPT [55162:3389500]
-A POSTROUTING -s 10.0.0.0/16 -o eth0 -j MASQUERADE
COMMIT

Adjust the IP range from 10.0.0.0/16 to your VPC range or the network that you want to NAT.  Restart the iptables service and check that everything is hunky-dory:

  1. The NAT instance can ping a host on the Internet (like 8.8.8.8).
  2. The NAT instance can ping a host on the Private network.
  3. The host on the Private network can ping the NAT instance.
  4. The host on the Private network can ping a host on the Internet (like 8.8.8.8).
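
In shell terms, the checklist boils down to something like this (the 10.0.x.x addresses are placeholders for your own):

# On the NAT instance:
ping -c 3 8.8.8.8      # the Internet is reachable
ping -c 3 10.0.1.20    # the Private host is reachable

# On the Private host:
ping -c 3 10.0.1.10    # the NAT instance is reachable
ping -c 3 8.8.8.8      # the Internet is reachable through the NAT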

If all that works fine, don’t forget to adjust your Network ACLs, Security Groups, and iptables to whatever level of paranoia is appropriate for your environment.  If something is still not working, check all of the above again, especially the security layers, IP addresses (I spent a couple of hours trying to find the problem, when it was an IP address typo – 10.0.0/16 – not the most obvious of things), network masks, etc.

Hope this helps.

Serverlessconf 2016 – New York City: a personal report

Serverlessconf 2016 – New York City: a personal report – is a fascinating read.  Let me get you hooked:

This event left me with the impression (or the confirmation) that there are two paces and speeds at which people are moving.

There is the so called “legacy” pace. This is often characterized by the notion of VMs and virtualization. This market is typically on-prem, owned by VMware and where the majority of workloads (as of today) are running. Very steady.

The second “industry block” is the “new stuff” and this is a truly moving target. #Serverless is yet another model that we are seeing emerging in the last few years. We have moved from Cloud (i.e. IaaS) to opinionated PaaS, to un-opinionated PaaS, to DIY Containers, to CaaS (Containers as a Service) to now #Serverless. There is no way this is going to be the end of it as it’s a frenetic moving target and in every iteration more and more people will be left behind.

This time around was all about the DevOps people being “industry dinosaurs”. So if you are a DevOps persona, know you are legacy already.

Sometimes I feel like I am living on a different planet.  All these people are so close, yet so far away …

How to Recover an Unreachable EC2 Linux Instance

Here is a tutorial that will come in handy one day, in a moment of panic – How to Recover an Unreachable Linux Instance.  It has plenty of screenshots and shows each step in detail.

TL;DR version:

  1. Start a new instance (or pick one from the existing ones).
  2. Stop the broken instance.
  3. Detach the volume from the broken instance.
  4. Attach the volume to the new/existing instance as additional disk.
  5. Troubleshoot and fix the problem.
  6. Detach the volume from the new/existing instance.
  7. Attach the volume to the broken instance.
  8. Start the broken instance.
  9. Get rid of the useless new instance, if you didn’t reuse the existing one for the troubleshooting and fixing process.
  10. ???
  11. PROFIT!
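
For completeness, the volume shuffle (steps 2–4 and 6–8) in rough AWS CLI terms (i-11111111 is the broken instance, i-22222222 the helper, vol-1a2b3c4d the root volume – all placeholders):

# Stop the broken instance and detach its root volume
aws ec2 stop-instances --instance-ids i-11111111
aws ec2 detach-volume --volume-id vol-1a2b3c4d
# Attach the volume to the helper instance as an additional disk
aws ec2 attach-volume --volume-id vol-1a2b3c4d --instance-id i-22222222 --device /dev/sdf
# ... troubleshoot, fix, unmount ... then move the volume back
aws ec2 detach-volume --volume-id vol-1a2b3c4d
# (the root device name may differ, e.g. /dev/xvda)
aws ec2 attach-volume --volume-id vol-1a2b3c4d --instance-id i-11111111 --device /dev/sda1
aws ec2 start-instances --instance-ids i-11111111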

Packer – a tool for creating VM and container images

With the recent explosion in virtualization and container technologies, one is often left disoriented.  Questions like “should I use virtual machines or containers?”, “which technology should I use?”, and “can I migrate from one to another later?” are just some of those that will need answering.

Here is an open source tool that helps to avoid a few of those questions – Packer (by HashiCorp):

Packer is a tool for creating machine and container images for multiple platforms from a single source configuration.

Have a look at the supported platforms:

  • Amazon EC2 (AMI). Both EBS-backed and instance-store AMIs within EC2, optionally distributed to multiple regions.
  • DigitalOcean. Snapshots for DigitalOcean that can be used to start a pre-configured DigitalOcean instance of any size.
  • Docker. Snapshots for Docker that can be used to start a pre-configured Docker instance.
  • Google Compute Engine. Snapshots for Google Compute Engine that can be used to start a pre-configured Google Compute Engine instance.
  • OpenStack. Images for OpenStack that can be used to start pre-configured OpenStack servers.
  • Parallels (PVM). Exported virtual machines for Parallels, including virtual machine metadata such as RAM, CPUs, etc. These virtual machines are portable and can be started on any platform Parallels runs on.
  • QEMU. Images for KVM or Xen that can be used to start pre-configured KVM or Xen instances.
  • VirtualBox (OVF). Exported virtual machines for VirtualBox, including virtual machine metadata such as RAM, CPUs, etc. These virtual machines are portable and can be started on any platform VirtualBox runs on.
  • VMware (VMX). Exported virtual machines for VMware that can be run within any desktop products such as Fusion, Player, or Workstation, as well as server products such as vSphere.

The only question remaining now, it seems, is “why wouldn’t you use it?”. :)
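
To give you a taste, here is a minimal template sketch for building an EC2 AMI with a single shell provisioner (the source AMI ID is a placeholder, and credentials handling is omitted – see the Packer documentation for the full details):

{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-xxxxxxxx",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "my-packer-image {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": ["sudo yum -y update"]
  }]
}

Save it as, say, my-image.json and run “packer build my-image.json” – the result should be a fresh AMI in your account.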

Amazon Elastic File System

Here is some great news from the Amazon AWS blog – the announcement of the Elastic File System (EFS):

EFS lets you create POSIX-compliant file systems and attach them to one or more of your EC2 instances via NFS. The file system grows and shrinks as necessary (there’s no fixed upper limit and you can grow to petabyte scale) and you don’t pre-provision storage space or bandwidth. You pay only for the storage that you use.

EFS protects your data by storing copies of your files, directories, links, and metadata in multiple Availability Zones.

In order to provide the performance needed to support large file systems accessed by multiple clients simultaneously, Elastic File System performance scales with storage (I’ll say more about this later).

I think this might have been the most requested feature/service from Amazon AWS since EC2 launch.  Sure, one could have built an NFS file server before, but with the variety of storage options, availability zones, and the dynamic nature of the cloud setup itself, that was quite a challenge.  Now – all that and more in just a few clicks.
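
To give an idea of how simple it is now: once the file system and its mount targets exist, attaching it to an instance is a single NFS mount, something along these lines (the file system ID and region are placeholders):

# Mount an EFS file system over NFSv4.1
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs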

Thank you Amazon!

Support lesson to learn from Amazon AWS

I’ve said a million times how happy I am with Amazon AWS.  Today I also want to share a positive lesson to learn from their technical support.  It’s the second time I’ve contacted them over the last year and a half, and it’s the second time I am amazed at how well it works.

In my experience, technical support departments usually rely on one primary communication channel – be that a telephone, an email, a ticketing system, or a live chat.  The other channels are often just routed or converted into the main one, or even completely ignored.  But each one of those has its benefits and side effects.

Telephone provides the most immediate connectivity, and the much-valued option of human interaction.  But the communication is verbal, often without a paper trail.  It makes it difficult to carbon copy (CC) people on the conversation or review exactly what has been said.  It is also very free-form, unstructured.

Live chat is also free-form and unstructured, but it’s written, so transcripts are easily available.  It also helps with the carbon copy, but only on the receiving end – supervisors or field experts can often be included in the conversation, but adding somebody from the requesting side is rarely supported.

Email makes it easy to carbon copy people on both ends.  It provides the paper trail, but often lacks the immediate response factor.  And it’s still unstructured, making it difficult to figure out what was requested, what has been discussed and whether or not there was any resolution.  (Have you ever been a part of a lengthy multi-lingual conversation about, what turned out to be, multiple issues in the same thread?)

Ticketing/support systems help to structure the conversation and make it follow a certain workflow.  But they often lack humanity and, much like emails, the immediate response.

Now, what Amazon AWS support has done is a beautiful combination of a ticketing system and a phone.  You start off with the ticketing system – log in, create a new support case providing all the necessary information, and optionally CC other people – all from a single short form.  The moment you submit it, the web page asks for your phone number.  Once entered, a phone call is placed immediately by the system, connecting you to the support engineer.  The engineer confirms a few case details and lets you know that the case is in progress and the expected resolution time (I was asking to raise the limit of the Elastic IP addresses on the Virtual Private Cloud, and I was told it would be done in the next 15 to 30 minutes.  And it was done in 10!).  I have also received two emails – one confirming the opening of the case, with all the requested details, and another one notifying me that the work has been done, providing quick information on how to follow up, in case I needed to.

The overall experience was very smooth, fast, to the point, and very effective.  I never got lost.  I never had to figure anything out.  And my problem was attended to and resolved immediately.

I only wish more companies provided this level of support.  I’ll sure try too – but it’s a bar set high.

Top level domain nonsense and how it can break your stuff

Call me old school, but I really (I mean REALLY) don’t like the recent explosion of the top level domains.  I understand that most good names are taken in .com, .org, and .net zones, but do we really need all those .blue, .parts, and .yoga TLDs?

Why am I whining about all this all of a sudden?  I’ll tell you why.  Because a new top level domain – .aws – is about to be introduced, and it already broke something for me in a non-obvious manner.

I manage a few Virtual Private Clouds on Amazon AWS.  Many of these use and rely on some hostname naming convention (yeah, I’m familiar with the pets vs. cattle idea).  Imagine you have a few servers, which are separated into generic infrastructure and client segments, like so:

  • bastion.aws.example.com
  • firewall.aws.example.com
  • lb.aws.example.com
  • web.client1.example.com
  • db.client1.example.com
  • web.client2.example.com
  • db.client2.example.com
  • … and so on.

Working with such long FQDNs (fully qualified domain names) isn’t very convenient.  So add “search example.com” to your /etc/resolv.conf file, and now you can use short hostnames like firewall.aws and web.client1.
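
Something along these lines (a sketch; the nameserver is whatever your VPC DHCP handed out, 10.0.0.2 being typical for a 10.0.0.0/16 VPC):

# /etc/resolv.conf
search example.com
nameserver 10.0.0.2

And life is beautiful …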

… until one day, when you see the following:

user@bastion.aws$> ssh firewall.aws
Permission denied (publickey).

And that’s when your heart misses a beat, the world freezes, and you go: “WTF?”.  All kinds of thoughts are rushing through your head.  Is it a typo?  Am I in the right place? Did the server get compromised?  How’s that for a little panic …

Trying a few things here and there, you manage to get into the server from somewhere else.  You are very careful.  You are looking around for any traces of the break-in, but you see nothing.  You dig through the logs both on the server and off it.  Still nothing.  You dive into all those logwatch and cron messages in your Trash that you were automatically deleting, because things had been working fine for so long.  There!  You find that cron was complaining that the backup script couldn’t get into this machine.  Uh-oh.  This has been happening for a few days now.  A black cloud of combined worry for the compromised machine and the outdated backup kills the sunlight in your life.  Dammit!

Take a break to calm down.  Try to think clearly.  Don’t panic.  Stop assuming things, and start troubleshooting.

A few minutes later, you establish that the problem is not limited to that particular machine.  All your .aws hosts share this headache.  A few more minutes later, you learn that ‘ssh firewall.aws.example.com’ works fine, while ‘ssh firewall.aws’ still doesn’t.

That points toward a hostname resolution issue.  With that, it takes only a few more moments to see the following:

user@bastion.aws$> host firewall.aws
firewall.aws has address 127.0.53.53
firewall.aws mail is handled by 10 your-dns-needs-immediate-attention.aws.

Say what?  That’s not at all what I expected.  And what is it that I need to fix with my DNS?  A Google search brings up this beauty:

This is problably because the .dev and .local are now valid top level extensions.

Really? Who’s the genius behind that?  I thought people chose those specifically to make them internal.  So is there an .aws top level extension now too?  You bet there is!

Solution?  Well, as far as I am concerned, from this day onward, I don’t trust the brief hostnames anymore.  It’s FQDN or nothing.

AWS Database Migration Service

AWS Database Migration Service is yet another one of those tools that you always wished somebody created, but never actually got around to checking if it exists.  Here is a recent blog post showcasing the functionality.

Do you currently store relational data in an on-premises Oracle, SQL Server, MySQL, MariaDB, or PostgreSQL database? Would you like to move it to the AWS cloud with virtually no downtime so that you can take advantage of the scale, operational efficiency, and the multitude of data storage options that are available to you?

If so, the new AWS Database Migration Service (DMS) is for you! First announced last fall at AWS re:Invent, our customers have already used it to migrate over 1,000 on-premises databases to AWS. You can move live, terabyte-scale databases to the cloud, with options to stick with your existing database platform or to upgrade to a new one that better matches your requirements.  If you are migrating to a new database platform as part of your move to the cloud, the AWS Schema Conversion Tool will convert your schemas and stored procedures for use on the new platform.