Troubleshooting with /dev/tcp and /dev/udp

Imagine you are on a freshly installed Linux machine with the minimal set of packages, and you need to test network connectivity.  You don’t have netcat, telnet, and your other usual tools.  For the sake of the example, imagine that even curl and wget are missing.  What do you do?

Well, apparently, there is a way to do this with plain old bash.  A way, which I didn’t know until today.  You can do this with /dev/tcp and /dev/udp. Here is an example verbatim from the Advanced Bash-Scripting Guide:

#!/bin/bash
# dev-tcp.sh: /dev/tcp redirection to check Internet connection.

# Script by Troy Engel.
# Used with permission.
 
TCP_HOST=news-15.net       # A known spam-friendly ISP.
TCP_PORT=80                # Port 80 is http.
  
# Try to connect. (Somewhat similar to a 'ping' . . .) 
echo "HEAD / HTTP/1.0" >/dev/tcp/${TCP_HOST}/${TCP_PORT}
MYEXIT=$?

: <<EXPLANATION If bash was compiled with --enable-net-redirections, it has the capability of using a special character device for both TCP and UDP redirections. These redirections are used identically as STDIN/STDOUT/STDERR. The device entries are 30,36 for /dev/tcp: mknod /dev/tcp c 30 36 >From the bash reference:
/dev/tcp/host/port
    If host is a valid hostname or Internet address, and port is an integer
port number or service name, Bash attempts to open a TCP connection to the
corresponding socket.
EXPLANATION

   
if [ "X$MYEXIT" = "X0" ]; then
  echo "Connection successful. Exit code: $MYEXIT"
else
  echo "Connection unsuccessful. Exit code: $MYEXIT"
fi

exit $MYEXIT

 

Steven Black hosts files

StevenBlack/hosts repository:

Extending and consolidating hosts files from a variety of sources like adaway.org, mvps.org, malwaredomains.com, someonewhocares.org, yoyo.org, and potentially others. You can optionally invoke extensions to block additional sites by category.

Categories include: adware, malware, gambling, porn, and social networks.

How the Internet works

cable

Ars Technica runs a nice overview article “How the Internet works: Submarine fiber, brains in jars, and coaxial cables“.  It features plenty of cool images, statistics, and details of the Internet wiring from under the sea to the last mile to the last 100 meters.  It’s mostly focused on UK, but it provides a good understanding of what’s involved in the modern day connectivity.

P.S.: On a less serious note, here’s The IT Crowd take on how the Internet works.  Thanks to Maxym Balabaev for a reminder.

Setting up NAT on Amazon AWS

When it comes to Amazon AWS, there are a few options for configuring Network Address Translation (NAT).  Here is a brief overview.

NAT Gateway

NAT Gateway is a configuration very similar to Internet Gateway.  My understanding is that the only major difference between the NAT Gateway and the Internet Gateway is that you have the control over the external public IP address of the NAT Gateway.  That’ll be one of your allocated Elastic IPs (EIPs).  This option is the simplest out of the three that I considered.  If you need plain and simple NAT – than that’s a good one to go for.

NAT Instance

NAT Instance is a special purpose EC2 instance, which is configured to do NAT out of the box.  If you need anything on top of plain NAT (like load balancing, or detailed traffic monitoring, or firewalls), but don’t have enough confidence in your network and system administration skills, this is a good option to choose.

Custom Setup

If you are the Do It Yourself guy, this option is for you.   But it can get tricky.  Here are a few things that I went through, learnt and suffered through, so that you don’t have to (or future me, for that matter).

Let’s start from the beginning.  You’ve created your own Virtual Private Cloud (VPC).  In that cloud, you’ve created two subnets – Public and Private (I’ll use this for example, and will come back to what happens with more).  Both of these subnets use the same routing table with the Internet Gateway.  Now you’ve launched an EC2 instance into your Public subnet and assigned it a private IP address.  This will be your NAT instance.  You’ve also launched another instance into the Private subnet, which will be your test client.  So far so good.

This instance will be used for translating internal IP addresses from the Private subnet to the external public IP address.  So, we, obviously, need an external IP address.  Let’s allocate an Elastic IP and associate it with the EC2 instance.  Easy peasy.

Now, we’ll need to create another routing table, using our NAT instance as the default gateway.  Once created, this routing table should be associated with our Private subnet.  This will cause all the machines on that network to use the NAT instance for any external communications.

Let’s do a quick side track here – security.  There are three levels that you should keep in mind here:

  • Network ACLs.  These are Amazon AWS access control lists, which control the traffic allowed in and out of the networks (such as our Public and Private subnets).  If the Network ACL prevents certain traffic, you won’t be able to reach the host, irrelevant of the host security configuration.  So, for the sake of the example, let’s allow all traffic in and out of both the Public and Private networks.  You can adjust it once your NAT is working.
  • Security Groups.  These are Amazon AWS permissions which control what type of traffic is allowed in or out of the network interface.  This is slightly confusing for hosts with the single interface, but super useful for machines with multiple network interfaces, especially if those interfaces are transferred between instances.  Create a single Security Group (for now, you can adjust this later), which will allow any traffic in from your VPC range of IPs, and any outgoing traffic.  Assign this Security Group to both EC2 instances.
  • Host firewall.  Chances are, you are using a modern Linux distribution for your NAT host.  This means that there is probably an iptables service running with some default configuration, which might prevent certain access.  I’m not going to suggest to disable it, especially on the machine facing the public Internet.  But just keep it in mind, and at the very least allow the ICMP protocol, if not from everywhere, then at least from your VPC IP range.

Now, on to the actual NAT.  It is technically possible to setup and use NAT on the machine with the single network interface, but you’d probably be frowned upon by other system and network administrators.  Furthermore, it doesn’t seem to be possible on the Amazon AWS infrastructure.  I’m not 100% sure about that, but I’ve spent more time than I had to figure this out and I failed miserably.

The rest of the steps would greatly benefit from a bunch of screenshots and step-by-step click through guides, which I am too lazy to do.  You can use this manual, as a base, even though it covers a slightly different, more advanced setup.  Also, you might want to have a look at CentOS 7 instructions for NAT configuration, and the discussion on the differences between SNAT and MASQUERADE.

We’ll need a second network interface.  You can create a new Network Interface with the IP in your Private subnet and attach it to the NAT instance.  Here comes a word of caution:  there is a limit on how many network interfaces can be attached to EC2 instance.  This limit is based on the type of the instance.   So, if you want to use a t2.nano or t2.micro instance, for example, you’d be limited to only two interfaces.  That’s why I’ve used the example with two networks – to have a third interface added, you’d need a much bigger instance, like t2.medium. (Which is a total overkill for my purposes.)

Now that you’ve attached the second interface to your EC2 instance, we have a few things to do.  First, you need to disable “Source/Destination Check” on the second network interface.  You can do it in your AWS Console, or maybe even through the API (I haven’t gone that deep yet).

It is time to adjust the configuration of our EC2 instance.  I’ll assume CentOS 7 Linux distribution, but it’d be very easy to adjust to whatever other Linux you are running.

Firstly, we need to configure the second network interface.  The easiest way to do this is to copy /etc/sysconfig/network-scripts/ifcfg-eth0 file into /etc/sysconfig/network-scripts/ifcfg-eth1, and then edit the eth1 one file changing the DEVICE variable to “eth1“.  Before you restart your network service, also edit /etc/sysconfig/network file and add the following: GATEWAYDEV=eth0 .  This will tell the operating system to use the first network interface (eth0) as the gateway device.  Otherwise, it’ll be sending things into the Private network and things won’t work as you expect them.  Now, restart the network service and make sure that both network interfaces are there, with correct IPs and that your routes are fine.

Secondly, we need to tweak the kernel for the NAT job (sounds funny, doesn’t it?).  Edit your /etc/sysctl.conf file and make sure it has the following lines in it:

# Enable IP forwarding
net.ipv4.ip_forward=1
# Disable ICMP redirects
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.eth0.accept_redirects=0
net.ipv4.conf.eth0.send_redirects=0
net.ipv4.conf.eth1.accept_redirects=0
net.ipv4.conf.eth1.send_redirects=0

Apply the changes with sysctl -p.

Thirdly, and lastly, configure iptables to perform the network address translation.  Edit /etc/sysconfig/iptables and make sure you have the following:

*nat
:PREROUTING ACCEPT [48509:2829006]
:INPUT ACCEPT [33058:1879130]
:OUTPUT ACCEPT [57243:3567265]
:POSTROUTING ACCEPT [55162:3389500]
-A POSTROUTING -s 10.0.0.0/16 -o eth0 -j MASQUERADE
COMMIT

Adjust the IP range from 10.0.0.0/16 to your VPC range or the network that you want to NAT.  Restart the iptables service and check that everything is hunky-dory:

  1. The NAT instance can ping a host on the Internet (like 8.8.8.8).
  2. The NAT instance can ping a host on the Private network.
  3. The host on the Private network can ping the NAT instance.
  4. The host on the Private network can ping a host on the Internet (like 8.8.8.8).

If all that works fine, don’t forget to adjust your Network ACLs, Security Groups, and iptables to whatever level of paranoia appropriate for your environment.  If something is still not working, check all of the above again, especially for security layers, IP addresses (I spent a coupe of hours trying to find the problem, when it was the IP address typo – 10.0.0/16 – not the most obvious of things), network masks, etc.

Hope this helps.

Global Internet Map 2012

internet map

I came across “Global Internet Map 2012” – an interactive map by TeleGeography, via this article (in Russian).  If you read the language, check the article for more maps and resources on the subject.  Also check my previous posts here and here.

Apart from the absolute visual awesomeness, one thing that struck me in particular is how weird the world looks if you just rotate the map a bit.

Ansible safety net for DNS wildcard hosts

After using Ansible for only a week, I am deeply in love.  I am doing more and more with less and less, and that’s exactly how I want my automation.

Today I had to solve an interesting problem.  Ansible operates, based on the host and group inventory.  As I mentioned before, I am now always relying on FQDNs (fully qualified domain names) for my host names.  But what happens when DNS wildcards come into play with things like load balancers and reverse proxies  Consider an example:

  1. Nginx configured as reverse proxy on the machine proxy1.example.com with 10.0.0.10 IP address.
  2. DNS wildcard is in place: *.example.com 3600 IN CNAME proxy1.example.com.
  3. Ansible contains proxy1.example.com in host inventory and a playbook to setup the reverse proxy with Nginx.
  4. Ansible contains a few other hosts in inventory and a playbook to setup Nginx as a web server.
  5. Somebody adds a new host to inventory: another-web-server.example.com, without specifying any other host details, like ansible_ssh_host variable.  And he also forgets to update the DNS zone with a new A or CNAME record.

Now, Ansible play is executed for the web servers configuration.  All previously existing machines are fine.  But the new machine’s another-web-server.example.com host name resolves to proxy1.example.com, which is where Ansible connects and runs the Nginx setup, overwriting the existing configuration, triggering a service restart, and screwing up your life.  Just kidding, of course. :)  It’ll be trivial to find out what happened.  Fixing the Nginx isn’t too difficult either.  Especially if you have backups in place.  But it’s still better to avoid the whole mess altogether.

To help prevent these cases, I decided to create a new safety net role.  Given a variable like:

---
# Aliased IPs is a list of hosts, which can be reached in 
# multiple ways due to DNS wildcards. Both IPv4 and IPv6 
# can be used. The hostname value is the primary hostname 
# for the IP - any other inventory hostname having any of 
# these IPs will cause a failure in the play.
aliased_ips:
  "10.0.0.10": "proxy1.example.com"
  "192.168.0.10": "proxy1.example.com"

And the following code in the role’s tasks/main.yml:

---
- debug: msg="Safety net - before IPv4"

- name: Check all IPv4 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname}}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv4_addresses }}"

- debug: msg="Safety net - after IPv4 and before IPv6"

- name: Check all IPv6 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname}}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv6_addresses }}"

- debug: msg="Safety net - after IPv6"

the safety net is in place.  The first check will connect to the remote server, get the list of all configured IPv4 addresses, and then compare each one with each IP address in the aliased_ips variable.  For every matching pair, it will check if the remote server’s host name from the inventory file matches the host name from the aliased_ips value for the matched IP address.  If the host names match, it’ll continue.  If not – a failure in the play occurs (Ansible speak for thrown exception).  Other tasks will continue execution for other hosts, but nothing else will be done during this play run for this particular host.

The second check will do the same but with IPv6 addresses.  You can mix and match both IPv4 and IPv6 in the same aliased_ips variable.  And Ansible is smart enough to exclude the localhost IPs too, so things shouldn’t break too much.

I’ve tested the above and it seems to work well for me.

There is a tiny issue with elegance here though: host name to IP mappings are already configured in the DNS zone – duplicating this configuration in the aliased_ips variable seems annoying.  Personally, I don’t have that many reverse proxies and load balancers to handle, and they don’t change too often either, so I don’t mind.  Also, there is something about relying on DNS while trying to protect against DNS mis-configuration that rubs me the wrong way.  But if you are the adventurous type, have a look at the Ansible’s dig lookup, which you can use to fetch the IP addresses from the DNS server of your choice.

As always, if you see any potential issues with the above or know of a better way to solve it, please let me know.

WTF with Amazon and TCP

Here goes the story of me learning a few new swear words and pulling out nearly all my hair.  Grab a cup of coffee, this will take make a while to tell…

First of all, here is a diagram to make things a little bit more visual.

wtf

As you can see, we have an office network with NAT on the gateway.  We have an Amazon VPC with NAT on the bastion host.  And then there’s the rest of the Internet.

The setup is pretty straight forward.  There are no outgoing firewalls anywhere, no VLANs, no network equipment – all of the involved machines are a variety of Linux boxes.  The whole thing has been working fine for a while now.

A couple of weeks ago we had an issue with our ISP in the office.  The Internet connection was alive, but we were getting extremely high packet loss – around 80%.  The technician passed by, changed the cables, rebooted the ADSL modem, and we’ve also rebooted the gateway.  The problem was fixed, except for one annoying bit.  We could access all of the Internet just fine, except our Amazon VPC bastion host.  Here’s where it gets interesting.

Continue reading “WTF with Amazon and TCP”

IPv6 20th birthday with 10% global penetration

Here’s some not so light coffee time reading on IPv6 – IPv6 non-alternatives: DJB’s article, 13 years later – an article that links, among other things to this Ars Technica article, which features some IPv6 statistics.  Summary?  Sure.  IPv6 RFC celebrates 20 year birthday this month with 10% global penetration.

ipv6

Exponential growth year-on-year is good.  But the absolute numbers aren’t so bright yet.  Especially considering some of the areas where it wasn’t so successful.