Ansible safety net for DNS wildcard hosts

After using Ansible for only a week, I am deeply in love.  I am doing more and more with less and less, and that’s exactly how I want my automation.

Today I had to solve an interesting problem.  Ansible operates based on the host and group inventory.  As I mentioned before, I now always rely on FQDNs (fully qualified domain names) for my host names.  But what happens when DNS wildcards come into play with things like load balancers and reverse proxies?  Consider an example:

  1. Nginx is configured as a reverse proxy on the machine proxy1.example.com, with the IP address 10.0.0.10.
  2. A DNS wildcard is in place: *.example.com 3600 IN CNAME proxy1.example.com.
  3. Ansible has proxy1.example.com in its host inventory and a playbook to set up the reverse proxy with Nginx.
  4. Ansible has a few other hosts in its inventory and a playbook to set up Nginx as a web server.
  5. Somebody adds a new host to the inventory, another-web-server.example.com, without specifying any other host details, like the ansible_ssh_host variable.  And he also forgets to update the DNS zone with a new A or CNAME record.
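To see why the new host ends up on the proxy, here is a toy Python model of the zone above.  It is a simplified sketch, not a real resolver: it ignores the finer RFC 4592 wildcard rules, and web1.example.com with 10.0.0.21 is a hypothetical existing web server.  A name without its own record falls through to *.example.com, which is a CNAME to the proxy.

```python
# Toy model of a DNS zone with a wildcard record. Simplification of RFC 4592:
# no closest-encloser rules, no TTLs, no other record types.

ZONE = {
    "proxy1.example.com": ("A", "10.0.0.10"),
    "web1.example.com": ("A", "10.0.0.21"),  # hypothetical existing web server
    "*.example.com": ("CNAME", "proxy1.example.com"),
}


def lookup(name, zone):
    """Return the (type, value) record answering for name, trying wildcards."""
    if name in zone:
        return zone[name]
    labels = name.split(".")
    # Try "*.<parent>" for each parent domain, nearest parent first.
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in zone:
            return zone[wildcard]
    raise LookupError(f"NXDOMAIN: {name}")


def resolve_ip(name, zone, max_hops=8):
    """Follow CNAME records inside the toy zone until an A record is found."""
    rtype, value = lookup(name, zone)
    hops = 0
    while rtype == "CNAME":
        hops += 1
        if hops > max_hops:
            raise LookupError(f"CNAME chain too long for {name}")
        rtype, value = lookup(value, zone)
    return value


if __name__ == "__main__":
    print(resolve_ip("web1.example.com", ZONE))                # 10.0.0.21
    print(resolve_ip("another-web-server.example.com", ZONE))  # 10.0.0.10
```

Nothing in the model fails for the forgotten record: the lookup simply succeeds with the proxy's address, which is exactly what makes the mistake silent.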

Now, the Ansible play for the web server configuration is executed.  All previously existing machines are fine.  But the new machine’s host name, another-web-server.example.com, matches the wildcard and resolves to proxy1.example.com, which is where Ansible connects and runs the Nginx setup, overwriting the existing configuration, triggering a service restart, and screwing up your life.  Just kidding, of course. :)  It’ll be trivial to find out what happened.  Fixing Nginx isn’t too difficult either, especially if you have backups in place.  But it’s still better to avoid the whole mess altogether.

To help prevent these cases, I decided to create a new safety net role.  Given a variable like:

---
# aliased_ips maps each IP address that can be reached under
# multiple names (due to DNS wildcards) to the primary hostname
# for that IP. Both IPv4 and IPv6 can be used. Any other
# inventory hostname that has one of these IPs will cause a
# failure in the play.
aliased_ips:
  "10.0.0.10": "proxy1.example.com"
  "192.168.0.10": "proxy1.example.com"

And the following code in the role’s tasks/main.yml:

---
# Requires gathered facts. item[0] is an IP from aliased_ips, item[1] is
# one of the host's own addresses. Fail when they match but the inventory
# hostname is not the primary hostname for that IP.
- debug: msg="Safety net - before IPv4"

- name: Check all IPv4 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname }}'. It resolves to '{{ aliased_ips[item[0]] }}'."
  # Bare expression: conditionals should not use {{ }} delimiters.
  when: item[0] == item[1] and inventory_hostname != aliased_ips[item[0]]
  with_nested:
    - "{{ aliased_ips | default({}) | list }}"  # the keys, i.e. the IPs
    - "{{ ansible_all_ipv4_addresses }}"

- debug: msg="Safety net - after IPv4 and before IPv6"

- name: Check all IPv6 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname }}'. It resolves to '{{ aliased_ips[item[0]] }}'."
  when: item[0] == item[1] and inventory_hostname != aliased_ips[item[0]]
  with_nested:
    - "{{ aliased_ips | default({}) | list }}"
    - "{{ ansible_all_ipv6_addresses }}"

- debug: msg="Safety net - after IPv6"

the safety net is in place.  The first check takes the list of all IPv4 addresses configured on the remote server (from the facts Ansible gathered there) and compares each one with each IP address in the aliased_ips variable.  Since the facts come from whichever machine Ansible actually reached, a misresolved host reports the proxy’s addresses, not its own.  For every matching pair, the check compares the remote server’s host name from the inventory file with the host name that aliased_ips lists for the matched IP address.  If the host names match, it’ll continue.  If not, a failure in the play occurs (Ansible-speak for a thrown exception).  The play will keep running tasks for the other hosts, but nothing else will be done for this particular host during this run.

The second check will do the same but with IPv6 addresses.  You can mix and match both IPv4 and IPv6 in the same aliased_ips variable.  And Ansible is smart enough to exclude the localhost IPs too, so things shouldn’t break too much.

I’ve tested the above and it seems to work well for me.
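Stripped of Ansible, the check boils down to comparing every (aliased IP, host IP) pair.  Here is a minimal Python sketch of the same logic, assuming the host’s reported addresses arrive as a plain list; safety_net_violations is a hypothetical name, not part of Ansible.  Unlike the tasks above, which compare strings, it normalizes addresses with the ipaddress module, so differently written IPv6 forms still match, and it collects all violations instead of stopping at the first.

```python
import ipaddress

ALIASED_IPS = {
    "10.0.0.10": "proxy1.example.com",
    "192.168.0.10": "proxy1.example.com",
}


def safety_net_violations(inventory_hostname, host_ips, aliased_ips):
    """Return (ip, primary_hostname) pairs that make this host unsafe to touch.

    host_ips: addresses reported by the machine Ansible actually reached
    (what ansible_all_ipv4_addresses and ansible_all_ipv6_addresses hold).
    aliased_ips: IP -> primary hostname, like the role's variable.
    """
    # Normalize so "2001:DB8::1" and "2001:db8:0::1" compare equal.
    reported = {ipaddress.ip_address(ip) for ip in host_ips}
    violations = []
    for ip, primary in (aliased_ips or {}).items():
        if ipaddress.ip_address(ip) in reported and inventory_hostname != primary:
            violations.append((ip, primary))
    return violations


if __name__ == "__main__":
    # The new host resolved to the proxy, so it reports the proxy's addresses.
    print(safety_net_violations("another-web-server.example.com",
                                ["10.0.0.10", "192.168.0.10"], ALIASED_IPS))
    # The proxy itself passes: its inventory name is the primary hostname.
    print(safety_net_violations("proxy1.example.com",
                                ["10.0.0.10", "192.168.0.10"], ALIASED_IPS))
```

An empty list means the host is safe to configure; any entry corresponds to one failing item in the Ansible version.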

There is a tiny issue with elegance here, though: host name to IP mappings are already configured in the DNS zone, so duplicating this configuration in the aliased_ips variable seems annoying.  Personally, I don’t have that many reverse proxies and load balancers to handle, and they don’t change too often either, so I don’t mind.  Also, there is something about relying on DNS while trying to protect against DNS misconfiguration that rubs me the wrong way.  But if you are the adventurous type, have a look at Ansible’s dig lookup, which you can use to fetch the IP addresses from the DNS server of your choice.
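A rough Python sketch of the adventurous route, generating the mapping from DNS instead of maintaining it by hand.  It uses socket.getaddrinfo, which goes through the local system resolver rather than a DNS server of your choice (picking a specific server is what the dig lookup is for), and it takes the resolver as a parameter so it can be exercised without the network.  build_aliased_ips and system_resolve are hypothetical helpers, and it inherits exactly the DNS trust problem described above.

```python
import socket


def system_resolve(hostname):
    """Return the sorted, de-duplicated addresses the local resolver gives."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def build_aliased_ips(primary_hosts, resolve=system_resolve):
    """Map every address of every primary host back to that host name.

    If two primary hosts share an address, the one listed last wins.
    """
    aliased = {}
    for host in primary_hosts:
        for ip in resolve(host):
            aliased[ip] = host
    return aliased


if __name__ == "__main__":
    # Stand-in resolver with hypothetical data, so this runs without DNS.
    fake_dns = {"proxy1.example.com": ["10.0.0.10", "192.168.0.10"]}
    print(build_aliased_ips(["proxy1.example.com"], resolve=fake_dns.__getitem__))
```

The result has the same shape as the hand-written aliased_ips variable, so it could feed the same comparison.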

As always, if you see any potential issues with the above or know of a better way to solve it, please let me know.

Top level domain nonsense and how it can break your stuff

Call me old school, but I really (I mean REALLY) don’t like the recent explosion of top-level domains.  I understand that most good names are taken in the .com, .org, and .net zones, but do we really need all those .blue, .parts, and .yoga TLDs?

Why am I whining about all this all of a sudden?  I’ll tell you why.  Because a new top-level domain, .aws, is about to be introduced, and it has already broken something for me in a non-obvious manner.

I manage a few Virtual Private Clouds on Amazon AWS.  Many of these rely on a hostname naming convention (yeah, I’m familiar with the pets vs. cattle idea).  Imagine you have a few servers that are separated into generic infrastructure and client segments, like so:

  • bastion.aws.example.com
  • firewall.aws.example.com
  • lb.aws.example.com
  • web.client1.example.com
  • db.client1.example.com
  • web.client2.example.com
  • db.client2.example.com
  • … and so on.

Working with such long FQDNs (fully qualified domain names) isn’t very convenient.  So you add “search example.com” to your /etc/resolv.conf file, and now you can use short hostnames like firewall.aws and web.client1.  And life is beautiful …

… until one day, when you see the following:

user@bastion.aws$> ssh firewall.aws
Permission denied (publickey).

And that’s when your heart skips a beat, the world freezes, and you go: “WTF?”.  All kinds of thoughts rush through your head.  Is it a typo?  Am I in the right place?  Did the server get compromised?  How’s that for a little panic …

Trying a few things here and there, you manage to get into the server from somewhere else.  You are very careful.  You look around for any traces of a break-in, but you see nothing.  You dig through the logs both on the server and off it.  Still nothing.  You dive into all those logwatch and cron messages in your Trash that you had been automatically deleting, because things had been working fine for so long.  There!  You find that cron was complaining that the backup script couldn’t get into this machine.  Uh-oh.  This has been happening for a few days now.  A black cloud of worry, about both the compromised machine and the outdated backups, kills the sunlight in your life.  Dammit!

Take a break to calm down.  Try to think clearly.  Don’t panic.  Stop assuming things, and start troubleshooting.

A few minutes later, you establish that the problem is not limited to that particular machine.  All your .aws hosts share this headache.  A few more minutes later, you learn that ‘ssh firewall.aws.example.com’ works fine, while ‘ssh firewall.aws’ still doesn’t.

That points toward a hostname resolution issue.  With that, it takes only a few more moments to see the following:

user@bastion.aws$> host firewall.aws
firewall.aws has address 127.0.53.53
firewall.aws mail is handled by 10 your-dns-needs-immediate-attention.aws.

Say what?  That’s not at all what I expected.  And what is it that I need to fix in my DNS?  A Google search brings up this beauty:

This is probably because the .dev and .local are now valid top level extensions.

Really? Who’s the genius behind that?  I thought people chose those specifically to make them internal.  So is there an .aws top level extension now too?  You bet there is!
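The missing piece is the resolver’s ndots option.  Per resolv.conf(5), with the default ndots:1, any name containing at least one dot is first tried as an absolute name, and the search list is only used if that fails.  firewall.aws has one dot, so firewall.aws. is queried first.  While .aws did not exist, that query failed and the resolver fell back to firewall.aws.example.com; once the TLD started answering, with ICANN’s name collision address 127.0.53.53, the search list was never reached.  Below is a simplified Python model of that ordering; query_order and is_name_collision are hypothetical helpers, and the model ignores the resolver’s other options.

```python
import ipaddress

# ICANN's "controlled interruption" answer for names colliding with new TLDs.
NAME_COLLISION_SIGNAL = ipaddress.ip_address("127.0.53.53")


def query_order(name, search, ndots=1):
    """Names a glibc-style stub resolver tries, in order (see resolv.conf(5)).

    Simplified: the real resolver stops at the first successful answer and has
    other options; this only models how ndots and the search list interact.
    """
    if name.endswith("."):
        return [name]  # trailing dot: absolute name, search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search list first


def is_name_collision(address):
    """True if an answer is the ICANN name collision signal 127.0.53.53."""
    return ipaddress.ip_address(address) == NAME_COLLISION_SIGNAL


if __name__ == "__main__":
    print(query_order("firewall.aws", ["example.com"]))
    # ['firewall.aws.', 'firewall.aws.example.com.']
    print(query_order("firewall", ["example.com"]))
    # ['firewall.example.com.', 'firewall.']
    print(is_name_collision("127.0.53.53"))  # True
```

A name with a trailing dot, such as firewall.aws.example.com., is treated as absolute and never touches the search list.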

Solution?  Well, as far as I am concerned, from this day onward, I don’t trust short hostnames anymore.  It’s FQDN or nothing.

namechk – check domain and social networks name availability

namechk is a handy tool for those who are looking for new domains and social network profile names.  In one go, you can see an overview of what’s available and what’s not.

Google Domains Registrar

I’ve been quite busy lately, so somehow I missed this: the Google Domains service allows one to register a new domain or transfer an existing one to Google, and have it integrated with Gmail and other Google apps.  Currently it only works for people with a US billing address, so I couldn’t use it, but it feels like the end of the beta is not too far away.

This is extremely good news for everyone, except, probably, GoDaddy.

The Rise and Fall of .Ly

“The Rise and Fall of .Ly” covers some lesser-known Internet history, including the “God of the Internet”, Jon Postel:

Until 1998, the Internet had a “God.” His name was Jon Postel.

Postel was a computer science student at UCLA in the late 1960s. In 1969, he got into the Internet more or less on the ground floor, when he was part of the team that set up the first node of the ARPANET — which would lay the technological groundwork for the modern Internet.

In these early days, computers would refer to each other and the files on them by IP address. The earliest web addresses were strings of numbers, like: 123.45.67.89. If you wanted to reference, access, or communicate with a computer, you’d type in its numerical address. As the ARPANET grew, its moderators compiled a single file mapping memorable names, often pronounceable strings of characters, to IP addresses. This file was named “HOSTS.TXT”, and it was like a giant phone book with every computer’s name and number in it. Hosts made copies of the master HOSTS.TXT. This system got more and more cumbersome as the network got bigger and bigger.

In 1983, ARPANET became a subnet of the early Internet. At around the same time, Postel, along with computer scientist Paul Mockapetris, devised a new system to name the various places of the web. Their invention, called the Domain Name System (DNS), took the role of the HOSTS.TXT file and distributed it across an eventually vast, multifaceted network of servers.