Call me old school, but I really (I mean REALLY) don’t like the recent explosion of top-level domains. I understand that most good names are taken in the .com, .org, and .net zones, but do we really need all those .blue, .parts, and .yoga TLDs?
Why am I whining about all this all of a sudden? I’ll tell you why. Because a new top-level domain – .aws – is about to be introduced, and it has already broken something for me in a non-obvious way.
I manage a few Virtual Private Clouds on Amazon AWS. Many of them rely on a hostname naming convention (yeah, I’m familiar with the pets vs. cattle idea). Imagine you have a few servers, separated into generic infrastructure and client segments, like so:
- bastion.aws.example.com
- firewall.aws.example.com
- lb.aws.example.com
- web.client1.example.com
- db.client1.example.com
- web.client2.example.com
- db.client2.example.com
- … and so on.
Working with such long FQDNs (fully qualified domain names) isn’t very convenient. So you add “search example.com” to your /etc/resolv.conf file, and now you can use short hostnames like firewall.aws and web.client1. And life is beautiful …
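Why does the search line work, and how could it ever stop working? Below is a rough Python model of how a glibc-style stub resolver applies the search list, assuming the default `ndots:1` option: a name with at least that many dots is tried as-is first, and the search domains are appended only if that lookup gets no answer; a name with fewer dots gets the search domains first; a trailing dot makes the name absolute. The resolver stops at the first answer it gets. `candidate_names`, `resolve`, and the record tables (including the 10.0.0.5 address) are hypothetical illustrations, not real resolver code.

```python
# Hypothetical model of glibc-style search-list handling (assumes ndots:1).

def candidate_names(name, search, ndots=1):
    """Return query names in the order a stub resolver would try them."""
    if name.endswith("."):
        # Trailing dot: the name is absolute, so the search list is skipped.
        return [name]
    absolute = f"{name}."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search domains.
        return [absolute] + expanded
    # Too few dots: try the search domains first, then the name as-is.
    return expanded + [absolute]


def resolve(name, search, records, ndots=1):
    """Return the first (query name, address) pair that has an answer."""
    for candidate in candidate_names(name, search, ndots):
        if candidate in records:
            return candidate, records[candidate]
    return None  # no candidate answered (NXDOMAIN everywhere)


# Hypothetical record tables; 10.0.0.5 is a made-up internal address.
BEFORE = {"firewall.aws.example.com.": "10.0.0.5"}
# Once .aws is live, the public name "firewall.aws." starts answering too.
AFTER = {**BEFORE, "firewall.aws.": "127.0.53.53"}

print(resolve("firewall.aws", ["example.com"], BEFORE))
# ('firewall.aws.example.com.', '10.0.0.5')
print(resolve("firewall.aws", ["example.com"], AFTER))
# ('firewall.aws.', '127.0.53.53')
```

Because firewall.aws already contains one dot, it is tried as an absolute name first. As long as that name did not exist, the lookup fell through to the search domain; once it answers, example.com is never consulted.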
… until one day, when you see the following:
user@bastion.aws$> ssh firewall.aws
Permission denied (publickey).
And that’s when your heart skips a beat, the world freezes, and you go: “WTF?”. All kinds of thoughts rush through your head. Is it a typo? Am I in the right place? Did the server get compromised? How’s that for a little panic …
Trying a few things here and there, you manage to get into the server from somewhere else. You are very careful. You look around for any traces of a break-in, but you see nothing. You dig through the logs, both on the server and off it. Still nothing. You dive into all those logwatch and cron messages in your Trash, the ones you had been deleting automatically because things had been working fine for so long. There! Cron was complaining that the backup script couldn’t get into this machine. Uh-oh. This has been happening for a few days now. A black cloud of worry over the compromised machine and the outdated backups kills the sunlight in your life. Dammit!
Take a break to calm down. Try to think clearly. Don’t panic. Stop assuming things, and start troubleshooting.
A few minutes later, you establish that the problem is not limited to that particular machine. All your .aws hosts share this headache. A few more minutes later, you learn that ‘ssh firewall.aws.example.com’ works fine, while ‘ssh firewall.aws’ still doesn’t.
That points toward a hostname resolution issue. With that in mind, it takes only a few more moments to find the following:
user@bastion.aws$> host firewall.aws
firewall.aws has address 127.0.53.53
firewall.aws mail is handled by 10 your-dns-needs-immediate-attention.aws.
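The 127.0.53.53 address and the your-dns-needs-immediate-attention mail exchanger are the markers of ICANN’s name-collision “controlled interruption” for newly delegated TLDs. A minimal Python sketch that flags such answers coming back from the system resolver might look like this; `is_name_collision` and `check_host` are hypothetical helpers, and the script only reports what `socket.gethostbyname` returns on the machine it runs on.

```python
import ipaddress
import socket
import sys

# Address returned for names that collide with a newly delegated TLD.
COLLISION_ADDRESS = ipaddress.ip_address("127.0.53.53")


def is_name_collision(address):
    """True if a resolved IPv4 address is the name-collision marker."""
    return ipaddress.ip_address(address) == COLLISION_ADDRESS


def check_host(name):
    """Resolve name via the system resolver; None if it does not resolve."""
    try:
        address = socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    return is_name_collision(address)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        result = check_host(host)
        if result is None:
            print(f"{host}: does not resolve")
        elif result:
            print(f"{host}: name collision (127.0.53.53), check your search domains")
        else:
            print(f"{host}: ok")
```

Run it with the short names you actually use (for example, `python check_collision.py firewall.aws web.client1`; the file name is arbitrary) to spot which ones are being answered by a public TLD instead of your own zone.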
Say what? That’s not at all what I expected. And what exactly do I need to fix in my DNS? A Google search turns up this beauty:
This is probably because the .dev and .local are now valid top level extensions.
Really? Who’s the genius behind that? I thought people chose those names specifically because they were meant to stay internal. So is there a .aws top-level extension now too? You bet there is!
Solution? Well, as far as I am concerned, from this day onward I don’t trust short hostnames anymore. It’s FQDN or nothing.
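If your scripts still take short names as input, one way to stick to “FQDN or nothing” is to expand them yourself against one explicitly named domain and produce an absolute name with a trailing dot, which the resolver does not run through the search list. This is a minimal sketch; `qualify` is a hypothetical helper, and example.com stands in for your own zone.

```python
def qualify(name, domain):
    """Expand a short host name into an absolute FQDN ending in a dot."""
    if name.endswith("."):
        return name  # already absolute, leave untouched
    if name == domain or name.endswith("." + domain):
        return name + "."  # already fully qualified, just make it absolute
    return f"{name}.{domain}."  # short name: append the one trusted domain


print(qualify("firewall.aws", "example.com"))
# firewall.aws.example.com.
print(qualify("web.client1.example.com", "example.com"))
# web.client1.example.com.
```

The point of the design is that the domain is stated once, in your own code, instead of being guessed by the resolver from a search list.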