SSH multiplexing and Ansible via bastion host

It never ceases to amaze me how even after years and years of working with some technologies I keep finding out about super useful features in those technologies, that could have saved me lots of time if I knew about them earlier.  Today was a day just like that.

I was working on the Ansible setup for a new hosting environment.  One particular thing I wanted to utilize more was a bastion host – a single Linux machine with exposed secure shell (SSH) port, which will be used for managing the configurations of all the servers within the environment.  I sort of done that before, but the solution wasn’t as elegant as I wanted it to be.

So, I came across this article – Running Ansible Through an SSH Bastion Host.  Which, among other things taught me about a feature that I didn’t know nothing about.  Literally.  Haven’t even heard about it.  Multiplexing in OpenSSH:

Multiplexing is the ability to send more than one signal over a single line or connection. With multiplexing, OpenSSH can re-use an existing TCP connection for multiple concurrent SSH sessions rather than creating a new one each time.

This doesn’t sound too useful for when you are working in command line, one server at a time.  Who cares how many TCP connections do you need? It’ll be one, or two, or five.  Ten, if you are really involved.  But by that time you’ll probably be running background processes, and screen or tmux (which are apparently called “terminal multiplexers“).

It’s when you are going deeper into automation, such as in my case with Ansible, when you’ll need OpenSSH multiplexing.  Ansible, being a configuration manager, can run a whole lot of commands one after another.  It can run them on multiple servers in parallel as well.  That’s where reusing the connections can make quite a bit of a difference.  If every command you run connects to the remote server, executes, and then disconnects, you can benefit from not needing to connect and disconnect multiple times (tens or hundreds of times, every playbook run).   Reusing connection for parallel jobs is even better – and that’s a case with bastion host, for example.

Here are a few useful links from that article, just in case the ether eats it one day:

Armed with those, I had my setup running in no time.  The only minor correction I had to do for my case was the SSH configuration for the bastion host.  The example in the article is NOT wrong:

Host 10.10.10.*
  ProxyCommand ssh -W %h:%p bastion.example.com
  IdentityFile ~/.ssh/private_key.pem

Host bastion.example.com
  Hostname bastion.example.com
  User ubuntu
  IdentityFile ~/.ssh/private_key.pem
  ForwardAgent yes
  ControlMaster auto
  ControlPath ~/.ssh/ansible-%r@%h:%p
  ControlPersist 5m

It’s just that in my case, I use hostnames both for the bastion host and the hosts which are managed through it.  So I had to adjust it as so:

Host *.example.com !bastion.example.com
  ProxyCommand ssh -W %h:%p bastion.example.com
  IdentityFile ~/.ssh/private_key.pem

Host bastion.example.com
  Hostname bastion.example.com
  User ubuntu
  IdentityFile ~/.ssh/private_key.pem
  ForwardAgent yes
  ControlMaster auto
  ControlPath ~/.ssh/ansible-%r@%h:%p
  ControlPersist 5m

Notice the two changes:

  1. Switch of the first block from IP addresses to host names, with a mask.
  2. Negation of the bastion host configuration.

The reason for the second change is that if there are multiple Host matches in the configuration file, OpenSSH will combine all options from the matched configurations (something I didn’t find in the ssh_config manual).  Try this example ssh.conf with some real hosts of yours:

Host bastion.example.com
	User someuser

Host *.example.com
	Port 2222

You’ll see the output similar to this:

$ ssh -F ssh.conf bastion.example.com -v
OpenSSH_7.2p2, OpenSSL 1.0.2h-fips  3 May 2016
debug1: Reading configuration data ssh.conf
debug1: ssh.conf line 1: Applying options for bastion.example.com
debug1: ssh.conf line 4: Applying options for *.example.com
debug1: Connecting to bastion.example.com [1.2.3.4] port 2222.
^C

Once you negate the bastion host from the wildcard configuration, everything works as expected.

You might also try using “%r@%h:%p” for the socket to be different for each remote username that you will concurrently connect with, but that’s just nit-picking.

Forcing Amazon Linux AMI compatibility with CentOS in Ansible

One of the things that makes Ansible so awesome is a huge collection of shared roles over at Ansible Galaxy.  These bring you best practices, flexible configurations and in general save hours and hours of hardcore swearing and hair pulling.

Each role usually supports multiple versions of multiple Linux distributions.  However, you’ll find that the majority of the supported distributions are Ubuntu, Debian, Red Hat Enterprise Linux, CentOS, and Fedora.  The rest aren’t as popular.

Which brings me to the point with Amazon Linux AMI.  Amazon Linux AMI is mostly compatible with CentOS, but it uses a different version approach, which means that most of those Ansible roles will ignore or complain about not supporting Amazon AMI.

Here is an example I came across yesterday from the dj-wasabi.zabbix-server role.  The template for the Yum repository uses ansible_os_major_version variable, which is expected to be similar to Red Hat / CentOS version number – 5, 6, 7, etc.  Amazon Linux AMI’s major version is reported as “NA” – not available.   That’s probably because Amazon Linux AMI versions are date-based – with the latest one being 2016.03.

[zabbix]
name=Zabbix Official Repository - $basearch
baseurl=http://repo.zabbix.com/zabbix/{{ zabbix_version }}/rhel/{{ ansible_distribution_major_version }}/$basearch/
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX

Officially, Amazon Linux AMI is not CentOS or Red Hat Enterprise Linux.  But if you don’t care about such little nuances, and you are brave enough to experiment and assume things, than you can make that role work, by simply setting the appropriate variables to the values that you want.

First, here is a standalone test.yml playbook to try things out:

- name: Test
hosts: localhost
pre_tasks:
- set_fact: ansible_distribution_major_version=6
when: ansible_distribution == "Amazon"
tasks:
- debug: msg={{ ansible_distribution_major_version }}

Let’s run it and look at the output:

$ ansible-playbook test.yml

PLAY [Test] *******************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [set_fact ansible_distribution_major_version=6] *************************
ok: [localhost]

TASK: [debug msg={{ ansible_distribution_major_version }}] ********************
ok: [localhost] => {
"msg": "6"
}

PLAY RECAP ********************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0

So far so good.  Now we need to integrate this into our playbook in such a way that the variable is set before the third-party role is executed.  For that, we’ll use pre_tasks.  Here is an example:

---
- name: Zabbix Server
hosts: zabbix.server
sudo: yes
pre_tasks:
- set_fact: ansible_distribution_major_version=6
when: ansible_distribution == "Amazon" and ansible_distribution_major_version == "NA"
roles:
- role: dj-wasabi.zabbix-server

A minor twist here is also checking if the major version is not set yet. You can skip that, or you can change it, for example, to examine the Amazon Linux AMI version and set corresponding CentOS version.

Ansible safety net for DNS wildcard hosts

After using Ansible for only a week, I am deeply in love.  I am doing more and more with less and less, and that’s exactly how I want my automation.

Today I had to solve an interesting problem.  Ansible operates, based on the host and group inventory.  As I mentioned before, I am now always relying on FQDNs (fully qualified domain names) for my host names.  But what happens when DNS wildcards come into play with things like load balancers and reverse proxies  Consider an example:

  1. Nginx configured as reverse proxy on the machine proxy1.example.com with 10.0.0.10 IP address.
  2. DNS wildcard is in place: *.example.com 3600 IN CNAME proxy1.example.com.
  3. Ansible contains proxy1.example.com in host inventory and a playbook to setup the reverse proxy with Nginx.
  4. Ansible contains a few other hosts in inventory and a playbook to setup Nginx as a web server.
  5. Somebody adds a new host to inventory: another-web-server.example.com, without specifying any other host details, like ansible_ssh_host variable.  And he also forgets to update the DNS zone with a new A or CNAME record.

Now, Ansible play is executed for the web servers configuration.  All previously existing machines are fine.  But the new machine’s another-web-server.example.com host name resolves to proxy1.example.com, which is where Ansible connects and runs the Nginx setup, overwriting the existing configuration, triggering a service restart, and screwing up your life.  Just kidding, of course. :)  It’ll be trivial to find out what happened.  Fixing the Nginx isn’t too difficult either.  Especially if you have backups in place.  But it’s still better to avoid the whole mess altogether.

To help prevent these cases, I decided to create a new safety net role.  Given a variable like:

---
# Aliased IPs is a list of hosts, which can be reached in 
# multiple ways due to DNS wildcards. Both IPv4 and IPv6 
# can be used. The hostname value is the primary hostname 
# for the IP - any other inventory hostname having any of 
# these IPs will cause a failure in the play.
aliased_ips:
  "10.0.0.10": "proxy1.example.com"
  "192.168.0.10": "proxy1.example.com"

And the following code in the role’s tasks/main.yml:

---
- debug: msg="Safety net - before IPv4"

- name: Check all IPv4 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname}}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv4_addresses }}"

- debug: msg="Safety net - after IPv4 and before IPv6"

- name: Check all IPv6 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname}}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv6_addresses }}"

- debug: msg="Safety net - after IPv6"

the safety net is in place.  The first check will connect to the remote server, get the list of all configured IPv4 addresses, and then compare each one with each IP address in the aliased_ips variable.  For every matching pair, it will check if the remote server’s host name from the inventory file matches the host name from the aliased_ips value for the matched IP address.  If the host names match, it’ll continue.  If not – a failure in the play occurs (Ansible speak for thrown exception).  Other tasks will continue execution for other hosts, but nothing else will be done during this play run for this particular host.

The second check will do the same but with IPv6 addresses.  You can mix and match both IPv4 and IPv6 in the same aliased_ips variable.  And Ansible is smart enough to exclude the localhost IPs too, so things shouldn’t break too much.

I’ve tested the above and it seems to work well for me.

There is a tiny issue with elegance here though: host name to IP mappings are already configured in the DNS zone – duplicating this configuration in the aliased_ips variable seems annoying.  Personally, I don’t have that many reverse proxies and load balancers to handle, and they don’t change too often either, so I don’t mind.  Also, there is something about relying on DNS while trying to protect against DNS mis-configuration that rubs me the wrong way.  But if you are the adventurous type, have a look at the Ansible’s dig lookup, which you can use to fetch the IP addresses from the DNS server of your choice.

As always, if you see any potential issues with the above or know of a better way to solve it, please let me know.

Ansible setup for Fedora project

Real life working examples are some of the most useful things when learning a new system.  The more – the better.  That’s why this git repository of the Ansible setup for the Fedora project is a pure gold mine.  It is large.  It is complex.  It covers a whole lot of things.  But most importantly, it is alive and well tested.