Rundeck – Job Scheduler and Runbook Automation


Rundeck is yet another one of those services that I want to get my hands on but haven’t yet found the time for.  The simplest way to describe it is: cron on steroids.

Rundeck allows one to define commands and then execute them manually, periodically, or based on a certain trigger.  Imagine, for example, a deployment command that needs to run across some servers to which you are not comfortable giving developers, or non-technical users, direct access.  You can create a command in Rundeck and allow certain users to execute it by clicking a button or two in a user-friendly web interface.

A side benefit of using Rundeck over cron is the metrics.  Rundeck collects metrics such as successful and failed executions, execution times, and so on, which makes it easier to see that certain jobs are getting progressively slower or fail on specific weekdays.

The best part is that Rundeck is Open Source and self-hosted, so you don’t need to give sensitive access to some external web service.
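
To give a flavour of what this looks like in practice, here is a rough sketch of a Rundeck job definition in YAML.  The job name, group, and deploy script are made up for illustration, and the exact set of supported keys depends on your Rundeck version, so treat it as a starting point rather than a reference:

# deploy-web-app.yaml - a hypothetical job definition, importable via the web UI or CLI
- name: deploy-web-app
  group: deployments
  description: Run the deployment script on the web servers
  loglevel: INFO
  sequence:
    keepgoing: false
    strategy: node-first
    commands:
      - exec: /usr/local/bin/deploy.sh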

Amazon Linux AMI 2016.09


AWS Blog lets us know that Amazon Linux AMI 2016.09 is now available.  It comes with a variety of updates, such as Nginx 1.10, PHP 7, PostgreSQL 9.5, and Python 3.5.  Another thing that got quite a bit of improvement is the boot time of Amazon Linux AMI instances (the announcement includes a comparison chart).


Read about all the changes in the release notes.

P.S.: I’m still stuck with Amazon AMI on a few of my instances, but in general I have to remind all of you to NOT use the Amazon AMI.  You’ve been warned.

Top 13 Amazon Virtual Private Cloud (VPC) Best Practices

Cloud Academy Blog goes over the top 13 Amazon VPC best practices – particularly good for those just starting out with the platform.  The article discusses the following:

  1. Choosing the Proper VPC Configuration for Your Organization’s Needs
  2. Choosing a CIDR Block for Your VPC Implementation
  3. Isolating Your VPC Environments
  4. Securing Your Amazon VPC Implementation
  5. Creating Your Disaster Recovery Plan
  6. Traffic Control and Security
  7. Keep your Data Close
  8. VPC Peering
  9. EIP – Just In Case
  10. NAT Instances
  11. Determining the NAT Instance Type
  12. IAM for Your Amazon VPC Infrastructure
  13. ELB on Amazon VPC

Overall, it’s a very handy quick list.
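
For those who prefer to poke at things from the command line, a couple of these items (picking a CIDR block, carving out subnets) boil down to a few AWS CLI calls.  A minimal, hypothetical sketch – the 10.0.0.0/16 block, the subnet ranges, and the vpc-12345678 ID are placeholders, not recommendations:

# Create a VPC with a /16 CIDR block, then two /24 subnets inside it
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-12345678 --cidr-block 10.0.1.0/24
aws ec2 create-subnet --vpc-id vpc-12345678 --cidr-block 10.0.2.0/24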

StackOverflow: Docker vs. Vagrant, with project authors’ comments

There is this discussion over at StackOverflow: Should I use Vagrant or Docker for creating an isolated environment? It attracted the attention of the authors of both projects (as well as many other smart people).  Read the whole thing for interesting insights into what’s there now and what’s coming.  If you’d rather have a summary, here it is:

The short answer is that if you want to manage machines, you should use Vagrant. And if you want to build and run applications environments, you should use Docker.
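
To make the distinction concrete, here is a rough sketch of the smallest useful artifact for each tool (the box and image names are just examples):

# Vagrantfile - describes a machine
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "forwarded_port", guest: 80, host: 8080
end

# Dockerfile - describes an application environment
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y nginx
CMD ["nginx", "-g", "daemon off;"]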

504 Gateway Timeout error on Nginx + FastCGI (php-fpm)


“504 Gateway Timeout” is a very common error when using Nginx with PHP-FPM.  Usually, it means that it took PHP-FPM longer to generate the response than Nginx was willing to wait.  A few possible reasons for this are:

  • The Nginx timeout configuration uses very small values (expecting responses to be unrealistically fast).
  • The web server is overloaded and takes longer than it should to process requests.
  • The PHP application is slow (maybe due to the database behind it being slow).

There is plenty of advice online on how to troubleshoot and sort out these issues.  But when it comes down to increasing the timeouts, I found such advice to be scattered, incomplete, and often outdated.  This page, however, has a good collection of tweaks.  They are (combined into one sketch right after the list):

  1. Increase PHP maximum execution time in /etc/php.ini: max_execution_time = 300
  2. Increase PHP-FPM request terminate timeout in the pool configuration (/etc/php-fpm.d/www.conf): request_terminate_timeout = 300
  3. Increase Nginx FastCGI read timeout (in /etc/nginx/nginx.conf): fastcgi_read_timeout 300;
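
Here is how those three tweaks look side by side.  This is a rough sketch: the 300-second value is illustrative, and the Nginx location block assumes PHP-FPM is listening on 127.0.0.1:9000, which may differ from your setup:

# /etc/php.ini
max_execution_time = 300

# /etc/php-fpm.d/www.conf
request_terminate_timeout = 300

# /etc/nginx/nginx.conf (or the relevant virtual host)
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_read_timeout 300;
}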

Also, see this Stack Overflow thread for more suggestions.

P.S.: while you are sorting out your HTTP errors, have a quick look at HTTP Status Dogs, which I blogged about a while back.

Serverlessconf 2016 – New York City: a personal report

Serverlessconf 2016 – New York City: a personal report – is a fascinating read.  Let me get you hooked:

This event left me with the impression (or the confirmation) that there are two paces and speeds at which people are moving.

There is the so called “legacy” pace. This is often characterized by the notion of VMs and virtualization. This market is typically on-prem, owned by VMware and where the majority of workloads (as of today) are running. Very steady.

The second “industry block” is the “new stuff” and this is a truly moving target. #Serverless is yet another model that we are seeing emerging in the last few years. We have moved from Cloud (i.e. IaaS) to opinionated PaaS, to un-opinionated PaaS, to DIY Containers, to CaaS (Containers as a Service) to now #Serverless. There is no way this is going to be the end of it as it’s a frenetic moving target and in every iteration more and more people will be left behind.

This time around was all about the DevOps people being “industry dinosaurs”. So if you are a DevOps persona, know you are legacy already.

Sometimes I feel like I am living on a different planet.  All these people are so close, yet so far away …

Amazon Elastic File System

Here is some great news from the Amazon AWS blog – the announcement of the Elastic File System (EFS):

EFS lets you create POSIX-compliant file systems and attach them to one or more of your EC2 instances via NFS. The file system grows and shrinks as necessary (there’s no fixed upper limit and you can grow to petabyte scale) and you don’t pre-provision storage space or bandwidth. You pay only for the storage that you use.

EFS protects your data by storing copies of your files, directories, links, and metadata in multiple Availability Zones.

In order to provide the performance needed to support large file systems accessed by multiple clients simultaneously, Elastic File System performance scales with storage (I’ll say more about this later).

I think this might have been the most requested feature/service from Amazon AWS since EC2 launch.  Sure, one could have built an NFS file server before, but with the variety of storage options, availability zones, and the dynamic nature of the cloud setup itself, that was quite a challenge.  Now – all that and more in just a few clicks.
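
For the curious, mounting an EFS file system from an EC2 instance is just an NFSv4 mount.  A hypothetical example – the file system ID, region, and mount point are placeholders:

# Create a mount point and mount the EFS file system over NFSv4.1
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs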

Thank you Amazon!


Let’s Encrypt on CentOS 7 and Amazon AMI

The last few weeks were super busy at work, so I accidentally let a few SSL certificates expire.  Renewing them is always annoying and time consuming, so I was pushing it until the last minute, and then some.

Instead of going the usual way for the renewal, I decided to try the Let’s Encrypt deal.  (I’ve covered Let’s Encrypt before here and here.)  Basically, Let’s Encrypt is a new Certification Authority, created by the Electronic Frontier Foundation (EFF), with the backing of Google, Cisco, the Mozilla Foundation, and the like.  This new CA issues well-recognized SSL certificates, for free.  Which is good.  But the best part is that they’ve set up the process to be as automated as possible.  All you need is to run a shell command to get the certificate, and then another shell command in the crontab to renew it automatically.  Certificates are only issued for 3 months, so you really want to have them renewed automatically.

It took me longer than I expected to figure out how this whole thing works, but that’s because I’m not well versed in SSL, and because they have so many different options, suited for different web servers, and different sysadmin experience levels.

Eventually I made it work, and here is the complete process, so that I don’t have to figure it out again later.

We are running a mix of CentOS 7 and Amazon AMI servers, using both Nginx and Apache.   Here’s what I had to do.

First things first.  Install the Let’s Encrypt client software.  Supposedly there are several options, but I went for the official one.  Manual way:

# Install requirements
yum install git bc
cd /opt
git clone https://github.com/letsencrypt/letsencrypt

Alternatively, you can use geerlingguy’s lets-encrypt-role for Ansible.

Secondly, we need to get a new certificate.  As I said before, there are multiple options here.  I decided to use the certonly way, so that I have better control over where things go, and so that I would minimize the web server downtime.

There are a few things that you need to specify for the new SSL certificate.  I’ll use example.com as a placeholder domain below – substitute your own values.  These are:

  • The list of domains which the certificate should cover.  I’ll use example.com and www.example.com here.
  • The path to the web folder of the site.  I’ll use /var/www/vhosts/example.com here.
  • The email address which Let’s Encrypt will use to contact you in case there is something urgent.  I’ll use admin@example.com here.

Now, the command to get the SSL certificate is:

/opt/letsencrypt/certbot-auto certonly --webroot --email admin@example.com --agree-tos -w /var/www/vhosts/example.com -d example.com -d www.example.com

When you run this for the first time, you’ll see that a bunch of additional RPM packages will be installed, for the virtual environment to be created and used.  On CentOS 7 this is sufficient.  On Amazon AMI, the command will run, install things, and will fail with something like this:

WARNING: Amazon Linux support is very experimental at present...
if you would like to work on improving it, please ensure you have backups
and then run this script again with the --debug flag!

This is useful, but insufficient.  Before you can run it successfully, you’ll also need to do the following:

yum install python26-virtualenv

Once that is done, run the certbot command with the --debug parameter, like so:

/opt/letsencrypt/certbot-auto certonly --webroot --email admin@example.com --agree-tos -w /var/www/vhosts/example.com -d example.com -d www.example.com --debug

This should produce a success message, with “Congratulations!” and all that.  The path to your certificate (somewhere in /etc/letsencrypt/live/example.com/) and its expiration date will be mentioned too.
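
If you are curious what actually lands there, the live directory contains a handful of symlinks pointing to the latest versions of the certificate files.  Assuming example.com as the domain, it looks roughly like this:

ls -1 /etc/letsencrypt/live/example.com/
cert.pem
chain.pem
fullchain.pem
privkey.pem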

If you didn’t get the success message, make sure that:

  • the domain for which you are requesting a certificate resolves to the server where you are running the certbot command.  Let’s Encrypt will try to access the site for verification purposes.
  • public access is allowed to the /.well-known/ folder.  This is where Let’s Encrypt will store temporary verification files.  Note that the folder name starts with a dot, which in UNIX means a hidden folder, and many web server configurations deny access to hidden folders by default.

Just drop a simple hello.txt file into the /.well-known/ folder and see if you can access it with a browser.  If you can, then Let’s Encrypt shouldn’t have any issues getting you a certificate.  If all else fails, RTFM.
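
A quick way to do that check from the command line (using the example.com placeholders from above):

# Drop a test file into the webroot and try to fetch it over plain HTTP
mkdir -p /var/www/vhosts/example.com/.well-known
echo "hello" > /var/www/vhosts/example.com/.well-known/hello.txt
curl -i http://example.com/.well-known/hello.txt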

Now that you have the certificate generated, you’ll need to add it to the web server’s virtual host configuration.  How exactly to do this varies from web server to web server, and even between different versions of the same web server.  In the snippets below, example.com is again a placeholder for your own domain.

For Apache version >= 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem

For Apache version < 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/cert.pem
SSLCertificateChainFile /etc/letsencrypt/live/example.com/chain.pem

For Nginx >= 1.3.7 you’ll need to do the following:

ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

You’ll obviously need additional SSL configuration options for protocols, ciphers, and the like, which I won’t go into here, but there are plenty of good guides online.

Once your SSL certificate is issued and the web server is configured to use it, all you need is a crontab entry to renew the certificates which are expiring in 30 days or less.  You’ll only need a single entry for all the certificates on this machine.  Edit your /etc/crontab file and add the following (adjust for your web server software, obviously):

# Renew Let's Encrypt certificates at 6pm every Sunday
0 18 * * 0 root (/opt/letsencrypt/certbot-auto renew && service httpd restart)
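
If you want to sanity-check the renewal before trusting it to cron, newer versions of the client support a dry run against the staging environment (check that your version has this flag):

/opt/letsencrypt/certbot-auto renew --dry-run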

That’s about it.  Once all is up and running, verify and adjust your SSL configuration using the excellent Qualys SSL Labs tool.

Ansible safety net for DNS wildcard hosts

After using Ansible for only a week, I am deeply in love.  I am doing more and more with less and less, and that’s exactly how I want my automation.

Today I had to solve an interesting problem.  Ansible operates based on the host and group inventory.  As I mentioned before, I now always rely on FQDNs (fully qualified domain names) for my host names.  But what happens when DNS wildcards come into play, with things like load balancers and reverse proxies?  Consider an example (the host names and IP addresses below are placeholders):

  1. Nginx is configured as a reverse proxy on the machine with IP address 192.0.2.10.
  2. A DNS wildcard is in place: *.example.com. 3600 IN CNAME proxy.example.com.
  3. The Ansible inventory contains proxy.example.com and a playbook to set up the reverse proxy with Nginx.
  4. The Ansible inventory contains a few other hosts and a playbook to set up Nginx as a web server.
  5. Somebody adds a new host to the inventory, www2.example.com, without specifying any other host details, like the ansible_ssh_host variable.  And they also forget to update the DNS zone with a new A or CNAME record.

Now, the Ansible play for the web server configuration is executed.  All previously existing machines are fine.  But the new machine’s host name resolves, via the wildcard, to the reverse proxy at 192.0.2.10, which is where Ansible connects and runs the Nginx setup, overwriting the existing configuration, triggering a service restart, and screwing up your life.  Just kidding, of course. :)  It’ll be trivial to find out what happened, and fixing the Nginx configuration isn’t too difficult either, especially if you have backups in place.  But it’s still better to avoid the whole mess altogether.

To help prevent these cases, I decided to create a new safety net role.  Given a variable like:

# Aliased IPs is a list of hosts, which can be reached in
# multiple ways due to DNS wildcards. Both IPv4 and IPv6
# can be used. The hostname value is the primary hostname
# for the IP - any other inventory hostname having any of
# these IPs will cause a failure in the play.
aliased_ips:
  "192.0.2.10": "proxy.example.com"
  "2001:db8::10": "proxy.example.com"

And the following code in the role’s tasks/main.yml:

- debug: msg="Safety net - before IPv4"

- name: Check all IPv4 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname }}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv4_addresses }}"

- debug: msg="Safety net - after IPv4 and before IPv6"

- name: Check all IPv6 addresses against aliased IPs
  fail: msg="DNS is not configured for host '{{ inventory_hostname }}'. It resolves to '{{ aliased_ips[ item.0 ] }}'."
  when: "('{{ item[0] }}' == '{{ item[1] }}') and ('{{ inventory_hostname }}' != '{{ aliased_ips[ item.0 ] }}')"
  with_nested:
    - "{{ aliased_ips | default({}) }}"
    - "{{ ansible_all_ipv6_addresses }}"

- debug: msg="Safety net - after IPv6"

the safety net is in place.  The first check will connect to the remote server, get the list of all configured IPv4 addresses, and then compare each one with each IP address in the aliased_ips variable.  For every matching pair, it will check whether the remote server’s host name from the inventory file matches the host name from the aliased_ips value for the matched IP address.  If the host names match, it’ll continue.  If not, a failure in the play occurs (Ansible-speak for a thrown exception).  Other tasks will continue execution for other hosts, but nothing else will be done during this play run for this particular host.

The second check will do the same but with IPv6 addresses.  You can mix and match both IPv4 and IPv6 in the same aliased_ips variable.  And Ansible is smart enough to exclude the localhost IPs too, so things shouldn’t break too much.

I’ve tested the above and it seems to work well for me.

There is a tiny issue with elegance here though: host name to IP mappings are already configured in the DNS zone, so duplicating this configuration in the aliased_ips variable seems annoying.  Personally, I don’t have that many reverse proxies and load balancers to handle, and they don’t change too often either, so I don’t mind.  Also, there is something about relying on DNS while trying to protect against DNS misconfiguration that rubs me the wrong way.  But if you are the adventurous type, have a look at Ansible’s dig lookup, which you can use to fetch the IP addresses from the DNS server of your choice.
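
For completeness, here is a rough, untested sketch of what that could look like with the dig lookup.  It requires dnspython on the machine running Ansible, and the IP address below is the same placeholder as above:

- name: Fail if the inventory hostname resolves to the reverse proxy
  fail: msg="DNS for '{{ inventory_hostname }}' points at the reverse proxy"
  when: "lookup('dig', inventory_hostname) == '192.0.2.10'"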

As always, if you see any potential issues with the above or know of a better way to solve it, please let me know.