CentOS 7.3 released

CentOS 7.3 was released rather quietly a couple of days ago.  Or maybe it wasn’t quietly, but I still somehow missed it.  Here is a list of major changes:

  • Since release 1503 (abrt>= 2.1.11-19.el7.centos.0.1) CentOS-7 can report bugs directly to bugs.centos.org.
  • Various new packages include among others: python-gssapi, python-netifaces, mod_auth_openidc, pidgin and Qt5.
  • Support for the 7th-generation Core i3, i5, and i7 Intel processors and I2C on 6th-generation Core Processors has been added.
  • Various packages have been rebased. Some of those are samba, squid, systemd, krb5, gcc-libraries, binutils, gfs-utils, libreoffice, GIMP, SELinux, firewalld, libreswan, tomcat and open-vm-tools.
  • SHA2 is now supported by OpenLDAP.
  • ECC support has been added to OpenJDK-8, Perl Net::SSLeay and Perl IO::Socket::SSL.
  • Bluetooth LE is now supported.
  • virt-p2v is now fully supported. virt-v2v and virt-p2v add support for the latest Windows releases.
  • Lots of updated storage, network and graphics drivers.
  • Technology Preview: Among others support of Btrfs, OverlayFS, CephFS, DNSSEC, kpatch, the Cisco VIC and usNIC kernel driver, nested virtualization with KVM and multi-threaded xz compression with rpm-builds.

More information is here.

Also, make sure you read the Known Issues section, as it might surprise you:

  • SELinux received major changes in this release, which might break certain functionality on your system. You might want to take a look at this bugzilla entry for further information.
  • The initramfs files are now significantly bigger than in CentOS-7 (1503). You may want to consider lowering installonly_limit in /etc/yum.conf to reduce the number of installed kernels if your /boot partition is smaller than 400MB. New installations should consider using 1GB as the size of the /boot partition.
  • The newer version of openssh in this release does not exit on the first match in the .ssh/config file as the older version did. This means if you have multiple host sections that match in your config for a given host, ALL will be applied. As an example, if you have a “host1.example.com” entry and a “*.example.com” entry, it will apply BOTH sets of instructions to “host1.example.com” but only the “*.example.com” section for “host2.example.com”.
  • Many people have complained that Ethernet interfaces are not started with the new default NetworkManager tool/have to be explicitly enabled during installation. See CentOS-7 FAQ#2.
  • At least 1024 MB RAM is required to install and use CentOS-7 (1611). When using the Live ISOs for install, 1024 MB RAM produces very slow results and even some install failures. At least 1344 MB RAM is recommended for LiveGNOME or LiveKDE installs.
  • If your screen resolution is 800×600 or lower, parts of the images shown at the bottom during install are clipped.
  • VMware Workstation/VMware ESXi allow you to install two different virtual SCSI adapters: BusLogic and LsiLogic. However, the default kernel from CentOS-7 does not include the corresponding driver for either of them, resulting in an unbootable system if you install on a SCSI disk using the defaults for CentOS Linux. If you select ‘Red Hat Enterprise Linux’ as OS, the paravirtualized SCSI adapter is used, which works.
  • Commonly used utilities such as ifconfig/netstat have been marked as deprecated for some considerable time and the ‘net-tools’ package is no longer part of the @core group so will not be installed by default. Use nmcli c up ifname <interfacename> to get your network up and running and use yum to install the package if you really need it. Kickstart users can pull in the net-tools package as part of the install.
  • The AlpsPS/2 ‘ALPS DualPoint TouchPad’ edge scrolling does not work by default on CentOS-7. See bug 7403 for the command to make this feature work.
  • After the update, some NICs may change their name from something like enoxxxxxxxx to something like ensxxx. This is due to the updated systemd package.
  • The 4 STIG Security Profiles in the anaconda installer produce a broken sshd_config that must be edited before sshd will start (BZ 1401069)
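
The OpenSSH item in the list above is easier to see with a small experiment.  What follows is a minimal sketch, not OpenSSH's actual parser: a hypothetical helper, matching_host_patterns, which prints every Host pattern in a config file that matches a given host name.  It assumes plain patterns only and ignores negation (!pattern) and Match blocks.

```shell
# List every "Host" pattern in an ssh config file that matches a host name.
# Hypothetical helper for illustration; handles simple wildcard patterns only.
matching_host_patterns() {
    local config="$1" host="$2"
    (
        set -f   # keep patterns like *.example.com from globbing against files
        while read -r keyword rest; do
            case "$keyword" in
                [Hh]ost)
                    for pattern in $rest; do
                        case "$host" in
                            $pattern) printf '%s\n' "$pattern" ;;
                        esac
                    done
                    ;;
            esac
        done < "$config"
    )
}
```

With the two entries from the release notes in the file, asking about host1.example.com prints both patterns, while host2.example.com prints only *.example.com.  More than one line of output means more than one section will contribute settings for that host.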

EPEL : the effort behind the scenes

Catching up with recent news, I came across this blog post by Stephen John Smoogen in Fedora People, where he explains the reason for the recent disappearance of the Puppet package from the Extra Packages for Enterprise Linux (EPEL 6) repository:

This week various people using EPEL on RHEL and CentOS 6 have found that the puppet package is no longer provided by EPEL. The reason for this is due to the way EPEL packages are built and kept inside the repository. A package needs a sponsor so that we can hopefully get bug fixes and security updates to it. In the case of puppet this package is sponsored by the user kanarip. However, most packages aren’t whole pieces, they rely on other software.. in this case the package puppet relies on a lot of different ruby gems of which one of them was called ruby-shadow. This package was orphaned 30 weeks ago and while it did have other people watching it, none of them took over the package.


Last week a large cleanup was done to clean out orphaned packages from EPEL which removed ruby-shadow. Once that was removed, then all of the other packages depending on ruby-shadow were also removed. Today various people reinstalling systems found puppet wasn’t around and came onto #epel to ask.. which seems to have gotten the packages responsored and hopefully they will be back in the EPEL release in a day or so.

This problem has been happening a lot lately. I think it shows quite a few problems with how EPEL is set up and managed. For this, I take responsibility as I said I would try to clean it up after FOSDEM 2016 and it is still happening.

Unpleasant annoyance that shouldn’t have happened, right?  Well, yes, maybe.

Software is a complex matter, whether you are designing, developing, testing, or distributing it.  So things do go wrong sometimes.  And that was something I wanted to focus on for a second.

Forget the actual designing, developing, testing and documenting the software.  Forget all the infrastructure behind such a vital part of the Linux ecosystem as EPEL.  Just think of this single issue for a moment.  Once again:

A package needs a sponsor so that we can hopefully get bug fixes and security updates to it.

So what, I hear you say.  Well, let’s take a closer look.  EPEL provides packages for multiple versions of the distribution, hardware platforms and so on.  Let’s just look at EPEL 6 for x86_64 (to keep things simple).  That looks like a lot of packages, doesn’t it?  How many?  At the time of this writing, from a random mirror that I got:

wget -q -O - http://download.fedoraproject.org/pub/epel/6/x86_64/ | grep -c 'unknown.gif'
12129

Yup. That’s 12,129 packages!  And each one of those has at least one developer behind it as a sponsor.  Some of those amazing people obviously maintain more than one package. Some packages are maintained by multiple people.  All of them are working hard behind the scenes for you and me to have easy and stable access to a whole lot of software.  Here is a quote from the FAQ which is smoked and marinated in all that effort:

Software packages in EPEL are maintained on a voluntary basis. If you want to ensure that the packages you want remain available, get involved directly in the EPEL effort. More experienced maintainers help review your packages and you learn about packaging. If you can, get your packaging role included as part of your job description; EPEL has written a generic description that you can use as the basis for adding to a job description.

We do our best to make this a healthy project with many contributors who take care of the packages in the repository, and the repository as a whole, for all releases until RHEL closes support for the distribution version the packages were built for. That is ten years after release (currently) — a long time frame, and we know a lot can happen in ten years. Your participation is vital for the success of this project.

I don’t know about you, but for me, this is absolutely mind-blowing.  So I just wanted to take this opportunity to say thank you to all the brilliant people behind the scenes, who are often invisible, yet indispensable for the continuous success of Open Source software in general, and Linux in particular.

You guys rock!

Automate OpenVPN client on CentOS 7

I needed to set up an OpenVPN client to start automatically on a CentOS 7 server for one of our recent projects at work.  I’m not well versed in VPN technology, but the majority of the time was spent on something that I didn’t expect.

I got the VPN configuration and all the necessary certificates from the client, installed OpenVPN and tried it out.  It seemed to work just fine.  But setting it up to start automatically and without any human intervention took much longer than I thought it would.

The first issue that I came across was the necessary input of username and password for the VPN connection to be established.  The solution to that is simple (thanks to this comment):

  1. Create a new text file (for example, /etc/openvpn/auth) with the username being the first line of the file, and the password being the second.  Don’t forget to limit the permissions to read-only by root.
  2. Add the following line to the VPN configuration file (assuming /etc/openvpn/client.conf): “auth-user-pass auth”.  Here, the second “auth” is the name of the file, relative to the VPN configuration.

With that, the manual startup of the VPN (openvpn client.conf) was working.
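
The first step above (the credentials file) can be sketched as a small shell function.  This is only an illustration: write_vpn_auth is a hypothetical helper name, and the user name and password in the usage line are placeholders.

```shell
# Write a two-line OpenVPN credentials file (user name, then password)
# that only its owner can read or write.
write_vpn_auth() {
    local file="$1" user="$2" pass="$3"
    (
        umask 077                                   # a new file is created with mode 0600
        printf '%s\n%s\n' "$user" "$pass" > "$file"
    )
    chmod 600 "$file"                               # also tighten a file that already existed
}

# Example (as root):
#   write_vpn_auth /etc/openvpn/auth myuser 'mypassword'
```

Run as root, this leaves a file owned by root that nobody else can read, which is what the note about permissions is after.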

Now, how do we start the service automatically?  The old-school knowledge was suggesting “service openvpn start”.  But that fails due to openvpn being an unknown service.  Weird, right?

“rpm -ql openvpn” pointed in the direction of the systemd service (“systemctl start openvpn”).  But that failed too.  The name of the service looked strange too:

# rpm -ql openvpn | grep service

A little (well, not that little after all) digging around revealed something that I didn’t know.  Systemd services can be started with different configuration files.  In this case, you can run “systemctl start openvpn@foobar” to start the OpenVPN service using the “foobar” configuration file, which should be in “/etc/openvpn/foobar.conf”.

What’s that config file and where do I get it from?  Well, the OpenVPN configuration sent from our client had an “account@host.ovpn” file, which is exactly what’s needed.  So, renaming “account@host.ovpn” to “client.conf” and moving it together with all the other certificate files into the “/etc/openvpn” folder allowed me to do “systemctl start openvpn@client”.  All you need now is to make the service start automatically at boot time and you are done.
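
For that last step, here is a minimal sketch of how the configuration file name maps to the service instance name.  The helper vpn_unit_name is hypothetical, and the sketch assumes the openvpn@ template unit described above.

```shell
# Map an OpenVPN config path to the systemd template instance that loads it:
# /etc/openvpn/client.conf -> openvpn@client
vpn_unit_name() {
    local name
    name="$(basename "$1" .conf)"
    printf 'openvpn@%s\n' "$name"
}

# Example (as root):
#   systemctl enable "$(vpn_unit_name /etc/openvpn/client.conf)"   # start at boot
#   systemctl start  "$(vpn_unit_name /etc/openvpn/client.conf)"   # start right now
```

The part after the @ is simply the file name without the .conf extension, so one template unit can serve several VPN configurations side by side.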

Setting up NAT on Amazon AWS

When it comes to Amazon AWS, there are a few options for configuring Network Address Translation (NAT).  Here is a brief overview.

NAT Gateway

NAT Gateway is a configuration very similar to Internet Gateway.  My understanding is that the only major difference between the NAT Gateway and the Internet Gateway is that you have control over the external public IP address of the NAT Gateway.  That’ll be one of your allocated Elastic IPs (EIPs).  This option is the simplest out of the three that I considered.  If you need plain and simple NAT – then that’s a good one to go for.

NAT Instance

NAT Instance is a special purpose EC2 instance, which is configured to do NAT out of the box.  If you need anything on top of plain NAT (like load balancing, or detailed traffic monitoring, or firewalls), but don’t have enough confidence in your network and system administration skills, this is a good option to choose.

Custom Setup

If you are the Do It Yourself guy, this option is for you.   But it can get tricky.  Here are a few things that I went through, learnt and suffered through, so that you don’t have to (or future me, for that matter).

Let’s start from the beginning.  You’ve created your own Virtual Private Cloud (VPC).  In that cloud, you’ve created two subnets – Public and Private (I’ll use this for example, and will come back to what happens with more).  Both of these subnets use the same routing table with the Internet Gateway.  Now you’ve launched an EC2 instance into your Public subnet and assigned it a private IP address.  This will be your NAT instance.  You’ve also launched another instance into the Private subnet, which will be your test client.  So far so good.

This instance will be used for translating internal IP addresses from the Private subnet to the external public IP address.  So, we, obviously, need an external IP address.  Let’s allocate an Elastic IP and associate it with the EC2 instance.  Easy peasy.

Now, we’ll need to create another routing table, using our NAT instance as the default gateway.  Once created, this routing table should be associated with our Private subnet.  This will cause all the machines on that network to use the NAT instance for any external communications.

Let’s do a quick side track here – security.  There are three levels that you should keep in mind here:

  • Network ACLs.  These are Amazon AWS access control lists, which control the traffic allowed in and out of the networks (such as our Public and Private subnets).  If the Network ACL prevents certain traffic, you won’t be able to reach the host, regardless of the host security configuration.  So, for the sake of the example, let’s allow all traffic in and out of both the Public and Private networks.  You can adjust it once your NAT is working.
  • Security Groups.  These are Amazon AWS permissions which control what type of traffic is allowed in or out of the network interface.  This is slightly confusing for hosts with the single interface, but super useful for machines with multiple network interfaces, especially if those interfaces are transferred between instances.  Create a single Security Group (for now, you can adjust this later), which will allow any traffic in from your VPC range of IPs, and any outgoing traffic.  Assign this Security Group to both EC2 instances.
  • Host firewall.  Chances are, you are using a modern Linux distribution for your NAT host.  This means that there is probably an iptables service running with some default configuration, which might prevent certain access.  I’m not going to suggest disabling it, especially on the machine facing the public Internet.  But just keep it in mind, and at the very least allow the ICMP protocol, if not from everywhere, then at least from your VPC IP range.

Now, on to the actual NAT.  It is technically possible to set up and use NAT on a machine with a single network interface, but you’d probably be frowned upon by other system and network administrators.  Furthermore, it doesn’t seem to be possible on the Amazon AWS infrastructure.  I’m not 100% sure about that, but I’ve spent more time than I had to figure this out and I failed miserably.

The rest of the steps would greatly benefit from a bunch of screenshots and step-by-step click through guides, which I am too lazy to do.  You can use this manual, as a base, even though it covers a slightly different, more advanced setup.  Also, you might want to have a look at CentOS 7 instructions for NAT configuration, and the discussion on the differences between SNAT and MASQUERADE.

We’ll need a second network interface.  You can create a new Network Interface with the IP in your Private subnet and attach it to the NAT instance.  Here comes a word of caution:  there is a limit on how many network interfaces can be attached to an EC2 instance.  This limit is based on the type of the instance.   So, if you want to use a t2.nano or t2.micro instance, for example, you’d be limited to only two interfaces.  That’s why I’ve used the example with two networks – to have a third interface added, you’d need a much bigger instance, like t2.medium. (Which is a total overkill for my purposes.)

Now that you’ve attached the second interface to your EC2 instance, we have a few things to do.  First, you need to disable “Source/Destination Check” on the second network interface.  You can do it in your AWS Console, or maybe even through the API (I haven’t gone that deep yet).

It is time to adjust the configuration of our EC2 instance.  I’ll assume CentOS 7 Linux distribution, but it’d be very easy to adjust to whatever other Linux you are running.

Firstly, we need to configure the second network interface.  The easiest way to do this is to copy the /etc/sysconfig/network-scripts/ifcfg-eth0 file into /etc/sysconfig/network-scripts/ifcfg-eth1, and then edit the eth1 file, changing the DEVICE variable to “eth1”.  Before you restart your network service, also edit the /etc/sysconfig/network file and add the following: GATEWAYDEV=eth0 .  This will tell the operating system to use the first network interface (eth0) as the gateway device.  Otherwise, it’ll be sending things into the Private network and things won’t work as you expect them to.  Now, restart the network service and make sure that both network interfaces are there, with correct IPs and that your routes are fine.
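
The copy-and-edit step can be sketched like this.  The helper clone_ifcfg is hypothetical, and it only rewrites the DEVICE line; if your ifcfg-eth0 also pins things like HWADDR or UUID, those would need changing by hand as well.

```shell
# Copy an ifcfg file to a new one, replacing only the DEVICE line.
clone_ifcfg() {
    local src="$1" dst="$2" device="$3"
    sed "s/^DEVICE=.*/DEVICE=\"${device}\"/" "$src" > "$dst"
}

# Example (as root):
#   cd /etc/sysconfig/network-scripts
#   clone_ifcfg ifcfg-eth0 ifcfg-eth1 eth1
#   echo 'GATEWAYDEV=eth0' >> /etc/sysconfig/network
```

Every other line of the source file is carried over untouched, so it is worth reading the result once before restarting the network service.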

Secondly, we need to tweak the kernel for the NAT job (sounds funny, doesn’t it?).  Edit your /etc/sysctl.conf file and make sure it has the following lines in it:

# Enable IP forwarding
# Disable ICMP redirects

Apply the changes with sysctl -p.
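
The two comments above have lost the settings that went with them.  As a minimal sketch, assuming the usual kernel parameters for this job (net.ipv4.ip_forward for forwarding and net.ipv4.conf.all.send_redirects for ICMP redirects; the exact keys from the original listing are not known), a hypothetical helper can add a setting only when the key is not already present:

```shell
# Append "key = value" to a sysctl-style file unless the key is already set there.
ensure_sysctl() {
    local file="$1" key="$2" value="$3"
    if ! awk -F= -v k="$key" '
            { name = $1; gsub(/[ \t]/, "", name) }
            name == k { found = 1 }
            END { exit !found }' "$file" 2>/dev/null
    then
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root); both keys are assumptions, check them against your needs:
#   ensure_sysctl /etc/sysctl.conf net.ipv4.ip_forward 1
#   ensure_sysctl /etc/sysctl.conf net.ipv4.conf.all.send_redirects 0
#   sysctl -p
```

The helper never changes an existing value, so running it twice is harmless, but a key that is already set to something else still has to be fixed by hand.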

Thirdly, and lastly, configure iptables to perform the network address translation.  Edit /etc/sysconfig/iptables and make sure you have the following:

:PREROUTING ACCEPT [48509:2829006]
:INPUT ACCEPT [33058:1879130]
:OUTPUT ACCEPT [57243:3567265]
:POSTROUTING ACCEPT [55162:3389500]
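
The listing above shows only the chain policies of the nat table; the table header, the rule that does the translation, and the closing COMMIT did not survive.  Here is a minimal sketch of what a complete nat section could look like, assuming MASQUERADE (rather than SNAT) on the outgoing interface.  The helper name nat_rules, the 10.0.0.0/16 range and eth0 in the usage line are placeholders of mine, not the original values.

```shell
# Print an iptables-save style nat table that masquerades traffic
# from the given source range leaving through the given interface.
nat_rules() {
    local cidr="$1" out_if="$2"
    printf '%s\n' \
        '*nat' \
        ':PREROUTING ACCEPT [0:0]' \
        ':INPUT ACCEPT [0:0]' \
        ':OUTPUT ACCEPT [0:0]' \
        ':POSTROUTING ACCEPT [0:0]' \
        "-A POSTROUTING -s ${cidr} -o ${out_if} -j MASQUERADE" \
        'COMMIT'
}

# Example: print the section, then merge it into /etc/sysconfig/iptables by hand.
#   nat_rules 10.0.0.0/16 eth0
```

The numbers in square brackets are just packet and byte counters, so zeros are fine in a hand-written file.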

Adjust the IP range to your VPC range or the network that you want to NAT.  Restart the iptables service and check that everything is hunky-dory:

  1. The NAT instance can ping a host on the Internet.
  2. The NAT instance can ping a host on the Private network.
  3. The host on the Private network can ping the NAT instance.
  4. The host on the Private network can ping a host on the Internet.

If all that works fine, don’t forget to adjust your Network ACLs, Security Groups, and iptables to whatever level of paranoia appropriate for your environment.  If something is still not working, check all of the above again, especially for security layers, IP addresses (I spent a couple of hours trying to find the problem, when it was the IP address typo – 10.0.0/16 – not the most obvious of things), network masks, etc.
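
A typo like that one is cheap to catch before it reaches the firewall rules.  This is a minimal sketch of a sanity check, with is_ipv4_cidr being a hypothetical helper; it only checks the shape of the string (four octets up to 255 and a prefix up to 32), not whether the range makes sense for your VPC.

```shell
# Succeed only if the argument looks like a full IPv4 range in CIDR form.
is_ipv4_cidr() {
    printf '%s\n' "$1" | awk -F'[./]' '
        !/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ { exit 1 }
        {
            for (i = 1; i <= 4; i++) if ($i > 255) exit 1
            if ($5 > 32) exit 1
        }'
}

# Example:
#   is_ipv4_cidr 10.0.0.0/16 && echo "looks fine"
#   is_ipv4_cidr 10.0.0/16   || echo "missing an octet"
```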

Hope this helps.

Forcing Amazon Linux AMI compatibility with CentOS in Ansible

One of the things that makes Ansible so awesome is a huge collection of shared roles over at Ansible Galaxy.  These bring you best practices, flexible configurations and in general save hours and hours of hardcore swearing and hair pulling.

Each role usually supports multiple versions of multiple Linux distributions.  However, you’ll find that the majority of the supported distributions are Ubuntu, Debian, Red Hat Enterprise Linux, CentOS, and Fedora.  The rest aren’t as popular.

Which brings me to the point with Amazon Linux AMI.  Amazon Linux AMI is mostly compatible with CentOS, but it uses a different version approach, which means that most of those Ansible roles will ignore or complain about not supporting Amazon AMI.

Here is an example I came across yesterday from the dj-wasabi.zabbix-server role.  The template for the Yum repository uses the ansible_distribution_major_version variable, which is expected to be similar to Red Hat / CentOS version number – 5, 6, 7, etc.  Amazon Linux AMI’s major version is reported as “NA” – not available.   That’s probably because Amazon Linux AMI versions are date-based – with the latest one being 2016.03.

name=Zabbix Official Repository - $basearch
baseurl=http://repo.zabbix.com/zabbix/{{ zabbix_version }}/rhel/{{ ansible_distribution_major_version }}/$basearch/

Officially, Amazon Linux AMI is not CentOS or Red Hat Enterprise Linux.  But if you don’t care about such little nuances, and you are brave enough to experiment and assume things, then you can make that role work, by simply setting the appropriate variables to the values that you want.

First, here is a standalone test.yml playbook to try things out:

- name: Test
  hosts: localhost
  tasks:
  - set_fact: ansible_distribution_major_version=6
    when: ansible_distribution == "Amazon"
  - debug: msg={{ ansible_distribution_major_version }}

Let’s run it and look at the output:

$ ansible-playbook test.yml

PLAY [Test] *******************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [set_fact ansible_distribution_major_version=6] *************************
ok: [localhost]

TASK: [debug msg={{ ansible_distribution_major_version }}] ********************
ok: [localhost] => {
  "msg": "6"
}

PLAY RECAP ********************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0

So far so good.  Now we need to integrate this into our playbook in such a way that the variable is set before the third-party role is executed.  For that, we’ll use pre_tasks.  Here is an example:

- name: Zabbix Server
  hosts: zabbix.server
  sudo: yes
  pre_tasks:
    - set_fact: ansible_distribution_major_version=6
      when: ansible_distribution == "Amazon" and ansible_distribution_major_version == "NA"
  roles:
    - role: dj-wasabi.zabbix-server

A minor twist here is also checking if the major version is not set yet. You can skip that, or you can change it, for example, to examine the Amazon Linux AMI version and set corresponding CentOS version.

Let’s Encrypt on CentOS 7 and Amazon AMI

The last few weeks were super busy at work, so I accidentally let a few SSL certificates expire.  Renewing them is always annoying and time consuming, so I was pushing it until the last minute, and then some.

Instead of going the usual way for the renewal, I decided to try the Let’s Encrypt deal.  (I’ve covered Let’s Encrypt before here and here.)  Basically, Let’s Encrypt is a new Certification Authority, run by the Internet Security Research Group (ISRG), with the backing of the Electronic Frontier Foundation (EFF), Google, Cisco, the Mozilla Foundation, and the like.  This new CA is issuing well recognized SSL certificates, for free.  Which is good.  But the best part is that they’ve set up the process to be as automated as possible.  All you need is to run a shell command to get the certificate and then another shell command in the crontab to renew the certificate automatically.  Certificates are only issued for 3 months, so you’d really want to have them automatically updated.

It took me longer than I expected to figure out how this whole thing works, but that’s because I’m not well versed in SSL, and because they have so many different options, suited for different web servers, and different sysadmin experience levels.

Eventually I made it work, and here is the complete process, so that I don’t have to figure it out again later.

We are running a mix of CentOS 7 and Amazon AMI servers, using both Nginx and Apache.   Here’s what I had to do.

First things first.  Install the Let’s Encrypt client software.  Supposedly there are several options, but I went for the official one.  Manual way:

# Install requirements
yum install git bc
cd /opt
git clone https://github.com/certbot/certbot letsencrypt

Alternatively, you can use geerlingguy’s lets-encrypt-role for Ansible.

Secondly, we need to get a new certificate.  As I said before, there are multiple options here.  I decided to use the certonly way, so that I have better control over where things go, and so that I would minimize the web server downtime.

There are a few things that you need to specify for the new SSL certificate.  These are:

  • The list of domains, which the certificate should cover.  I’ll use example.com and www.example.com here.
  • The path to the web folder of the site.  I’ll use /var/www/vhosts/example.com/
  • The email address, which Let’s Encrypt will use to contact you in case there is something urgent.  I’ll use ssl@example.com here.

Now, the command to get the SSL certificate is:

/opt/letsencrypt/certbot-auto certonly --webroot --email ssl@example.com --agree-tos -w /var/www/vhosts/example.com/ -d example.com -d www.example.com

When you run this for the first time, you’ll see that a bunch of additional RPM packages will be installed, for the virtual environment to be created and used.  On CentOS 7 this is sufficient.  On Amazon AMI, the command will run, install things, and will fail with something like this:

WARNING: Amazon Linux support is very experimental at present...
if you would like to work on improving it, please ensure you have backups
and then run this script again with the --debug flag!

This is useful, but insufficient.  Before you can run successfully, you’ll also need to do the following:

yum install python26-virtualenv

Once that is done, run the certbot command with the --debug parameter, like so:

/opt/letsencrypt/certbot-auto certonly --webroot --email ssl@example.com --agree-tos -w /var/www/vhosts/example.com/ -d example.com -d www.example.com --debug

This should produce a success message, with “Congratulations!” and all that.  The path to your certificate (somewhere in /etc/letsencrypt/live/example.com/) and its expiration date will be mentioned too.

If you didn’t get the success message, make sure that:

  • the domain, for which you are requesting a certificate, resolves back to the server, where you are running the certbot command.  Let’s Encrypt will try to access the site for verification purposes.
  • public access is allowed to the /.well-known/ folder.  This is where Let’s Encrypt will store temporary verification files.  Note that the folder name starts with a dot, which in UNIX means a hidden folder, and many web server configurations deny access to those.

Just drop a simple hello.txt to the /.well-known/ folder and see if you can access it with the browser.  If you can, then Let’s Encrypt shouldn’t have any issues getting you a certificate.  If all else fails, RTFM.

Now that you have the certificate generated, you’ll need to add it to the web server’s virtual host configuration.  How exactly to do this varies from web server to web server, and even between the different versions of the same web server.

For Apache version >= 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem

For Apache version < 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/cert.pem
SSLCertificateChainFile /etc/letsencrypt/live/example.com/chain.pem

For Nginx >= 1.3.7 you’ll need to do the following:

ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

You’ll obviously need the additional SSL configuration options for protocols, ciphers and the like, which I won’t go into here.

Once your SSL certificate is issued and web server is configured to use it, all you need is to add an entry to the crontab to renew the certificates which are expiring in 30 days or less.  You’ll only need a single entry for all your certificates on this machine.  Edit your /etc/crontab file and add the following (adjust for your web server software, obviously):

# Renew Let's Encrypt certificates at 6pm every Sunday
0 18 * * 0 root (/opt/letsencrypt/certbot-auto renew && service httpd restart)

That’s about it.  Once all is up and running, verify and adjust your SSL configuration, using Qualys SSL Labs excellent tool.

Absolute stupidity of include directive in /etc/sudoers, and Microsoft Azure

I’ve just spent three hours (!!!) trying to troubleshoot why sudo was misbehaving on a brand new CentOS 7 server.  I was doing the setup of two identical servers in parallel (for two different clients).   One server worked as expected, the other one didn’t.

The thing I was trying to do was trivial – allow users in the wheel group to execute sudo commands without a password. I’ve done it a gazillion times in the past, and probably at least a dozen times just this week alone.  Here’s what’s needed:

  1. Add user to the wheel group.
  2. Edit the /etc/sudoers file to uncomment the line (as in: remove the hash comment character from the beginning of the line): # %wheel ALL=(ALL) NOPASSWD: ALL
  3. Enjoy!

Imagine my surprise when it only worked on one server and not on the other.  I’ve dug deep and wide.  Took a break. And dug again.  Then, I’ve summoned the great troubleshooting powers of my brother.  But even those didn’t help.

Lots of logging, diff-ing, strace-ing, swearing and hair pulling later, the problem was found and fixed.  The issue was due to two separate reasons.

Reason 1: /etc/sudoers syntax uses the hash character (#) for two different purposes.

  1. For comments, which there are plenty of in the file.
  2. For the “#include” and “#includedir” directives, which include other files into the configuration.

The default /etc/sudoers file is full of lengthy comments.  Just to give you an idea:

(root@host ~)# wc -l /etc/sudoers
118 /etc/sudoers
(root@host ~)# grep -v '^#' /etc/sudoers | grep -v '^$' | wc -l
12

Yup.  118 lines in total vs. 12 lines of configuration (comments and empty lines removed). Like with banner blindness, this causes comment blindness.  Especially towards the end of the file.  Especially if you’ve seen this file a billion times before.

And that’s where the problem starts.  Right at the bottom of the file, there are these two lines:

## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d

Interesting, right? Usually there is nothing in the /etc/sudoers.d/ folder on a brand new CentOS box. But even if there was something, by now you’d assume that the include of the folder is commented out. Much like that wheel group configuration I mentioned earlier. I found it by accident, while reading the sudoers(5) manual page, trying to find out if there are any other locations or defaults for included configurations. About 600 lines into the manual, there is this:

To include /etc/sudoers.local from within /etc/sudoers we 
would use the following line in /etc/sudoers:

    #include /etc/sudoers.local

When sudo reaches this line it will suspend processing of 
the current file (/etc/sudoers) and switch to
/etc/sudoers.local.

So that comment is not a comment at all, but an include of the folder.  That’s the first part of the problem.
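
This also means that the usual grep -v '^#' trick hides lines that sudo actually acts on.  Here is a minimal sketch of a filter that drops comments and blank lines but keeps the include directives.  The helper name sudoers_active_lines is hypothetical, and the sketch ignores the other special case from the manual, where # followed by digits in a user position means a uid.

```shell
# Print the lines of a sudoers file that sudo acts on:
# configuration lines plus #include / #includedir directives.
sudoers_active_lines() {
    awk '
        /^[ \t]*$/             { next }         # blank line
        /^#include(dir)?[ \t]/ { print; next }  # a directive, despite the leading #
        /^[ \t]*#/             { next }         # ordinary comment
                               { print }        # everything else is configuration
    ' "$1"
}

# Example (as root):
#   sudoers_active_lines /etc/sudoers
```

Had I looked at the file through a filter like this, the #includedir line would have been sitting right there among the real configuration.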

Reason 2: Windows Azure Linux Agent

As I mentioned above, the servers aren’t part of my infrastructure – they were provided by the clients.  I was basically given an IP address, a username and a password for each server – which is usually all I need.  In most cases I don’t really care where the server is hosted and what’s the hosting company in use.  Turns out, I should.

The server with the problem was hosted on the Microsoft Azure cloud infrastructure.  I assumed I was working off a brand new vanilla CentOS 7 box, but in fact I wasn’t.  Microsoft adds packages to the default install.  One of the packages that it adds is the Windows Azure Linux Agent, which “rpm -qi WALinuxAgent” describes as follows:

The Windows Azure Linux Agent supports the provisioning and running of Linux VMs in the Microsoft Azure cloud. This package should be installed on Linux disk images that are built to run in the Microsoft Azure environment.

Harmless, right? Well, not so much.  What I found in the /etc/sudoers.d/ folder was a little file, called waagent, which contained a different sudo configuration for the user which I had a problem with.

During the troubleshooting process, I’ve created a new test user, added the account to the wheel group and found out that it was working fine.  From there, I needed to find the differences between the two users.

I guess, the user that I was using initially was created by the client’s system administrator using Microsoft Azure web interface.  A quick Google search brings this page from the Azure documentation:

By default, the root user is disabled on Linux virtual machines in Azure. Users can run commands with elevated privileges by using the sudo command. However, the experience may vary depending on how the system was provisioned.

  1. SSH key and password OR password only – the virtual machine was provisioned with either a certificate (.CER file) or SSH key as well as a password, or just a user name and password. In this case sudo will prompt for the user’s password before executing the command.
  2. SSH key only – the virtual machine was provisioned with a certificate (.cer, .pem, or .pubfile) or SSH key, but no password. In this case sudo will not prompt for the user’s password before executing the command.

I checked the user’s home folder and found no keys in there, so I think it was provisioned using the first option, with password only.

I think Microsoft should make it much more obvious that the system behavior might be different.  Amazon AWS provides a good example to follow.  When you log in to an Amazon AMI instance, you see a message of the day (motd) banner, which looks like this:

$ ssh server.example.com
Last login: Tue Apr  5 17:25:38 2016 from

       __|  __|_  )
       _|  (     /   Amazon Linux AMI



It’s dead obvious that you are now on the Amazon EC2 machine and you should adjust your expectations and assumptions accordingly.

Deleting the file immediately solved the problem.  To avoid similar issues in the future, #includedir directive can be moved further up in the file, and surrounded by more visible comments.  Like, maybe, an ASCII art skull, or something.

ASCII skull

With that, I am off to heavy drinking and recovery… Stay sane!


Do Not Use Amazon Linux

I came across “Do Not Use Amazon Linux” opinion on Ex Ratione.  I have to say that I mostly agree with it.  When I initially started using Amazon Web Services, I assumed (due to time constraints mostly) that Amazon Linux was a close derivative of CentOS and I opted for that.  For the majority of things that affect applications in my environment that holds true, however, it’s not all as simple as it sounds.

There are in fact differences that have to be taken into account.  Some of the configuration issues can be abstracted with tools like Puppet (which I do use).  But not all of it.   I’ve been bitten by package names and version differences (hello PHP 5.3, 5.4, and 5.5; and MySQL and MariaDB) between Amazon AMI and CentOS distribution.  It’s the absolute worst when trying to push an application from our testing and development environments into the client’s production environment.  Especially when tight deadlines are involved.

One of the best reasons for CentOS is that developers can easily have their local environments (Vagrant anyone?) set up in exactly the same way as test and production servers.

fpm – build packages for multiple platforms (deb, rpm, etc) with great ease and sanity

fpm – Effing package management! Build packages for multiple platforms (deb, rpm, etc) with great ease and sanity.