Why I left my new MacBook for a $250 Chromebook

“Why I left my new MacBook for a $250 Chromebook” is a nice write-up by a new Chromebook user.  Even though I don’t own a MacBook (or any Mac products, for that matter), I have been considering a Chromebook for a while now too.

My biggest concern is obviously programming and system administration tools – editors, terminals, remote access, etc.  But the platform is getting there.

Apart from the experiences and wishlists, I found these two links useful:

How to Recover an Unreachable EC2 Linux Instance


Here is a tutorial that will come in handy one day, in a moment of panic – How to Recover an Unreachable Linux Instance.  It has plenty of screenshots and shows each step in detail.  (A rough AWS CLI sketch of the same volume juggling follows the TL;DR list below.)

TL;DR version:

  1. Start a new instance (or pick one from the existing ones).
  2. Stop the broken instance.
  3. Detach the volume from the broken instance.
  4. Attach the volume to the new/existing instance as an additional disk.
  5. Troubleshoot and fix the problem.
  6. Detach the volume from the new/existing instance.
  7. Attach the volume to the broken instance.
  8. Start the broken instance.
  9. Get rid of the now useless new instance, if you created one instead of reusing an existing one for the troubleshooting and fixing.
  10. ???
  11. PROFIT!
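
The article drives everything through the AWS console, but the same dance can be scripted.  Here is a minimal sketch with the AWS CLI, assuming the CLI is already configured; the instance and volume IDs, and the device names, are placeholders:

# Stop the broken instance and move its volume to a rescue instance
aws ec2 stop-instances --instance-ids i-broken
aws ec2 detach-volume --volume-id vol-12345678
aws ec2 attach-volume --volume-id vol-12345678 --instance-id i-rescue --device /dev/sdf

# ... mount the disk on the rescue instance, fix the problem, unmount ...

# Move the volume back and start the broken instance again
aws ec2 detach-volume --volume-id vol-12345678
aws ec2 attach-volume --volume-id vol-12345678 --instance-id i-broken --device /dev/sda1
aws ec2 start-instances --instance-ids i-broken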

SSH multiplexing and Ansible via bastion host

It never ceases to amaze me how, even after years and years of working with some technologies, I keep finding out about super useful features that could have saved me lots of time had I known about them earlier.  Today was one of those days.

I was working on the Ansible setup for a new hosting environment.  One particular thing I wanted to utilize more was a bastion host – a single Linux machine with an exposed secure shell (SSH) port, used for managing the configurations of all the servers within the environment.  I had sort of done that before, but the solution wasn’t as elegant as I wanted it to be.

So, I came across this article – Running Ansible Through an SSH Bastion Host – which, among other things, taught me about a feature that I knew nothing about.  Literally.  I hadn’t even heard of it.  Multiplexing in OpenSSH:

Multiplexing is the ability to send more than one signal over a single line or connection. With multiplexing, OpenSSH can re-use an existing TCP connection for multiple concurrent SSH sessions rather than creating a new one each time.

This doesn’t sound too useful when you are working in the command line, one server at a time.  Who cares how many TCP connections you need?  It’ll be one, or two, or five.  Ten, if you are really involved.  But by that time you’ll probably be running background processes, and screen or tmux (which are apparently called “terminal multiplexers“).

It’s when you go deeper into automation, as in my case with Ansible, that you’ll need OpenSSH multiplexing.  Ansible, being a configuration manager, runs a whole lot of commands one after another.  It can run them on multiple servers in parallel as well.  That’s where reusing connections makes quite a bit of a difference.  If every task has to open a new connection to the remote server, execute, and then disconnect, you pay the connection setup cost tens or hundreds of times per playbook run; with multiplexing, all those tasks share a single TCP connection.  Reusing connections for parallel jobs is even better – and that’s exactly the case with a bastion host.
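
You can see the effect from the command line even before touching Ansible.  With ControlMaster, ControlPath, and ControlPersist configured for a host (as in the configuration shown further below), a quick sanity check looks something like this:

# First connection opens the master and creates the control socket
ssh bastion.example.com 'uptime'

# The control socket should now exist (the exact path depends on your ControlPath)
ls ~/.ssh/ansible-*

# Ask the master connection whether it is still alive
ssh -O check bastion.example.com

# Subsequent connections reuse the master and skip the TCP and SSH handshakes
time ssh bastion.example.com 'true'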

Here are a few useful links from that article, just in case the ether eats it one day:

Armed with those, I had my setup running in no time.  The only minor correction I had to do for my case was the SSH configuration for the bastion host.  The example in the article is NOT wrong:

Host 10.10.10.*
  ProxyCommand ssh -W %h:%p bastion.example.com
  IdentityFile ~/.ssh/private_key.pem

Host bastion.example.com
  Hostname bastion.example.com
  User ubuntu
  IdentityFile ~/.ssh/private_key.pem
  ForwardAgent yes
  ControlMaster auto
  ControlPath ~/.ssh/ansible-%r@%h:%p
  ControlPersist 5m

It’s just that in my case I use hostnames both for the bastion host and for the hosts which are managed through it.  So I had to adjust it like so:

Host *.example.com !bastion.example.com
  ProxyCommand ssh -W %h:%p bastion.example.com
  IdentityFile ~/.ssh/private_key.pem

Host bastion.example.com
  Hostname bastion.example.com
  User ubuntu
  IdentityFile ~/.ssh/private_key.pem
  ForwardAgent yes
  ControlMaster auto
  ControlPath ~/.ssh/ansible-%r@%h:%p
  ControlPersist 5m

Notice the two changes:

  1. The first block is switched from IP addresses to hostnames, with a wildcard.
  2. The bastion host is excluded (negated) from that wildcard block.

The reason for the second change is that if multiple Host blocks in the configuration file match, OpenSSH combines the options from all of them, with the first obtained value winning for each parameter (something I had somehow missed in the ssh_config manual).  Try this example ssh.conf with some real hosts of yours:

Host bastion.example.com
	User someuser

Host *.example.com
	Port 2222

You’ll see output similar to this:

$ ssh -F ssh.conf bastion.example.com -v
OpenSSH_7.2p2, OpenSSL 1.0.2h-fips  3 May 2016
debug1: Reading configuration data ssh.conf
debug1: ssh.conf line 1: Applying options for bastion.example.com
debug1: ssh.conf line 4: Applying options for *.example.com
debug1: Connecting to bastion.example.com [1.2.3.4] port 2222.
^C

Once you negate the bastion host from the wildcard configuration, everything works as expected.

You might also want to keep “%r@%h:%p” in the ControlPath, so that the socket is different for each remote username that you concurrently connect with, but that’s just nit-picking.

Packer – a tool for creating VM and container images

With the recent explosion in virtualization and container technologies, one is often left disoriented.  Questions like “should I use virtual machines or containers?”, “which technology should I use?”, and “can I migrate from one to another later?” are just some of those that need answering.

Here is an open source tool that helps to avoid a few of those questions – Packer (by HashiCorp):

Packer is a tool for creating machine and container images for multiple platforms from a single source configuration.

Have a look at the supported platforms:

  • Amazon EC2 (AMI). Both EBS-backed and instance-store AMIs within EC2, optionally distributed to multiple regions.
  • DigitalOcean. Snapshots for DigitalOcean that can be used to start a pre-configured DigitalOcean instance of any size.
  • Docker. Snapshots for Docker that can be used to start a pre-configured Docker instance.
  • Google Compute Engine. Snapshots for Google Compute Engine that can be used to start a pre-configured Google Compute Engine instance.
  • OpenStack. Images for OpenStack that can be used to start pre-configured OpenStack servers.
  • Parallels (PVM). Exported virtual machines for Parallels, including virtual machine metadata such as RAM, CPUs, etc. These virtual machines are portable and can be started on any platform Parallels runs on.
  • QEMU. Images for KVM or Xen that can be used to start pre-configured KVM or Xen instances.
  • VirtualBox (OVF). Exported virtual machines for VirtualBox, including virtual machine metadata such as RAM, CPUs, etc. These virtual machines are portable and can be started on any platform VirtualBox runs on.
  • VMware (VMX). Exported virtual machines for VMware that can be run within any desktop products such as Fusion, Player, or Workstation, as well as server products such as vSphere.

The only question remaining now, it seems, is “why wouldn’t you use it?”. :)

10 Favorite Job Interview Questions for Linux System Administrators

As someone who interviews a lot of people (mostly for web development positions though, not system administration), I’m always looking for more ideas on what to ask the candidates.  Today I came across “10 Favorite Job Interview Questions for Linux System Administrators“, which has a few bits that I liked.

First of all, this GitHub repository is pure awesomeness.  It also links to a few other resources with more questions and ideas.  Not only for sysadmin interviews.

Then, this one is funny, yet somewhat challenging:

2. Name and describe a different Linux/Unix command for each letter of the alphabet. But also, describe how a common flush toilet works.

It also checks that you know the alphabet.

9. Print the content of a file backwards.

“I like broad questions where each person could give a different answer depending on their depth of knowledge. My personal answer is 8 characters not including the filename.” – Marc Merlin, Google.

This one caught me by surprise.  My immediate thought was “tac some_file“, but that’s obviously not enough.  tac only prints the lines in reverse order, which is not the same as reversing the file.  Perl to the rescue, but I wonder what the most elegant way to do it is without a scripting language.
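
For what it’s worth, the usual non-scripting answer combines rev (reverse each line) with tac (reverse the order of the lines); the Perl one-liner slurps the whole file and reverses it in one go.  A quick sketch – the two differ slightly in how they treat newlines:

# Reverse each line, then reverse the order of the lines
rev some_file | tac

# Or slurp the whole file and reverse it character by character
perl -0777 -pe '$_ = reverse $_' some_file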

As always, interview questions are not only useful for the interviews.  They are a good measure of your own knowledge gaps and habit pitfalls.  This time was no exception.

Test your backups!

You can read all the books in the world and know all there is to know, but if you don’t follow the wisdom and practice the knowledge, then it’s all useless.  That’s my lesson from yesterday.

The Tao of Backup, which I linked to before, has a whole section dedicated to testing your backups.

So, what happened?  Well, as I was preparing for the Fedora 24 installation, I wanted to back up some of my files, as the partition would be formatted.  I connected an external USB drive with plenty of space and ZIP-archived a few of the vital directories onto it.

That was a very simple backup procedure and I saw the resulting files on the volume.  What else should I do, right?  Wrong!  I should have tested the restore.  I didn’t.

Most of the directories that I backed up were small – /etc, /opt, /root.  But my /home directory was about 20 GB.  The external USB disk used the FAT-32 file system, which has a 4 GB file size limit.  So only the first 4 GB of my /home folder made it into the archive.  Funny enough, those files were mostly browser cache and image thumbnails – stuff that should be excluded from backups.  The two folders that I actually wanted – Desktop and .ssh – were not part of the backup.  And I only realized that after the partition had been formatted.

So, yeah, I should have tested the backup.
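
For the record, even a couple of quick commands against the archive would have caught the problem.  A minimal sketch – the archive name and paths are illustrative:

# Test the archive integrity - a ZIP truncated at the FAT-32 limit will fail here
unzip -t /mnt/usb/home-backup.zip

# Compare what ended up in the archive with what should have been backed up
unzip -l /mnt/usb/home-backup.zip | tail -n 3
du -sh /home

# And, ideally, restore a few critical files somewhere and actually look at them
unzip /mnt/usb/home-backup.zip 'home/*/.ssh/*' -d /tmp/restore-test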

P.S.: Thankfully, I do have backups elsewhere, and most of my work is committed to GitHub/BitBucket anyway.

Fedora 24: the day of 64-bit has come

I’ve been using 64-bit Linux distributions on servers for a while now, but was reluctant to put one on my laptop.  I tried a couple of times many years ago, and ran into all sorts of weird issues.

Yesterday, with a little push from my brother, Google, and Slashdot, I’ve decided to give it another go.  64-bit Fedora 24 is now on my laptop and I am carefully exploring it.  So far, so good.

I guess I won’t have to worry about the Year 2038 Problem after all.

Forcing Amazon Linux AMI compatibility with CentOS in Ansible

One of the things that makes Ansible so awesome is a huge collection of shared roles over at Ansible Galaxy.  These bring you best practices, flexible configurations and in general save hours and hours of hardcore swearing and hair pulling.

Each role usually supports multiple versions of multiple Linux distributions.  However, you’ll find that the majority of the supported distributions are Ubuntu, Debian, Red Hat Enterprise Linux, CentOS, and Fedora.  The rest aren’t as popular.

Which brings me to the point about the Amazon Linux AMI.  Amazon Linux AMI is mostly compatible with CentOS, but it uses a different versioning approach, which means that most of those Ansible roles will either ignore it or complain about not supporting the Amazon Linux AMI.

Here is an example I came across yesterday from the dj-wasabi.zabbix-server role.  The template for the Yum repository uses the ansible_distribution_major_version variable, which is expected to hold the Red Hat / CentOS major version number – 5, 6, 7, etc.  Amazon Linux AMI’s major version is reported as “NA” – not available.   That’s probably because Amazon Linux AMI versions are date-based – the latest one being 2016.03.

[zabbix]
name=Zabbix Official Repository - $basearch
baseurl=http://repo.zabbix.com/zabbix/{{ zabbix_version }}/rhel/{{ ansible_distribution_major_version }}/$basearch/
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX
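
You can quickly check what a host reports for these facts with Ansible’s setup module (shown against localhost here; point it at your own inventory and host pattern as needed):

# Show the distribution-related facts that roles typically branch on;
# on the Amazon Linux AMI, ansible_distribution is "Amazon" and
# ansible_distribution_major_version is "NA"
ansible localhost -m setup -a 'filter=ansible_distribution*'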

Officially, Amazon Linux AMI is not CentOS or Red Hat Enterprise Linux.  But if you don’t care about such little nuances, and you are brave enough to experiment and assume things, then you can make the role work by simply setting the appropriate variables to the values that you want.

First, here is a standalone test.yml playbook to try things out:

- name: Test
  hosts: localhost
  pre_tasks:
  - set_fact: ansible_distribution_major_version=6
    when: ansible_distribution == "Amazon"
  tasks:
  - debug: msg={{ ansible_distribution_major_version }}

Let’s run it and look at the output:

$ ansible-playbook test.yml

PLAY [Test] *******************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [set_fact ansible_distribution_major_version=6] *************************
ok: [localhost]

TASK: [debug msg={{ ansible_distribution_major_version }}] ********************
ok: [localhost] => {
  "msg": "6"
}

PLAY RECAP ********************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0

So far so good.  Now we need to integrate this into our playbook in such a way that the variable is set before the third-party role is executed.  For that, we’ll use pre_tasks.  Here is an example:

---
- name: Zabbix Server
  hosts: zabbix.server
  sudo: yes
  pre_tasks:
    - set_fact: ansible_distribution_major_version=6
      when: ansible_distribution == "Amazon" and ansible_distribution_major_version == "NA"
  roles:
    - role: dj-wasabi.zabbix-server

A minor twist here is also checking that the major version is not set yet.  You can skip that, or you can change it, for example, to examine the Amazon Linux AMI version and set the corresponding CentOS version.

Amazon Elastic File System

Here is some great news from the Amazon AWS blog – the announcement of the Elastic File System (EFS):

EFS lets you create POSIX-compliant file systems and attach them to one or more of your EC2 instances via NFS. The file system grows and shrinks as necessary (there’s no fixed upper limit and you can grow to petabyte scale) and you don’t pre-provision storage space or bandwidth. You pay only for the storage that you use.

EFS protects your data by storing copies of your files, directories, links, and metadata in multiple Availability Zones.

In order to provide the performance needed to support large file systems accessed by multiple clients simultaneously, Elastic File System performance scales with storage (I’ll say more about this later).

I think this might be the most requested feature/service from Amazon AWS since the launch of EC2.  Sure, one could have built an NFS file server before, but with the variety of storage options, availability zones, and the dynamic nature of the cloud setup itself, that was quite a challenge.  Now – all that and more in just a few clicks.
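
Once a file system and its mount targets are created, attaching it to an instance is a plain NFSv4 mount.  A rough sketch – the mount target DNS name below is a placeholder, so copy the real one from the EFS console:

sudo yum install -y nfs-utils
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
df -h /mnt/efs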

Thank you Amazon!

 

Let’s Encrypt on CentOS 7 and Amazon AMI

The last few weeks were super busy at work, so I accidentally let a few SSL certificates expire.  Renewing them is always annoying and time consuming, so I was pushing it until the last minute, and then some.

Instead of going the usual way for the renewal, I decided to try the Let’s Encrypt deal.  (I’ve covered Let’s Encrypt before here and here.)  Basically, Let’s Encrypt is a new Certificate Authority, run by the Internet Security Research Group (ISRG) with the backing of the Electronic Frontier Foundation (EFF), Mozilla, Cisco, Akamai, and the like.  This new CA issues well-recognized SSL certificates, for free.  Which is good.  But the best part is that they’ve set up the process to be as automated as possible.  All you need is to run a shell command to get the certificate, and then another shell command in the crontab to renew it automatically.  Certificates are only issued for 3 months, so you really want to have them renewed automatically.

It took me longer than I expected to figure out how this whole thing works, but that’s because I’m not well versed in SSL, and because they have so many different options, suited for different web servers, and different sysadmin experience levels.

Eventually I made it work, and here is the complete process, so that I don’t have to figure it out again later.

We are running a mix of CentOS 7 and Amazon AMI servers, using both Nginx and Apache.   Here’s what I had to do.

First things first.  Install the Let’s Encrypt client software.  Supposedly there are several options, but I went for the official one.  Manual way:

# Install requirements
yum install git bc
cd /opt
git clone https://github.com/certbot/certbot letsencrypt

Alternatively, you can use geerlingguy’s lets-encrypt-role for Ansible.

Secondly, we need to get a new certificate.  As I said before, there are multiple options here.  I decided to use the certonly way, so that I have better control over where things go, and so that I would minimize the web server downtime.

There are a few things that you need to specify for the new SSL certificate.  These are:

  • The list of domains, which the certificate should cover.  I’ll use example.com and www.example.com here.
  • The path to the web folder of the site.  I’ll use /var/www/vhosts/example.com/
  • The email address, which Let’s Encrypt will use to contact you in case there is something urgent.  I’ll use ssl@example.com here.

Now, the command to get the SSL certificate is:

/opt/letsencrypt/certbot-auto certonly --webroot --email ssl@example.com --agree-tos -w /var/www/vhosts/example.com/ -d example.com -d www.example.com

When you run this for the first time, you’ll see a bunch of additional RPM packages being installed, so that a Python virtual environment can be created and used.  On CentOS 7 this is sufficient.  On the Amazon AMI, the command will run, install things, and then fail with something like this:

WARNING: Amazon Linux support is very experimental at present...
if you would like to work on improving it, please ensure you have backups
and then run this script again with the --debug flag!

This is useful, but insufficient.  Before you can run it successfully, you’ll also need to do the following:

yum install python26-virtualenv

Once that is done, run the certbot command with the --debug parameter, like so:

/opt/letsencrypt/certbot-auto certonly --webroot --email ssl@example.com --agree-tos -w /var/www/vhosts/example.com/ -d example.com -d www.example.com --debug

This should produce a success message, with “Congratulations!” and all that.  The path to your certificate (somewhere in /etc/letsencrypt/live/example.com/) and its expiration date will be mentioned too.
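
If you want to double-check what you got, openssl can print the subject and the validity dates of the issued certificate (path as reported by certbot):

openssl x509 -in /etc/letsencrypt/live/example.com/fullchain.pem -noout -subject -dates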

If you didn’t get the success message, make sure that:

  • the domain for which you are requesting a certificate resolves to the server where you are running the certbot command.  Let’s Encrypt will try to access the site for verification purposes.
  • public access is allowed to the /.well-known/ folder.  This is where Let’s Encrypt stores temporary verification files.  Note that the folder name starts with a dot, which in UNIX means a hidden folder, and many web server configurations deny access to those.

Just drop a simple hello.txt into the /.well-known/ folder and see if you can access it with the browser.  If you can, then Let’s Encrypt shouldn’t have any issues getting you a certificate.  If all else fails, RTFM.
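
Something along these lines does the trick (paths and domain as in the examples above):

mkdir -p /var/www/vhosts/example.com/.well-known
echo hello > /var/www/vhosts/example.com/.well-known/hello.txt
curl -i http://example.com/.well-known/hello.txt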

Now that you have the certificate generated, you’ll need to add it to the web server’s virtual host configuration.  How exactly to do this varies from web server to web server, and even between the different versions of the same web server.

For Apache version >= 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem

For Apache version < 2.4.8 you’ll need to do the following:

SSLEngine on
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/example.com/cert.pem
SSLCertificateChainFile /etc/letsencrypt/live/example.com/chain.pem

For Nginx >= 1.3.7 you’ll need to do the following:

ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

You’ll obviously need the additional SSL configuration options for protocols, ciphers and the like, which I won’t go into here, but here are a few useful links:

Once your SSL certificate is issued and the web server is configured to use it, all you need is to add an entry to the crontab to renew the certificates which expire in 30 days or less.  You’ll only need a single entry for all the certificates on the machine.  Edit your /etc/crontab file and add the following (adjust for your web server software, obviously):

# Renew Let's Encrypt certificates at 6pm every Sunday
0 18 * * 0 root (/opt/letsencrypt/certbot-auto renew && service httpd restart)
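
It’s worth running the renewal once by hand before trusting the cron job with it.  The first command below is enough; the --dry-run variant, supported by newer versions of the client, simulates the renewal without touching the real certificates:

# Nothing will actually be renewed unless a certificate expires within 30 days
/opt/letsencrypt/certbot-auto renew

# Simulate renewal against the staging environment
/opt/letsencrypt/certbot-auto renew --dry-run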

That’s about it.  Once all is up and running, verify and adjust your SSL configuration using Qualys SSL Labs’ excellent tool.