Google and HTTPS

Here is some interesting news on the subject of Google and HTTPS:

In support of our work to implement HTTPS across all of our products (https://www.google.com/transparencyreport/https/) we have been operating our own subordinate Certificate Authority (GIAG2), issued by a third-party. This has been a key element enabling us to more rapidly handle the SSL/TLS certificate needs of Google products.

As we look forward to the evolution of both the web and our own products it is clear HTTPS will continue to be a foundational technology. This is why we have made the decision to expand our current Certificate Authority efforts to include the operation of our own Root Certificate Authority. To this end, we have established Google Trust Services (https://pki.goog/), the entity we will rely on to operate these Certificate Authorities on behalf of Google and Alphabet.

The process of embedding Root Certificates into products and waiting for the associated versions of those products to be broadly deployed can take time. For this reason we have also purchased two existing Root Certificate Authorities, GlobalSign R2 and R4. These Root Certificates will enable us to begin independent certificate issuance sooner rather than later.

We intend to continue the operation of our existing GIAG2 subordinate Certificate Authority.

If you need a bit of help putting this into perspective, this Hacker News thread has your back:

You can now have a website secured by a certificate issued by a Google CA, hosted on Google web infrastructure, with a domain registered using Google Domains, resolved using Google Public DNS, going over Google Fiber, in Google Chrome on a Google Chromebook. Google has officially vertically integrated the Internet.

Immutable Infrastructure with AWS and Ansible

Immutable infrastructure is a very powerful concept that brings stability, efficiency, and fidelity to your applications through automation and the use of successful patterns from programming.  The general idea is that you never make changes to running infrastructure.  Instead, you ensure that all infrastructure is created through automation, and to make a change, you simply create a new version of the infrastructure, and destroy the old one.

“Immutable Infrastructure with AWS and Ansible” is a three-part (so far) article series (part 1, part 2, part 3) that shows how to use Ansible to achieve an immutable infrastructure on the Amazon Web Services cloud.

It covers everything starting from the basic setup of the workstation to execute Ansible playbooks and all the way to AWS security (users, roles, security groups), deployment of resources, and auto-scaling.

10,000 most common English words

This GitHub repository contains a list of the 10,000 most common English words, sorted by frequency, as seen by the Google Machine Translation Team.

Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google’s datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there’s no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more – resulting in a training corpus of one trillion words from public Web pages.

We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That’s why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.

There are a few variations of the list – with and without the swear words and such.  I took a quick look at it and was surprised to find that “cyprus” is at position 4,993 (pretty high), immediately after the word “emails“.  Weird!
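
Looking up a position like that is easy to script. Here is a minimal Python sketch, assuming the list is a plain text file with one word per line, sorted by frequency; the function name word_rank is made up for illustration.

```python
def word_rank(lines, word):
    """Return the 1-based position of word in a frequency-sorted list, or None."""
    wanted = word.strip().lower()
    for position, line in enumerate(lines, start=1):
        if line.strip().lower() == wanted:
            return position
    return None

# Hypothetical usage, with whatever file name the list was saved under:
# with open("words.txt") as handle:
#     print(word_rank(handle, "cyprus"))
```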

(found via the link from this article)

Google Infrastructure Security Design Overview

If you ever wanted to know what Google does to maintain its high level of security, here’s your chance. Google Infrastructure Security Design Overview provides quite a bit of information on the subject.

This document gives an overview of how security is designed into Google’s technical infrastructure. This global scale infrastructure is designed to provide security through the entire information processing lifecycle at Google. This infrastructure provides secure deployment of services, secure storage of data with end user privacy safeguards, secure communications between services, secure and private communication with customers over the internet, and safe operation by administrators.

Amazon Linux AMI : Let’s Encrypt : ImportError: No module named interface

Let’s Encrypt has only experimental support for the Amazon Linux AMI, so it’s kind of expected to have issues once in a while.   Here’s one I came across today:

# /opt/letsencrypt/certbot-auto renew
Creating virtual environment...
Installing Python packages...
Installation succeeded.
Traceback (most recent call last):
  File "/root/.local/share/letsencrypt/bin/letsencrypt", line 7, in <module>
    from certbot.main import main
  File "/root/.local/share/letsencrypt/local/lib/python2.7/dist-packages/certbot/main.py", line 12, in <module>
    import zope.component
  File "/root/.local/share/letsencrypt/local/lib/python2.7/dist-packages/zope/component/__init__.py", line 16, in <module>
    from zope.interface import Interface
ImportError: No module named interface

My first thought was to install the system updates. It looked like something was off in Python-land. But even after the “yum update” was done, the issue was still there. A quick Google search later, thanks to this GitHub issue and this comment, the solution is the following:

pip install pip --upgrade
pip install virtualenv --upgrade
virtualenv -p /usr/bin/python27 venv27

Running the renewal of the certificates works as expected after this.

P.S.: I wish we had fewer package and dependency managers in the world…

Amazon RDS and Amazon Virtual Private Cloud (VPC)

Yesterday I helped a friend to figure out why he couldn’t connect to his Amazon RDS database inside the Amazon VPC (Virtual Private Cloud).  It was the second time someone asked me to help with the Amazon Web Services (AWS), and it was the first time I was actually helpful.  Yey!

While I do use quite a few of the Amazon Web Services, I don’t have any experience with the Amazon RDS yet, as I’m managing my own MySQL instances.  It was interesting to get my toes wet in the troubleshooting.

Here are a few things I’ve learned in the process.

Lesson #1: Amazon supports two different ways of accessing the RDS service.  Make sure you know which one you are using and act accordingly.

If you run an Amazon RDS instance in the VPC, you’ll have to set up your networking and security access properly.  This page – Connecting to a DB Instance Running the MySQL Database Engine – will only be useful once everything else is taken care of.  It’s not your first and only manual to visit.

Lesson #2 (sort of obvious): Make sure that both your Network ACL and Security Groups allow all the necessary traffic in and out.  Double-check the IP addresses in the rules.  Make sure you are not using a proxy server, when looking up your external IP address on WhatIsMyIP.com or similar.

Lesson #3: Do not use ICMP traffic (ping and such) as a troubleshooting tool.  It looks like Amazon RDS won’t be ping-able even if you allow it in your firewalls.  Try with “telnet your-rds-end-point-server your-rds-end-point-port” (example: “telnet 1.2.3.4 3306”) or with a real database client, like the command-line MySQL one.
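
The same TCP check that telnet performs can be scripted. This is a minimal Python sketch using only the standard library; the function name can_connect and the 5-second default timeout are arbitrary choices for illustration, and the endpoint in the comment is a placeholder.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False

# Placeholder endpoint and the default MySQL port:
# print(can_connect("your-rds-end-point-server", 3306))
```

A False result only says that the TCP handshake did not complete. It does not tell you whether the security group, the network ACL, or the routing is to blame.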

Lesson #4: Make sure your routing is set up properly.  Check that the subnet in which your RDS instance resides has the correct routing table attached to it, and that the routing table has the default gateway (0.0.0.0/0) route configured to either the Internet Gateway or to some sort of NAT.  Chances are your subnet is only dealing with a private IP range and has no way of sending traffic outside.
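
The routing check from Lesson #4 can be sketched in Python as well. The function below works on a plain dictionary and makes no AWS calls. The key names (Routes, DestinationCidrBlock, GatewayId, NatGatewayId, InstanceId, State) are an assumption about how the EC2 API describes a route table, so verify them against the real output before relying on this.

```python
def default_route_target(route_table):
    """Return the target of an active 0.0.0.0/0 route, or None if there is none.

    route_table is assumed to look like one route table entry as returned
    by the EC2 API (for example via the AWS CLI or an SDK).
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            continue
        # An Internet Gateway, a NAT gateway, or a NAT instance can be the target.
        target = (route.get("GatewayId")
                  or route.get("NatGatewayId")
                  or route.get("InstanceId"))
        if target:
            return target
    return None
```

A result of None for the RDS subnet's routing table means the subnet has no way of sending traffic outside, which is the situation the lesson warns about.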

Lesson #5: When confused, disoriented, and stuck, assume it’s not Amazon’s fault.  Keep calm and troubleshoot like any other remote connection issue.  Double-check your assumptions.

There’s probably lesson 6 somewhere there, about contacting support or something along those lines.  But in this particular case it didn’t get to that.  Amazon AWS support is excellent though.  I had to deal with those guys twice in the last two-something years, and they were awesome.

PHP : Microsoft Office 365 and Active Directory

Disclaimer: I am not the biggest fan of Microsoft.  On the contrary.  I keep running into situations, where Microsoft technologies are a constant source of pain.  If that annoys you, please stop reading this post now and go away.  I don’t care.  You’ve been warned.

A few recent projects that I’ve been working on in the office required integration with Microsoft Office 365.  Office 365 is a new kid on the block as far as I am concerned, so I had no experience of integrating with these services.

The first look at what needs to be done resulted in a heavy drinking session and a mild depression.  Here are a few links to get you started on that path, if you are interested:

We’ve discussed the options with the client and decided to go a different route – limit the integration to the single sign-on (SSO) only, and use their Active Directory server (I’m not sure about the exact setup on the client side, but I think they use Active Directory Federation Services to have a local server in the office synchronized with the Office 365 directory).

Exposing the Active Directory server to the entire Internet is not the smartest idea, so we had to wrap this all into a virtual private network (VPN).  You can read my blog post on how to set up the CentOS 7 server as an automated VPN client.

Once the connection to the Active Directory server was established, the PHP LDAP module was very useful for avoiding any low-level programming (sockets and such).  With a bit of Google searching and StackOverflow reading, we managed to figure out the magic combination of parameters for ldap_connect(), ldap_set_option(), and ldap_search().

It took longer than expected, but some of it was due to the non-standard configuration and permissions on the client side.  Anyways, it worked, which was the good news.

The client accepted the implementation and we could just close the chapter, have another drink, and forget about this nightmare.  But something was bothering me about it, so I was thinking the heavy thoughts at the back of my mind.

The things that bother me about this implementation are the following:

  • Although it works, it’s a rather raw implementation, with very limited flexibility (filters, multiple servers, etc).
  • The code is difficult to test, due to the specifics of the AD setup and the network access limitations.
  • There is a lack of elegance to the solution.  Working code is good, but I like things to be beautiful too.  As much as possible at least.

So, I was keeping an eye open and I think today I came across a couple of links that can help make things better:

  1. adLDAP PHP library, which provides LDAP authentication and integration with Active Directory.  I don’t know how I missed it so far, but I think now things will be much easier and cleaner.
  2. Search Filter Syntax documentation on MSDN.
  3. This Reddit thread.  Yes, a lot of the things I’ve learned today are linked from it.  But it’ll be much easier for me to find all this information in my own blog, next time I have to deal with Microsoft again.
  4. Public-facing LDAP server thanks to Georgia Institute of Technology, for testing connection and simple queries.

Armed with this new knowledge, I’m sure the current working solution can be improved a lot – simplified with fewer lines of code, based on the much more robust and tested code base, and given a basic test script to make sure the code works somewhere else, outside of a particular client’s setup.
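
One small piece that is easy to test outside of the client's setup is the search filter itself. The sketch below is in Python rather than PHP and does not talk to any server. It escapes the characters that have a special meaning in LDAP search filters (backslash, asterisk, parentheses, and NUL) and builds a simple filter string. The function names are made up, and the objectClass and sAMAccountName attributes are an assumption about a typical Active Directory schema, not something taken from the client's configuration.

```python
# Characters that must be escaped inside an LDAP filter value,
# mapped to a backslash followed by their two-digit hex code.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}

def escape_filter_value(value):
    """Escape a user-supplied value so it is safe inside a search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)

def user_filter(login):
    """Build a filter that matches one user account by login name."""
    return "(&(objectClass=user)(sAMAccountName={}))".format(
        escape_filter_value(login))
```

The resulting string is the kind of value that goes into the filter argument of ldap_search(). Escaping matters because a login containing an asterisk or a parenthesis would otherwise change the meaning of the filter.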

I wish I had come across all of that much earlier.

 

Automate OpenVPN client on CentOS 7

I needed to set up an OpenVPN client to start automatically on a CentOS 7 server for one of our recent projects at work.  I’m not well versed in VPN technology, but the majority of the time was spent on something that I didn’t expect.

I got the VPN configuration and all the necessary certificates from the client, installed OpenVPN and tried it out.  It seemed to work just fine.  But setting it up to start automatically and without any human intervention took much longer than I thought it would.

The first issue that I came across was the necessary input of username and password for the VPN connection to be established.  The solution to that is simple (thanks to this comment):

  1. Create a new text file (for example, /etc/openvpn/auth) with the username being the first line of the file, and the password being the second.  Don’t forget to limit the permissions to read-only by root.
  2. Add the following line to the VPN configuration file (assuming /etc/openvpn/client.conf): “auth-user-pass auth“.  Here, the second “auth” is the name of the file, relative to the VPN configuration.

With that, the manual startup of the VPN (openvpn client.conf) was working.
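
Step 1 can be scripted so that the credentials never sit in a world-readable file, not even for a moment. This is a minimal Python sketch; the function name write_auth_file is made up, the path in the comment is the example path from above, and the credentials are placeholders.

```python
import os

def write_auth_file(path, username, password):
    """Write the username and password on two lines, readable only by the owner."""
    # Create the file with owner-only permissions from the very start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("{}\n{}\n".format(username, password))
    # Tighten to read-only once the content is in place.
    os.chmod(path, 0o400)

# Run as root, with placeholder credentials:
# write_auth_file("/etc/openvpn/auth", "vpn-user", "vpn-password")
```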

Now, how do we start the service automatically?  The old-school knowledge was suggesting “service openvpn start”.  But that fails due to openvpn being an unknown service.  Weird, right?

“rpm -ql openvpn” pointed in the direction of the systemd service (“systemctl start openvpn”).  But that failed too.  The name of the service looked strange too:

# rpm -ql openvpn | grep service
/usr/lib/systemd/system/openvpn@.service

A little (well, not that little after all) digging around revealed something that I didn’t know.  Systemd supports template units (that’s what the “@” in the file name means), so the same service can be started with different configuration files.  In this case, you can run “systemctl start openvpn@foobar” to start the OpenVPN service using the “foobar” configuration file, which should be in “/etc/openvpn/foobar.conf“.

What’s that config file and where do I get it from?  Well, the OpenVPN configuration sent from our client had an “account@host.ovpn” file, which is exactly what’s needed.  So, renaming “account@host.ovpn” to “client.conf” and moving it together with all the other certificate files into the “/etc/openvpn” folder allowed me to do “systemctl start openvpn@client“.  All you need now is to make the service start automatically at boot time (“systemctl enable openvpn@client”) and you are done.
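
The naming rule is simple enough to write down as code. The Python sketch below maps a configuration file path to the systemd unit name described above; the function name unit_for_config is made up for illustration, and the “.conf” check mirrors the rename step.

```python
from pathlib import PurePosixPath

def unit_for_config(config_path):
    """Map /etc/openvpn/NAME.conf to the systemd unit openvpn@NAME.service."""
    path = PurePosixPath(config_path)
    if path.suffix != ".conf":
        # An .ovpn file has to be renamed to NAME.conf first.
        raise ValueError("expected a .conf file, got: {}".format(path.name))
    return "openvpn@{}.service".format(path.stem)

# unit_for_config("/etc/openvpn/client.conf") gives "openvpn@client.service"
```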

base32 advantages over base64

Andrey shares some of the advantages of base32 over base64 encoding:

  1. The resulting character set is all one case, which can often be beneficial when using a case-insensitive filesystem, spoken language, or human memory.
  2. The result can be used as a file name because it can not possibly contain the ‘/’ symbol, which is the Unix path separator.
  3. The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the RFC 4648 symbol set omits the digits for one, eight and zero, since they could be confused with the letters ‘I’, ‘B’, and ‘O’.)
  4. A result excluding padding can be included in a URL without encoding any characters.

Personally, I don’t think I’d heard about base32 until today.
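
A quick way to see points 1, 2, and 4 in practice is to encode the same bytes both ways. The Python sketch below uses the standard base64 module; the helper names b32_token and b32_decode_token are made up, and lower-casing the result is just one possible choice of case.

```python
import base64

def b32_token(raw):
    """Base32-encode bytes as a lower-case string without '=' padding."""
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()

def b32_decode_token(token):
    """Reverse b32_token: restore the case and the padding, then decode."""
    padded = token.upper() + "=" * (-len(token) % 8)
    return base64.b32decode(padded)

raw = b"\xff\xff\xff"
print(base64.b64encode(raw).decode("ascii"))  # base64 here is all '/' characters
print(b32_token(raw))                         # base32 stays within a-z and 2-7
```

For these three bytes the base64 output consists entirely of slashes, which would not survive as a file name, while the base32 token contains only letters and digits.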

Yet another bit on security

Here are a couple of interesting articles from the last few days on Slashdot.

First comes a rather unsurprising survey saying that “40 percent of organizations store admin passwords in Word documents“.  Judging from my personal experiences in different companies, I’d say this number is much higher if you extend the Word documents to Excel spreadsheets and plain text files.  I think pretty much every single company I’ve worked at used such common files for admin password storage (at least at some point).

“Why, oh why?!!!”, the security-conscious among you might scream.  Well, I think there are two reasons for this.  The first one is that password management is complicated.  There are tools that help with this, but even those are rarely easy to use.  Storing the passwords in a secure, encrypted storage is one thing.  But, how do you share them with just the right people? How do you trust the tool? What happens if the file gets corrupted, the software updates, the license expires, or the master password is lost?  The risk of losing admin access to all your equipment and accounts is scary.  On top of that, there is the issue of changing passwords (especially when people leave the company) – not a simple job if you have a variety of accounts (hardware, software, services, etc) and a lot of people who have a varying degree of access.  Or automation scripts that need access to perform large scale operations.  Personally, I don’t think this problem has been solved yet.

The second reason is in this other Slashdot post – “Sad Reality: It’s Cheaper To Get Hacked Than Build Strong IT Defenses“.  This is very true as well.  A simple firewall and a strong password policy are often more than enough for many organizations.  The risks of compromise are low.  In those cases where it does happen, you’d often get some script kiddie consequence like a Bitcoin mining app or affiliate links spread across your website.  Both are quite easy to detect and fix.  Is it worth investing hundreds of thousands in equipment and personnel to prevent this? For many companies it is not.

The fact of the matter is that a lot of people don’t really care about security or privacy on the personal level, and that then translates into the organizational mentality as well.

Just think about people living in all those high crime areas.  Some of them think the risk is worth it – maybe they can make more money there or have a more exciting life.  Some of them simply can’t afford to move anywhere.  That’s very similar to digital security, I think.  Some don’t care and prefer to run the risk, saving the money on protection. Some simply can’t afford to have a decent level of security.