HAProxy SNI

“HAProxy SNI” is pure gold! If you want a load balancer for HTTPS traffic without managing SSL certificates on the load balancer itself, there is a way to do it.

The approach uses the Server Name Indication (SNI) extension to the TLS protocol. I knew about SNI and was already using it on the web server side, but it hadn’t occurred to me that it could be used on the load balancer too. Here’s the configuration bit:

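frontend https
  bind *:443
  description Incoming traffic to port 443
  mode tcp
  # Wait up to 5s for the TLS ClientHello before making a routing decision
  tcp-request inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }
  use_backend backend-ssl-foobar if { req_ssl_sni -i foobar.com }
  use_backend backend-ssl-example if { req_ssl_sni -i example.com }
  default_backend backend-ssl-default

# Backends must use TCP mode too (the server address is a placeholder);
# define backend-ssl-example and backend-ssl-default the same way
backend backend-ssl-foobar
  mode tcp
  server foobar1 192.0.2.10:443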

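The above makes HAProxy listen on port 443 and, based on the host name the client sends in the TLS handshake, route all traffic for foobar.com to one backend, all traffic for example.com to another, and everything else to a third, default backend. HAProxy only peeks at the unencrypted ClientHello, so TLS is still terminated on the web servers behind it.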
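To see what `req_ssl_sni` is actually looking at, here is a minimal Python sketch that pulls the host name out of a raw TLS ClientHello and picks a backend the same way the rules above do. The host name travels unencrypted in the ClientHello, which is why no certificates are needed for routing. This is illustrative only: it assumes the whole ClientHello is already in one buffer (collecting it is what `tcp-request inspect-delay` gives HAProxy time to do), it reads only the first server_name entry, and `build_client_hello` is a hypothetical test fixture, not a real TLS client.

```python
import struct
from typing import Optional

# Hypothetical routing table mirroring the use_backend rules in the config.
# Keys are lowercase because "req_ssl_sni -i" compares case-insensitively.
ROUTES = {
    "foobar.com": "backend-ssl-foobar",
    "example.com": "backend-ssl-example",
}
DEFAULT_BACKEND = "backend-ssl-default"


def parse_sni(data: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    try:
        if data[0] != 0x16:        # record type 0x16 = handshake
            return None
        if data[5] != 0x01:        # handshake type 1 = ClientHello
            return None
        pos = 9                    # skip record header (5) + handshake header (4)
        pos += 2 + 32              # client_version + random
        pos += 1 + data[pos]       # session_id (1-byte length prefix)
        (n,) = struct.unpack_from("!H", data, pos)
        pos += 2 + n               # cipher_suites (2-byte length prefix)
        pos += 1 + data[pos]       # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # list length (2), then first entry: name_type (1), name length (2)
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                if name_type != 0:  # 0 = host_name
                    return None
                return data[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                # truncated or malformed record
    return None


def pick_backend(data: bytes) -> str:
    """Choose a backend the way the frontend rules above do."""
    sni = parse_sni(data)
    if sni is None:
        return DEFAULT_BACKEND
    return ROUTES.get(sni.lower(), DEFAULT_BACKEND)


def build_client_hello(host: str) -> bytes:
    """Build a minimal synthetic ClientHello carrying only an SNI extension."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name   # host_name entry
    ext_data = struct.pack("!H", len(entry)) + entry         # server_name_list
    sni_ext = struct.pack("!HH", 0x0000, len(ext_data)) + ext_data
    body = (
        b"\x03\x03" + bytes(32)                  # client_version + random
        + b"\x00"                                # empty session_id
        + struct.pack("!H", 2) + b"\xc0\x2f"     # one cipher suite
        + b"\x01\x00"                            # one compression method (null)
        + struct.pack("!H", len(sni_ext)) + sni_ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(pick_backend(build_client_hello("FooBar.com")))   # backend-ssl-foobar
```

Note that `-i foobar.com` is an exact (case-insensitive) match, so www.foobar.com would fall through to the default backend unless it gets its own rule.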

Docker Image Vulnerability Research

Federacy has published some interesting research on Docker image vulnerabilities. The bottom line is:

24% of latest Docker images have significant vulnerabilities

This can and should be improved, especially given the hierarchical structure of Docker images: a fix in a popular base image reaches every image built on top of it once those images are rebuilt. It’s not like improving the security of all those random GitHub repositories.
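A toy Python sketch of why the hierarchy matters: one vulnerable base image is inherited by everything built FROM it, directly or indirectly. The image names and the parent map below are made up for illustration (real tooling would read Dockerfile FROM lines or image metadata), and the walk assumes the hierarchy has no cycles.

```python
from typing import Dict, Optional, Set

# Hypothetical image hierarchy: image -> the base image it is built FROM.
PARENTS: Dict[str, Optional[str]] = {
    "debian:jessie": None,
    "python:2.7": "debian:jessie",
    "myapp-api": "python:2.7",
    "myapp-worker": "python:2.7",
    "nginx-custom": "debian:jessie",
    "alpine:3.5": None,
    "tools": "alpine:3.5",
}


def affected_by(base: str, parents: Dict[str, Optional[str]]) -> Set[str]:
    """Return every image that inherits from `base`, directly or indirectly."""
    result: Set[str] = set()
    for image in parents:
        current = parents[image]
        while current is not None:          # walk up the FROM chain
            if current == base:
                result.add(image)
                break
            current = parents.get(current)  # unknown parent ends the walk
    return result


print(sorted(affected_by("debian:jessie", PARENTS)))
# ['myapp-api', 'myapp-worker', 'nginx-custom', 'python:2.7']
```

Patching debian:jessie and rebuilding covers four images in this toy example, compared with patching each one separately.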

World’s Biggest Data Breaches

Here’s an interactive collection of the world’s biggest data breaches. It goes back to 2004, when an AOL employee stole about 92,000,000 email addresses and screen names, and covers most of the major events up to and including 2016. There are a few ways to filter the data and change how it is presented.

Overall, it should give you a pretty good idea of how safe and secure your online data is. Oh, and how private it is, too.

AbuseIO – Open Source abuse management

AbuseIO is Open Source software for managing abuse reports. It’s like a specialized ticketing/support system that can automatically parse a variety of abuse notifications, file them, notify the team, and provide the tools to respond to and close the incident. In a nutshell:

  • 100% Free & Open Source
  • Works with IPv4 and IPv6 addresses
  • Automatically parse events into abuse tickets and add a classification
  • Integrate with existing IPAM systems
  • Set automatic (re)notifications per case or customer with configurable intervals
  • Allow abuse desks and end users to reply, close or add notes to cases
  • Link end users to a self help portal in case they need help to resolve the issue

If that sounds interesting, have a look at the Features page. You might also want to read the blog post covering last year’s release of AbuseIO version 4.0.

The system is written in PHP using the Laravel framework, so making changes and adding features should be fairly easy.
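The IPv4/IPv6 and IPAM features listed above boil down to one recurring step: given the address from an abuse report, find out who it belongs to. AbuseIO itself is PHP, so this is not its implementation; it is a small Python illustration that assumes a longest-prefix-match lookup over a hypothetical prefix-to-customer table (the prefixes are from the ranges reserved for documentation).

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical IPAM data: prefix -> customer. Real IPAM systems differ.
IPAM: Dict[str, str] = {
    "192.0.2.0/24": "customer-a",
    "192.0.2.128/25": "customer-b",
    "2001:db8::/32": "customer-c",
    "2001:db8:1::/48": "customer-d",
}


def owner_of(address: str, ipam: Dict[str, str] = IPAM) -> Optional[str]:
    """Return the customer owning the most specific prefix containing `address`."""
    ip = ipaddress.ip_address(address)           # accepts IPv4 and IPv6
    best: Optional[Tuple[int, str]] = None
    for prefix, customer in ipam.items():
        net = ipaddress.ip_network(prefix)
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, customer)  # more specific prefix wins
    return best[1] if best else None


print(owner_of("192.0.2.200"))     # customer-b
print(owner_of("2001:db8:1::5"))   # customer-d
```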

SELinux Concepts – but for humans

SELinux has been an annoyance for me since the early days, when Fedora and Red Hat brought it into the distribution and enabled it by default (see, for example, this blog post from 2004 about Fedora 3).

Over the years, I’ve tried to learn it, make it useful, and find benefits in using it, but somehow those were never enough, and I keep falling back to disabling it. On the other hand, my understanding of how SELinux works is slowly growing. The video in this blog post helped a lot.

And now I’m glad to add another useful resource to the “SELinux for mere mortals” collection. The blog post mostly focuses on SELinux terminology and what each term means. It’s so simple and straightforward that it even uses HTML and CSS examples, something I’ve never seen before. If you are making your way through the “how the heck do I make sense of SELinux” land, check it out. I’m sure it’ll help.
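Much of the terminology attaches to one thing: the security context, or label, which is four colon-separated fields (user, role, type, level). Here is a tiny Python sketch that splits one apart. The example label is the kind you would see on web content with `ls -Z`; the parser assumes MLS/MCS is enabled (as with the targeted policy on Fedora and RHEL), so the level field is present.

```python
from typing import NamedTuple


class SecurityContext(NamedTuple):
    user: str    # SELinux user, e.g. system_u
    role: str    # role, e.g. object_r (used for files)
    type: str    # type, the field most policy rules are written against
    level: str   # MLS/MCS level or range, e.g. s0 or s0-s0:c0.c1023


def parse_context(label: str) -> SecurityContext:
    """Split a label like user:role:type:level into its four fields."""
    parts = label.split(":", 3)   # maxsplit=3: the level itself may contain ':'
    if len(parts) != 4:
        raise ValueError(f"not a user:role:type:level context: {label!r}")
    return SecurityContext(*parts)


ctx = parse_context("system_u:object_r:httpd_sys_content_t:s0")
print(ctx.type)   # httpd_sys_content_t
```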

Fixing outdated Let’s Encrypt (zope.interface error)

I started using Let’s Encrypt for SSL certificates a while back. I installed it on all the web servers, regardless of whether they needed SSL, just to have it there when I need it (thanks to this Ansible role). One of those old web servers needed an SSL certificate recently, so I thought it’d be no problem to generate one.

But I was wrong. The letsencrypt-auto tool had become outdated and failed to run, throwing a Python exception about a missing zope.interface module. A quick Google search turned up this Stack Overflow discussion, describing the exact issue I was having:

Traceback (most recent call last):
  File "/root/.local/share/letsencrypt/bin/letsencrypt", line 7, in <module>
    from certbot.main import main
  File "/root/.local/share/letsencrypt/local/lib/python2.7/dist-packages/certbot/main.py", line 12, in <module>
    import zope.component
  File "/root/.local/share/letsencrypt/local/lib/python2.7/dist-packages/zope/component/__init__.py", line 16, in <module>
    from zope.interface import Interface
ImportError: No module named interface

However, the solution suggested there didn’t fix the problem for me:

unset PYTHON_INSTALL_LAYOUT
/opt/letsencrypt/letsencrypt-auto -v

Even pulling the updated version from the GitHub repository didn’t solve it.

After poking around for a while longer, I found this bug report from last year, which solved my problem.

I recommend:

  1. Running rm -rf /root/.local/share/letsencrypt. This removes your installation of letsencrypt, but keeps all configuration files, certificates, logs, etc.
  2. Make sure you have an up to date copy of letsencrypt-auto. It can be found here.
  3. Run letsencrypt-auto again.

If you get the same behavior, you can try installing zope.interface manually by running:

/root/.local/share/letsencrypt/bin/pip install zope.interface
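Before wiping the virtualenv, it can help to confirm the diagnosis. This is a minimal Python 3 sketch that asks the letsencrypt virtualenv’s own interpreter to import `zope.interface`. The `bin/python` path is an assumption based on the default location in the traceback above, so adjust it if your install lives elsewhere.

```python
import subprocess
import sys

# Assumed interpreter of the letsencrypt-auto virtualenv (see the traceback).
LE_PYTHON = "/root/.local/share/letsencrypt/bin/python"


def can_import(python_bin: str, module: str) -> bool:
    """Return True if `module` imports cleanly under the given interpreter."""
    try:
        result = subprocess.run(
            [python_bin, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:   # interpreter not found, e.g. virtualenv already removed
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if can_import(LE_PYTHON, "zope.interface"):
        print("zope.interface imports fine; the problem is elsewhere")
    else:
        print("zope.interface is missing; reinstall the letsencrypt virtualenv",
              file=sys.stderr)
```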

Hopefully, next time I’ll remember to search my blog’s archives …

Dissecting an SSL certificate

Julia Evans does it again.  If you ever wanted to understand SSL certificates, her post “Dissecting an SSL certificate” is for you.   This part made me smile:

Picking the right settings for your SSL certificates and SSL configuration on your webserver is confusing. As far as I understand it there are about 3 billion settings. Here is an example of an SSL Labs result for mail.google.com. There is all this stuff like OLD_TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 on that page (for real, that is a real thing.). I’m happy there are tools like SSL Labs that help mortals make sense of all of it.
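Those names look scary but follow a fixed pattern, at least for TLS 1.2 and earlier: key exchange, authentication, then WITH, the bulk cipher, and a hash (the MAC hash, or the PRF hash for AEAD ciphers). This is a rough Python sketch that splits one apart. It handles the common naming pattern only, treats the `OLD_` prefix as a label to strip (it marks a pre-standard variant of the suite), and rejects TLS 1.3 names, which drop the key exchange and authentication parts.

```python
from typing import NamedTuple


class CipherSuite(NamedTuple):
    key_exchange: str
    authentication: str
    cipher: str
    hash_alg: str   # MAC hash, or PRF hash for AEAD suites


def decode_suite(name: str) -> CipherSuite:
    """Split a TLS 1.2-style suite name: TLS_<kx>_<auth>_WITH_<cipher>_<hash>."""
    if name.startswith("OLD_"):
        name = name[len("OLD_"):]
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError(f"unrecognized suite name: {name!r}")
    left, right = name[len("TLS_"):].split("_WITH_", 1)
    kx_auth = left.split("_")
    key_exchange = kx_auth[0]
    # With plain RSA key exchange, the same RSA key also authenticates the server.
    authentication = kx_auth[1] if len(kx_auth) > 1 else kx_auth[0]
    cipher, _, hash_alg = right.rpartition("_")
    return CipherSuite(key_exchange, authentication, cipher, hash_alg)


print(decode_suite("OLD_TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"))
# CipherSuite(key_exchange='ECDHE', authentication='ECDSA', cipher='CHACHA20_POLY1305', hash_alg='SHA256')
```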

Google and HTTPS

Here is some interesting news on the subject of Google and HTTPS:

In support of our work to implement HTTPS across all of our products (https://www.google.com/transparencyreport/https/) we have been operating our own subordinate Certificate Authority (GIAG2), issued by a third-party. This has been a key element enabling us to more rapidly handle the SSL/TLS certificate needs of Google products.

As we look forward to the evolution of both the web and our own products it is clear HTTPS will continue to be a foundational technology. This is why we have made the decision to expand our current Certificate Authority efforts to include the operation of our own Root Certificate Authority. To this end, we have established Google Trust Services (https://pki.goog/), the entity we will rely on to operate these Certificate Authorities on behalf of Google and Alphabet.

The process of embedding Root Certificates into products and waiting for the associated versions of those products to be broadly deployed can take time. For this reason we have also purchased two existing Root Certificate Authorities, GlobalSign R2 and R4. These Root Certificates will enable us to begin independent certificate issuance sooner rather than later.

We intend to continue the operation of our existing GIAG2 subordinate Certificate Authority.

If you need a bit of help putting this into perspective, this Hacker News thread has your back:

You can now have a website secured by a certificate issued by a Google CA, hosted on Google web infrastructure, with a domain registered using Google Domains, resolved using Google Public DNS, going over Google Fiber, in Google Chrome on a Google Chromebook. Google has officially vertically integrated the Internet.

Immutable Infrastructure with AWS and Ansible

Immutable infrastructure is a very powerful concept that brings stability, efficiency, and fidelity to your applications through automation and the use of successful patterns from programming.  The general idea is that you never make changes to running infrastructure.  Instead, you ensure that all infrastructure is created through automation, and to make a change, you simply create a new version of the infrastructure, and destroy the old one.

“Immutable Infrastructure with AWS and Ansible” is a three-part article series so far (part 1, part 2, part 3) that shows how to use Ansible to build immutable infrastructure on Amazon Web Services.

It covers everything from setting up a workstation to run Ansible playbooks, all the way to AWS security (users, roles, security groups), resource deployment, and auto-scaling.
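The replace-instead-of-modify idea fits in a few lines. This is a toy Python model, not Ansible or AWS code: servers are frozen records, a deploy launches a fresh set from the new version, switches over only if they pass a health check (hypothetical, and always passing here), and then destroys the old set. In a real setup, the launch and switch-over steps would be cloud resources managed by your automation, Ansible in the case of this series.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

_ids = count(1)   # simple source of unique server ids


@dataclass(frozen=True)
class Server:
    """A server is never modified after creation (frozen dataclass)."""
    server_id: int
    image_version: str


class Fleet:
    def __init__(self) -> None:
        self.live: List[Server] = []
        self.destroyed: List[Server] = []

    def launch(self, version: str, size: int) -> List[Server]:
        # Stand-in for provisioning new instances from the new version.
        return [Server(next(_ids), version) for _ in range(size)]

    def healthy(self, server: Server) -> bool:
        # Hypothetical health check; always passes in this toy model.
        return True

    def deploy(self, version: str, size: int = 2) -> None:
        """Roll out `version` by replacing servers, never by editing them."""
        new = self.launch(version, size)
        if not all(self.healthy(s) for s in new):
            self.destroyed.extend(new)    # bad build: discard it, keep old fleet
            raise RuntimeError(f"version {version} failed health checks")
        old, self.live = self.live, new   # switch traffic to the new servers
        self.destroyed.extend(old)        # then destroy the old ones


fleet = Fleet()
fleet.deploy("v1")
fleet.deploy("v2")
print([s.image_version for s in fleet.live])   # ['v2', 'v2']
```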

10,000 most common English words

This GitHub repository contains a list of the 10,000 most common English words, sorted by frequency, as seen by the Google Machine Translation Team.

Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google’s datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there’s no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more – resulting in a training corpus of one trillion words from public Web pages.

We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That’s why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.

There are a few variations of the list, with and without swear words and such. I took a quick look and was surprised to find that “cyprus” is at position 4,993 (pretty high), immediately after the word “emails”. Weird!
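If you want to check a word’s position yourself, a few lines of Python will do. This sketch assumes the list is a plain text file with one word per line, most frequent first. The file name in the comment is an assumption, so use whichever variant of the list you downloaded.

```python
import io
from typing import Dict, Iterable


def load_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Map each word to its 1-based position in a frequency-sorted list."""
    ranks: Dict[str, int] = {}
    for line in lines:
        word = line.strip().lower()
        if word and word not in ranks:   # skip blanks, keep first occurrence
            ranks[word] = len(ranks) + 1
    return ranks


# With the real list (file name assumed; adjust to your download):
#     with open("google-10000-english.txt") as f:
#         ranks = load_ranks(f)
#     print(ranks.get("cyprus"))
sample = io.StringIO("the\nof\nand\n")   # tiny stand-in for the real file
print(load_ranks(sample)["and"])          # 3
```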

(found via the link from this article)