httpoxy – a CGI application vulnerability for PHP, Go, Python and others

httpoxy

httpoxy is a set of vulnerabilities that affect application code running in CGI, or CGI-like environments.

It comes down to a simple namespace conflict:

  • RFC 3875 (CGI) puts the HTTP Proxy header from a request into the environment variables as HTTP_PROXY
  • HTTP_PROXY is a popular environment variable used to configure an outgoing proxy

This leads to a remotely exploitable vulnerability. If you’re running PHP or CGI, you should block the Proxy header now.

The RegEx that killed StackOverflow

Here’s an outage postmortem from the recent StackOverflow downtime.  It just shows you how easy it is to break things, even they were built by some of the smartest people around.  Programming is touch and there is no way around it.

Technical Details

The regular expression was: ^[\s\u200c]+|[\s\u200c]+$ Which is intended to trim unicode space from start and end of a line. A simplified version of the Regex that exposes the same issue would be \s+$ which to a human looks easy (“all the spaces at the end of the string”), but which means quite some work for a simple backtracking Regex engine. The malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line that started with — play happy sound for player to enjoy. For us, the sound was not happy.

If the string to be matched against contains 20,000 space characters in a row, but not at the end, then the Regex engine will start at the first space, check that it belongs to the \s character class, move to the second space, make the same check, etc. After the 20,000th space, there is a different character, but the Regex engine expected a space or the end of the string. Realizing it cannot match like this it backtracks, and tries matching \s+$ starting from the second space, checking 19,999 characters. The match fails again, and it backtracks to start at the third space, etc.

So the Regex engine has to perform a “character belongs to a certain character class” check (plus some additional things) 20,000+19,999+19,998+…+3+2+1 = 199,990,000 times, and that takes a while. This is not classic catastrophic backtracking (talk on backtracking) (performance is O(n²), not exponential, in length), but it was enough. This regular expression has been replaced with a substring function.

Composer magic

Now that everyone is super comfortable with composer, I thought I’d share these two gems which I didn’t know or think about.

composer info

This command lists all of your packages installed with composer.  This is super handy if you want to include a page in your project, listing all the libraries and versions which are currently installed.  It also gives you a description of each library as provided by the package.

composer outdated

This command lists packages which you are using, which have updates available.  With this you can have a better understanding of what will happen if you run composer update (depending on your composer.json of course).

Update (July 21, 2016): Guess what? There is even a way to combine the two with one command: composer info -l .  This will list all the packages, with their versions and descriptions, and with an additional column of the latest version for each package.

10 Favorite Job Interview Questions for Linux System Administrators

As someone who interviews a lot of people (mostly for the web development positions though, not system administration), I’m always looking for more ideas on what to ask the candidates.  Today I came across “10 Favorite Job Interview Questions for Linux System Administrators“, which has a few of bits that I liked.

First of all, this GitHub repository is super awesomeness.  It also links to a few other resources with more questions and ideas.  Not only for sysadmin interviews.

Then, this one is funny, yet somewhat challenging:

2. Name and describe a different Linux/Unix command for each letter of the alphabet. But also, describe how a common flush toilet works.

It also checks that you know the alphabet.

9. Print the content of a file backwards.

“I like broad questions where each person could give a different answer depending on their depth of knowledge. My personal answer is 8 characters not including the filename.” – Marc Merlin, Google.

This one caught me by surprise.  My immediate thought was “tac some_file“, but that’s obviously not enough.  tac only prints the lines in reverse order.  Which is not the same as reversing the file.  Perl to the rescue, but I wonder what’s the most elegant way to do it without the scripting language.

As always, interview questions are not only useful for the interviews.  They are a good measure of your own knowledge gaps and habit pitfalls.  This time was no exception.

The History of the URL

The History of the URL is a brilliant compilation of ideas and resources, explaining how we got to the URLs we use and love (or hate) today.  In fact, the article comes in two parts:

  1. Domain, protocol, and port
  2. Path, fragment, query, and auth

Read them in whatever order you prefer. But I guarantee that you’ll have a number of different responses through out, from “Wow! I never knew that” and “I would have never thought of that!” to “No way! I don’t believe it“.

And here is one of the bits that made me smile:

In 1996 Keith Shafer, and several others proposed a solution to the problem of broken URLs. The link to this solution is now broken. Roy Fielding posted an implementation suggestion in July of 1995. The link is now broken.