The RegEx that killed StackOverflow

Here’s an outage postmortem from the recent StackOverflow downtime.  It just shows you how easy it is to break things, even if they were built by some of the smartest people around.  Programming is tough and there is no way around it.

Technical Details

The regular expression was: ^[\s\u200c]+|[\s\u200c]+$ Which is intended to trim unicode space from start and end of a line. A simplified version of the Regex that exposes the same issue would be \s+$ which to a human looks easy (“all the spaces at the end of the string”), but which means quite some work for a simple backtracking Regex engine. The malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line that started with — play happy sound for player to enjoy. For us, the sound was not happy.

If the string to be matched against contains 20,000 space characters in a row, but not at the end, then the Regex engine will start at the first space, check that it belongs to the \s character class, move to the second space, make the same check, etc. After the 20,000th space, there is a different character, but the Regex engine expected a space or the end of the string. Realizing it cannot match like this it backtracks, and tries matching \s+$ starting from the second space, checking 19,999 characters. The match fails again, and it backtracks to start at the third space, etc.
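To get a feel for how quickly that adds up, here is a rough Python sketch of the scan described above. It is a simplified model, not how any real Regex engine is implemented: the function name is made up for illustration, it only counts the successful "is this whitespace?" checks, and it uses str.isspace() as a stand-in for the \s character class.

```python
def count_whitespace_checks(text: str) -> int:
    """Count successful whitespace checks in a naive scan for trailing whitespace."""
    checks = 0
    length = len(text)
    for start in range(length):
        pos = start
        # Consume as much whitespace as possible from this starting point.
        while pos < length and text[pos].isspace():
            checks += 1
            pos += 1
        # Whitespace that runs to the end of the string is a match: stop.
        if pos > start and pos == length:
            break
    return checks


def closed_form(n: int) -> int:
    """Sum of n + (n - 1) + ... + 2 + 1."""
    return n * (n + 1) // 2


# 200 spaces followed by a non-space character, so the match never succeeds.
small = " " * 200 + "x"
print(count_whitespace_checks(small))  # 20100
print(closed_form(200))                # 20100
print(closed_form(20_000))             # 200010000
```

The simulation is only run on a small input because the pure-Python loop itself is quadratic; the closed form gives the figure for 20,000 characters.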

So the Regex engine has to perform a “character belongs to a certain character class” check (plus some additional things) 20,000+19,999+19,998+…+3+2+1 = 200,010,000 times, and that takes a while. This is not classic catastrophic backtracking (talk on backtracking) (performance is O(n²), not exponential, in length), but it was enough. This regular expression has been replaced with a substring function.
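The postmortem does not show the substring function that replaced the Regex, so the following is only my guess at the general idea, sketched in Python. The names are hypothetical, and str.isspace() is an approximation of \s. The point is that each end of the line is scanned once, so the work is linear in the length of the line.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, the extra character in the original pattern


def is_trimmable(ch: str) -> bool:
    """Approximate the [\\s\\u200c] character class."""
    return ch.isspace() or ch == ZWNJ


def trim(line: str) -> str:
    """Strip whitespace and ZWNJ from both ends using index arithmetic only."""
    start = 0
    end = len(line)
    # Walk forward past leading trimmable characters.
    while start < end and is_trimmable(line[start]):
        start += 1
    # Walk backward past trailing trimmable characters.
    while end > start and is_trimmable(line[end - 1]):
        end -= 1
    return line[start:end]


print(repr(trim("  \u200c hi there \u200c ")))  # 'hi there'
```

A long run of spaces in the middle of the line is never looked at more than once, because neither loop restarts from an earlier position.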

Composer magic

Now that everyone is super comfortable with composer, I thought I’d share these two gems which I didn’t know or think about.

composer info

This command lists all of your packages installed with composer.  This is super handy if you want to include a page in your project, listing all the libraries and versions which are currently installed.  It also gives you a description of each library as provided by the package.

composer outdated

This command lists the packages you are using that have updates available.  With this you can have a better understanding of what will happen if you run composer update (depending on your composer.json of course).

Update (July 21, 2016): Guess what? There is even a way to combine the two with one command: composer info -l.  This will list all the packages, with their versions and descriptions, and with an additional column of the latest version for each package.
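For the "page listing all the libraries" idea, something along these lines might work. It is a sketch under assumptions: it relies on composer show (of which info is an alias) accepting --latest and --format=json, and it assumes the JSON has a top-level "installed" list whose entries carry "name", "version", "latest" and "description" keys. Check the output of your own Composer version before relying on it. The function names are made up.

```python
import json
import subprocess


def read_composer_json() -> str:
    """Run Composer and return its JSON output (requires composer on the PATH)."""
    result = subprocess.run(
        ["composer", "show", "--latest", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def format_package_rows(show_json: str) -> list:
    """Turn the JSON text into one printable row per installed package."""
    data = json.loads(show_json)
    rows = []
    # Assumed shape: {"installed": [{"name": ..., "version": ..., ...}, ...]}
    for package in data.get("installed", []):
        rows.append("{:<30} {:<12} {:<12} {}".format(
            package.get("name", ""),
            package.get("version", ""),
            package.get("latest", "-"),  # absent when --latest was not used
            package.get("description", ""),
        ))
    return rows


if __name__ == "__main__":
    for row in format_package_rows(read_composer_json()):
        print(row)
```

Parsing is kept separate from running the command so the formatting can be tried out on a saved JSON file without Composer installed.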

10 Favorite Job Interview Questions for Linux System Administrators

As someone who interviews a lot of people (mostly for web development positions though, not system administration), I’m always looking for more ideas on what to ask the candidates.  Today I came across “10 Favorite Job Interview Questions for Linux System Administrators”, which has a few bits that I liked.

First of all, this GitHub repository is super awesomeness.  It also links to a few other resources with more questions and ideas.  Not only for sysadmin interviews.

Then, this one is funny, yet somewhat challenging:

2. Name and describe a different Linux/Unix command for each letter of the alphabet. But also, describe how a common flush toilet works.

It also checks that you know the alphabet.

9. Print the content of a file backwards.

“I like broad questions where each person could give a different answer depending on their depth of knowledge. My personal answer is 8 characters not including the filename.” – Marc Merlin, Google.

This one caught me by surprise.  My immediate thought was “tac some_file”, but that’s obviously not enough.  tac only prints the lines in reverse order, which is not the same as reversing the file.  Perl to the rescue, but I wonder what’s the most elegant way to do it without a scripting language.
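To make the difference concrete, here is a small Python sketch (so still a scripting language, and not an answer to the 8-character challenge). The function names are mine. The first function mimics what I understand tac to do, the second reverses every character, and the third does the same for a file on disk by reading it fully into memory.

```python
def reverse_lines(text: str) -> str:
    """Reverse the order of the lines, keeping each line intact."""
    return "".join(reversed(text.splitlines(keepends=True)))


def reverse_chars(text: str) -> str:
    """Reverse the whole content, character by character."""
    return text[::-1]


def reverse_file_bytes(path: str) -> bytes:
    """Read a file as bytes and return its content backwards."""
    with open(path, "rb") as handle:
        return handle.read()[::-1]


sample = "ab\ncd\n"
print(repr(reverse_lines(sample)))  # 'cd\nab\n'
print(repr(reverse_chars(sample)))  # '\ndc\nba'
```

Reversing raw bytes will scramble multi-byte characters, so for UTF-8 text the character-level version is probably what the question is after.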

As always, interview questions are not only useful for the interviews.  They are a good measure of your own knowledge gaps and habit pitfalls.  This time was no exception.

The History of the URL

The History of the URL is a brilliant compilation of ideas and resources, explaining how we got to the URLs we use and love (or hate) today.  In fact, the article comes in two parts:

  1. Domain, protocol, and port
  2. Path, fragment, query, and auth

Read them in whatever order you prefer. But I guarantee that you’ll have a number of different responses throughout, from “Wow! I never knew that” and “I would have never thought of that!” to “No way! I don’t believe it”.

And here is one of the bits that made me smile:

In 1996 Keith Shafer, and several others proposed a solution to the problem of broken URLs. The link to this solution is now broken. Roy Fielding posted an implementation suggestion in July of 1995. The link is now broken.

After a year of using NodeJS in production

There are days when I feel jealous of all the young kids playing around with new technologies.  I need a certain level of stability and acceptance of the technology before I can apply it to client projects.  And I need time, which is a very scarce resource lately.

And yet there are days when I feel good about being somewhat reserved and conservative in my technology stack choices.  Reading this blog post makes me feel just that.  Of course I need to try it out for myself and shape my own opinion, but with my lack of time, this should do.

I spent a year trying to make Javascript and more specifically Node work for our team. Unfortunately during that time we spent more hours chasing docs, coming up with standards, arguing about libraries and debugging trivial code more than anything.

Would I recommend it for large-scale products? Absolutely not. Do people do that anyway? Of course they do. I tried to.

I would however recommend Javascript for front-end development such as Angular or React (like you have another choice).

I would also recommend Node for simple back-end servers mainly used for websockets or API relay.

Now if only somebody wrote a similar post about Docker …

Wikiwand – Wikipedia Modernized

I came across an interesting take on Wikipedia: Wikiwand.  It’s basically an upgraded and modernized design of Wikipedia.  You can either search and browse it like you do with the regular Wikipedia, or, even better, install a browser extension (here’s one for Google Chrome), which will redirect all your Wikipedia page clicks through to Wikiwand.  You get exactly the same content, but now it’s actually quite pleasant to explore.  Have a look at the Cyprus page, for example:

[Image: wikiwand]

I’m not a frequent Wikipedia reader, but in the last couple of days, I have to say, I’ve found myself spending much more time than usual reading Wikipedia pages on the Wikiwand website.  Maybe it is time for a Wikipedia facelift after all.

But it’s not just about forcing a different web design upon thee.  There’s more.  You get options (upper-right corner).  You can switch between light and dark designs, sans and serif fonts, adjust font size and text justification, and more. If you create an account and log in (Facebook is supported), you can bookmark pages too.

[Image: options]

Even if you are not a fan of fancy websites, I suggest you give it a try for a couple of days.  You might find yourself quite surprised.

The Slashdot Interview With Larry Wall

Slashdot runs an interview with Larry Wall, the creator of the Perl programming language.  There is a wide variety of questions.  Some are technical – about Perl 6, comparison to other programming languages (Python, PHP), Perl in the browser, etc.  Some are more generic – what kind of tools Larry uses, and what his thoughts are on English being the lingua franca of the computer world.  The answers are often funny, yet very insightful.

Test your backups!

You can read all the books in the world and know all there is to know, but if you don’t follow the wisdom and practice the knowledge, then it’s all useless.  That’s my lesson from yesterday.

The Tao of Backup, which I linked to before, says:

[Image: backup testing]

So, what happened?  Well, as I was preparing for the Fedora 24 installation, I wanted to back up some of my files, as the partition would be formatted.  I connected an external USB drive with plenty of space and ZIP-archived a few of the vital directories onto it.

That was a very simple backup procedure and I saw the resulting files on the volume.  What else should I do, right?  Wrong!  I should have tested the restore.  I didn’t.

Most of the directories that I backed up were small – /etc, /opt, /root.  But my /home directory was about 20 GB.  The external USB disk used the FAT32 file system, which has a 4 GB file size limit.  So only the first 4 GB of my /home folder were backed up.  Funnily enough, those files were mostly browser cache and image thumbnails – stuff that should be excluded from backups.  The two main folders that I wanted, Desktop and .ssh, were not part of the backup.  And I only realized that after the partition had been formatted.

So, yeah, I should have tested the backup.
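For next time, here is a minimal sketch of the kind of check I could have run before formatting anything. It is not a full restore test. It assumes the archive is a ZIP whose member names are paths relative to the directory that was backed up, and the function name is made up. It uses Python’s zipfile module to read every member and then lists the source files that never made it into the archive.

```python
import os
import zipfile


def missing_from_backup(archive_path: str, source_dir: str) -> list:
    """Return the relative paths under source_dir that are absent from the ZIP.

    Raises zipfile.BadZipFile if the archive cannot be opened or a member
    fails its CRC check.
    """
    with zipfile.ZipFile(archive_path) as archive:
        bad_member = archive.testzip()  # reads every member and verifies its CRC
        if bad_member is not None:
            raise zipfile.BadZipFile("corrupt member: " + bad_member)
        archived = set(archive.namelist())

    missing = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            relative = os.path.relpath(os.path.join(root, name), source_dir)
            relative = relative.replace(os.sep, "/")  # ZIP names use forward slashes
            if relative not in archived:
                missing.append(relative)
    return sorted(missing)
```

An empty list means every file in the source directory has a readable counterpart in the archive. Anything else, or an exception, means the backup should not be trusted yet.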

P.S.: Luckily, I do have backups elsewhere, and most of my work is committed to GitHub/BitBucket anyways.

Fedora 24 : the day of 64-bit has come

I’ve been using 64-bit Linux distributions on the servers for a while now, but was reluctant to put one on my laptop.  I tried a couple of times many years ago, and found that there were all sorts of weird issues.

Yesterday, with a little push from my brother, Google, and Slashdot, I decided to give it another go.  64-bit Fedora 24 is now on my laptop and I am carefully exploring it.  So far, so good.

I guess I won’t have to worry about the Year 2038 Problem after all.
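For the curious, the arithmetic behind that remark can be sketched in a few lines of Python. It assumes timestamps are counted in seconds since 1970-01-01 UTC and stored in a signed integer. Whether a given system actually uses a 64-bit counter everywhere is not something this snippet checks.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit counter
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit counter
SECONDS_PER_YEAR = 365 * 24 * 3600

# The last second a signed 32-bit timestamp can represent.
last_32bit_second = EPOCH + timedelta(seconds=INT32_MAX)
print(last_32bit_second.isoformat())  # 2038-01-19T03:14:07+00:00

# Rough number of years a signed 64-bit counter can cover.
years_with_64_bits = INT64_MAX // SECONDS_PER_YEAR
print(years_with_64_bits)
```

The second number comes out in the hundreds of billions of years, which should be enough for my laptop.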

Election. Yeah, right!

This Google blog post titled “A voice for everyone in 2016” made me chuckle:

Every election matters and every vote counts. The American democracy relies on everyone’s participation in the political process. This November, Americans all across the country will line up at the polls to cast their ballots for the President of the United States.

It sounds like a true effort to make things better and enhance democracy and whatnot.  But in practice, is it really an election? One by one, the candidates are falling off the ballot.  Day by day it becomes more obvious that Hillary Clinton will be the next president of the USA.

The more tools and technologies we have to enhance our lives, the worse the content on which we can apply those tools becomes.  The better the home cinemas became, the worse the movies got.  The better audio systems we have, the worse the music gets.  And politics just follows the same trend, unfortunately.