The RegEx that killed StackOverflow

Here’s an outage postmortem from the recent StackOverflow downtime.  It just shows you how easy it is to break things, even if they were built by some of the smartest people around.  Programming is tough and there is no way around it.

Technical Details

The regular expression was: ^[\s\u200c]+|[\s\u200c]+$ Which is intended to trim Unicode space from start and end of a line. A simplified version of the Regex that exposes the same issue would be \s+$ which to a human looks easy (“all the spaces at the end of the string”), but which means quite some work for a simple backtracking Regex engine. The malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line that started with -- play happy sound for player to enjoy. For us, the sound was not happy.

If the string to be matched against contains 20,000 space characters in a row, but not at the end, then the Regex engine will start at the first space, check that it belongs to the \s character class, move to the second space, make the same check, etc. After the 20,000th space, there is a different character, but the Regex engine expected a space or the end of the string. Realizing it cannot match like this it backtracks, and tries matching \s+$ starting from the second space, checking 19,999 characters. The match fails again, and it backtracks to start at the third space, etc.
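The retry-from-the-next-space behaviour described above can be sketched in a few lines of Python. This is only a rough model of a naive backtracking engine, not the engine StackOverflow actually runs: the function name count_class_checks is made up for illustration, str.isspace() stands in for the \s character class, and the extra end-of-string checks a real engine performs are ignored.

```python
def count_class_checks(text: str) -> int:
    # Rough model of matching "whitespace run, then end of string":
    # try every start position and count each character-class check.
    checks = 0
    for start in range(len(text)):
        pos = start
        # Greedily consume whitespace, one class check per character examined.
        while pos < len(text):
            checks += 1
            if not text[pos].isspace():
                break
            pos += 1
        # The run must be non-empty and must reach the end of the string.
        if pos == len(text) and pos > start:
            return checks
    return checks


if __name__ == "__main__":
    # Doubling the run of spaces roughly quadruples the work.
    for n in (100, 200, 400):
        print(n, count_class_checks(" " * n + "x"))
```

In this model, doubling the number of spaces roughly quadruples the number of checks, which is the quadratic growth in question.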

So the Regex engine has to perform a “character belongs to a certain character class” check (plus some additional things) 20,000+19,999+19,998+…+3+2+1 = 200,010,000 times, and that takes a while. This is not classic catastrophic backtracking (talk on backtracking) (performance is O(n²), not exponential, in length), but it was enough. This regular expression has been replaced with a substring function.
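The postmortem does not show the substring function itself, so the following is only a guess at what such a replacement might look like. The name trim_unicode_space is hypothetical, and str.isspace() is used as an approximation of \s (the two do not agree on every character). The point is that each end of the string is scanned once, so the work is linear in the length of the input.

```python
ZERO_WIDTH_NON_JOINER = "\u200c"  # the extra character the original pattern trimmed


def trim_unicode_space(text: str) -> str:
    # Hypothetical linear-time stand-in for ^[\s\u200c]+|[\s\u200c]+$
    def is_trimmable(ch: str) -> bool:
        return ch.isspace() or ch == ZERO_WIDTH_NON_JOINER

    start = 0
    end = len(text)
    # Walk forward past leading trimmable characters.
    while start < end and is_trimmable(text[start]):
        start += 1
    # Walk backward past trailing trimmable characters.
    while end > start and is_trimmable(text[end - 1]):
        end -= 1
    return text[start:end]
```

A long run of spaces in the middle of the string is never examined at all, because both scans stop at the first character that is not trimmable.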

Composer magic

Now that everyone is super comfortable with composer, I thought I’d share these two gems which I didn’t know or think about.

composer info

This command lists all of your packages installed with composer.  This is super handy if you want to include a page in your project, listing all the libraries and versions which are currently installed.  It also gives you a description of each library as provided by the package.

composer outdated

This command lists the packages you are using that have updates available.  With this you can have a better understanding of what will happen if you run composer update (depending on your composer.json of course).

Update (July 21, 2016): Guess what? There is even a way to combine the two with one command: composer info -l.  This will list all the packages, with their versions and descriptions, and with an additional column of the latest version for each package.

After a year of using NodeJS in production

There are days when I feel jealous of all the young kids playing around with new technologies.  I need a certain level of stability and acceptance of the technology before I can apply it to client projects.  And I need time, which is a very scarce resource lately.

And yet there are days when I feel good about being somewhat reserved and conservative in my technology stack choices.  Reading this blog post makes me feel just that.  Of course I need to try it out for myself and shape my own opinion, but with my lack of time, this should do.

I spent a year trying to make Javascript and more specifically Node work for our team. Unfortunately during that time we spent more hours chasing docs, coming up with standards, arguing about libraries and debugging trivial code more than anything.

Would I recommend it for large-scale products? Absolutely not. Do people do that anyway? Of course they do. I tried to.

I would however recommend Javascript for front-end development such as Angular or React (like you have another choice).

I would also recommend Node for simple back-end servers mainly used for websockets or API relay.

Now if only somebody wrote a similar post about Docker …

The Slashdot Interview With Larry Wall

Slashdot runs an interview with Larry Wall, the creator of the Perl programming language.  There is a wide variety of questions.  Some are technical – about Perl 6, comparison to other programming languages (Python, PHP), Perl in the browser, etc.  Some are more generic – what kind of tools Larry uses, and what his thoughts are on English being the lingua franca of the computer world.  The answers are often funny, yet very insightful.

Web Development With Assembly

The other day I was joking with a colleague of mine about how much fun it would be to do web development in Assembly.  All the usual stuff – pages would be super fast, and the whole subject makes for some fun interview material, as the candidates mention Assembly on pretty much every CV.

And then I decided to do a quick Google search.  To my (not so great) surprise I got to this hilarious Reddit thread, which, among other things, links to MiniMagAsm, a web development framework written in Assembly.  It compiles into a native binary and can be executed as a CGI script.

I’m not going to use it any time soon, but I think it’s super cool, and way more than a simple “hello world” page that I was expecting to find.

SugarCRM cache directory – it is NOT a cache directory!

Here is a useful reminder from a few years back – “SugarCRM cache directory – it is NOT a cache directory!”.  Unlike most modern-day web applications, which use a cache/ folder for temporary files that are safe to delete, SugarCRM keeps a bunch of stuff in there which, if it disappeared, would leave you in a very uncomfortable and confused state.

Things have obviously improved over the years, but it’s still far from perfect.  And while we are on the subject of surprising issues with SugarCRM, make sure to check my other post about working with encrypted values.  Basically, the summary is: backup, backup, backup!  If you want to sleep well at night, back up SugarCRM’s full file system (files, configurations, temporary files, caches, etc) and its database.  And never ever change anything.

On test strings

I’ve seen my fair share of test strings, varying from simple ‘test’, ‘foo’, and ‘blah’ to automatically generated Lorem Ipsum paragraphs.  But I don’t really remember seeing anything weirder than this one:

$string = "I am not a question. How was your day? Sex On Hard Concrete Always Hurts The Orgasmic Area. Why does custard taste so lumpy when you use breast milk?";

From this StackOverflow answer.  Is there a tool that does this?  I wouldn’t mind using it in my daily work.