The world of PHP nightmare

I had a dream today. In fact, it was a nightmare that woke me up at 3am and kept me up for the next three hours or so. And I tell you honestly, this kind of thing doesn’t happen to me all that often. In fact, I don’t even remember the last time I had anything similar.

I dreamed that the whole world was somehow written in PHP. A few bits were alright, but it mostly sucked. There were constant ground tremors. Buildings were swaying in slow, soft, waving motions. Things that were supposed to be soft were plastic-hard. Things that were supposed to be hard were bumpy and soft. Road tarmac felt like a gentle green grass field.

At some point, one of those tremors opened a long, deep crack in the ground. The resulting vibration tore a nearby skyscraper in half, like it was a wet baguette, and the top part of the building slowly fell and disappeared into that crack (hi, dr. Fraud). That was rather unpleasant to watch.

After a few scenes of apocalypse, the nightmare movie cut to action, where I was part of the task force that was supposed to fix the world. And, I tell you, we tried hard. We refactored parts of the code, migrated a few of the most critical systems to CakePHP, upgraded PHP to 5.6 and even tried all those high-performance tricks from Facebook (hi, Hack). Things were getting better, but not nearly enough. The world was still awkward, unstable and slow.

PHP wasn’t the only thing we were looking at. There was a lot of work around databases and server tuning. We tried every profiling, monitoring and analytics tool we could get our hands on. But to no avail.

The really horrifying part of the nightmare was when we finally realized that PHP wouldn’t cut it and we’d have to rewrite parts of the world in C. We were also somehow missing a C compiler. I bet you can guess the epicenter of the nightmare now. Yes, indeed. We started writing a C compiler in PHP. That’s when I woke up in a cold sweat, screaming “Noooooo!” at the top of my lungs. That was more than I could bear.

For three hours afterwards I tried not to Google or think about whether that was at all possible. Apparently, I love the world the way it is now: screwed up in a billion ways, but NOT written in PHP. With that peaceful thought and a beautiful sunrise, I fell asleep.

Why are there different representations for newlines in Windows, Linux, and Mac?

This is a good question, albeit one with a boring answer. Different systems evolved different encodings for newlines in the same way they evolved different behavior for myriad other things: each system had to standardize on something, and interoperability in the days before email, let alone the Internet, was unimportant.

There are several ways to represent newlines. ASCII-based systems use some combination of carriage return and line feed. These derive from typewriters: a carriage return (CR) resets the typewriter carriage’s horizontal position to the far left and a line feed (LF) advances the paper one vertical line. For a typewriter, you need both, so some systems (DOS, Windows, Palm OS) adopted CR+LF as the representation of a newline. Other systems, such as Unix, noted that a computer doesn’t have a carriage to return, so a sole line feed was sufficient. Still others, such as Mac OS prior to OS X, adopted only a carriage return. Arguably, this choice doesn’t make any sense, as a bare carriage return would swing the typewriter carriage back to the left but not advance the page. A few other systems used LF+CR, inverting the order of the ASCII characters used in Windows.
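To make the four ASCII conventions concrete, here is a minimal Python sketch that tallies which newline sequences appear in a chunk of bytes. The names `NEWLINES` and `count_newline_styles` are made up for this illustration, and the matching is deliberately naive: a run such as LF, CR, LF is ambiguous, and this sketch simply takes the first two-byte pair it sees.

```python
import re

# Byte sequences for the conventions described above.
NEWLINES = {
    "CR+LF": b"\r\n",  # DOS, Windows, Palm OS
    "LF+CR": b"\n\r",  # the inverted pair
    "LF": b"\n",       # Unix
    "CR": b"\r",       # Mac OS prior to OS X
}

# Two-byte alternatives come first so a pair is matched whole
# instead of being counted as a lone CR plus a lone LF.
_PATTERN = re.compile(rb"\r\n|\n\r|\n|\r")
_NAMES = {seq: name for name, seq in NEWLINES.items()}


def count_newline_styles(data: bytes) -> dict:
    """Count how often each newline convention occurs in data."""
    counts = {name: 0 for name in NEWLINES}
    for match in _PATTERN.finditer(data):
        counts[_NAMES[match.group()]] += 1
    return counts
```

Calling `count_newline_styles(b"a\r\nb\nc\rd")` reports one CR+LF, one LF and one CR. Working on bytes rather than text is a deliberate choice here, since text-mode file APIs often translate newlines before your code ever sees them.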

Systems not based on ASCII, of course, did their own thing. IBM mainframes built around EBCDIC, for example, used a special newline character (NL). Perhaps oddest of all, VMS utilized a record-based filesystem where newlines were first-class citizens to the operating system. Each record was implicitly its own line and thus there was no explicit newline representation!

But none of this mattered, because these systems never had to interoperate with each other—or, if they did, they had to make so many other conversions that newline representation was the least of their worries.

Today, most Internet protocols recommend CR+LF but dictate compatibility with LF (CR and LF+CR are left out in the cold). Given the centrality of the Internet, the ubiquity of Unix, which heralds LF, the primacy of C and descendant languages, which (somewhat) map their newline to LF, and the fact we really only need one character to represent a newline, LF seems the clear standard going forward.
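The "recommend CR+LF, tolerate LF" policy is easy to sketch in Python. This is only an illustration of the general idea, not the parsing rules of any particular protocol, and the helper names `split_lines_tolerant` and `join_lines_crlf` are invented for the example.

```python
import re
from typing import List


def split_lines_tolerant(data: bytes) -> List[bytes]:
    """Split on CR+LF or a bare LF; a bare CR is left in the line as data."""
    return re.split(rb"\r?\n", data)


def join_lines_crlf(lines: List[bytes]) -> bytes:
    """Emit lines using the recommended CR+LF separator."""
    return b"\r\n".join(lines)
```

A reader built this way accepts `b"a\r\nb\nc"` as three lines, while a writer built on `join_lines_crlf` always sends CR+LF. A bare CR passes through untouched, which matches the observation above that CR on its own is left out in the cold.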