Why are there different representations for newlines in Windows, Linux, and Mac?

This is a good question albeit one with a boring answer. Different systems evolved different encodings for newlines in the same way they evolved different behavior for myriad other things: Each system had to standardize on something and interoperability in the days before email let alone the Internet was unimportant.

There are several ways to represent newlines. ASCII-based systems use some combination of carriage return and line feed. These derive from typewriters: a carriage return (CR) resets the typewriter carriage's horizontal position to the far left, and a line feed (LF) advances the paper one vertical line. On a typewriter you need both, so some systems (DOS, Windows, Palm OS) adopted CR+LF as the representation of a newline. Other systems, such as Unix, noted that a computer has no carriage to return, so a lone line feed was sufficient. Still others, such as Mac OS prior to OS X, adopted only a carriage return. Arguably, this choice makes the least sense: a bare carriage return would swing the typewriter carriage back to the left without advancing the page. Still other systems used LF+CR, reversing the order of the two characters used in Windows.
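To make the four conventions concrete, here is a minimal Python sketch that collapses all of them into a lone LF. The `normalize_newlines` helper and its regular expression are invented for illustration, not taken from any particular library:

```python
import re

CR = "\r"  # carriage return, ASCII 0x0D
LF = "\n"  # line feed, ASCII 0x0A

# Match the two-character pairs before a lone CR or LF,
# so CR+LF and LF+CR are not split into two separate newlines.
_NEWLINES = re.compile(r"\r\n|\n\r|\r|\n")

def normalize_newlines(text: str) -> str:
    """Rewrite every newline convention as a single LF."""
    return _NEWLINES.sub(LF, text)

# The same three lines, encoded three different ways:
dos_style  = "one\r\ntwo\r\nthree"  # DOS, Windows, Palm OS
unix_style = "one\ntwo\nthree"      # Unix
mac_style  = "one\rtwo\rthree"      # classic Mac OS

assert (normalize_newlines(dos_style)
        == normalize_newlines(unix_style)
        == normalize_newlines(mac_style))
```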

Systems not based on ASCII, of course, did their own thing. IBM mainframes built around EBCDIC, for example, used a special newline character (NL). Perhaps oddest of all, VMS used a record-based filesystem in which lines were first-class citizens of the operating system: each record was implicitly its own line, so there was no explicit newline representation at all!

But none of this mattered, because these systems never had to interoperate with each other—or, if they did, they had to make so many other conversions that newline representation was the least of their worries.

Today, most Internet protocols specify CR+LF on the wire but expect implementations to tolerate a bare LF (CR alone and LF+CR are left out in the cold). Given the centrality of the Internet, the ubiquity of Unix, which uses a lone LF, the primacy of C and its descendant languages, whose newline character (somewhat) maps to LF, and the fact that we really only need one character to represent a newline, LF seems the clear standard going forward.
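A hedged sketch of what "emit CR+LF, tolerate bare LF" might look like in practice; the `split_protocol_lines` and `join_protocol_lines` helpers are hypothetical and tied to no specific protocol library:

```python
def split_protocol_lines(payload: bytes) -> list[bytes]:
    # Split on LF, then strip any trailing CR left over from a
    # CR+LF pair: strict senders and LF-only senders both parse.
    return [line.rstrip(b"\r") for line in payload.split(b"\n")]

def join_protocol_lines(lines: list[bytes]) -> bytes:
    # When sending, always emit the strict CR+LF form.
    return b"\r\n".join(lines) + b"\r\n"

# Both a strict and a lax peer yield the same logical lines.
assert (split_protocol_lines(b"HELLO\r\nWORLD\r\n")[:2]
        == split_protocol_lines(b"HELLO\nWORLD\n")[:2]
        == [b"HELLO", b"WORLD"])
```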
