Here is an interesting bit of research – do people prefer tabs or spaces when programming in the most popular languages?
Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.
The results are not very surprising and somewhat disappointing (for all of us tab fans).
As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:
Code MUST use 4 spaces for indenting, not tabs.
On a more technical note, I think this is also related to the explosion of editors and IDEs in recent years, which, as good as they are, aren’t as good as Vim. Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue.
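For those who don't live in Vim, here is a rough idea of what such re-formatting boils down to. This is only a minimal Python sketch, not how Vim does it: the function name `retab_leading` and its parameters are made up for illustration, and it assumes that only the leading whitespace of each line should be touched.

```python
def retab_leading(text: str, tabsize: int = 4, to_tabs: bool = False) -> str:
    """Rewrite the leading indentation of every line (hypothetical helper).

    With to_tabs=False, leading tabs become spaces.
    With to_tabs=True, leading spaces become tabs, and any remainder
    smaller than one tab stop is kept as spaces.
    Whitespace after the first non-blank character is left alone.
    """
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        # Visual width of the indent, with tab stops every `tabsize` columns.
        width = len(indent.expandtabs(tabsize))
        if to_tabs:
            new_indent = "\t" * (width // tabsize) + " " * (width % tabsize)
        else:
            new_indent = " " * width
        out.append(new_indent + body)
    return "".join(out)


if __name__ == "__main__":
    sample = "if x:\n\treturn 1\n"
    print(retab_leading(sample), end="")
```

Running the same function with `to_tabs=True` goes the other way, so a file can be edited with one style and committed with another.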
Regardless of the results of the study, what’s more interesting is the method and tools used. I’ve had my eye on Google BigQuery for a while now, but I’m too busy these days to give it a try. The article gives a few insights into how awesome the tool is. 1.6 terabytes of data processed in 864.6 seconds:
That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].
and:
Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.
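The queries themselves are in the linked article and not reproduced here, but the per-line idea is easy to picture. Below is a small Python sketch of one possible way to do it, under my own assumptions: each file "votes" for whichever character starts more of its lines, and ties or files without indentation are left undecided. The names `indentation_vote` and `tally` are hypothetical, and the real study may well count differently.

```python
from collections import Counter
from typing import Mapping


def indentation_vote(source: str) -> str:
    """Classify one file as 'tabs', 'spaces' or 'undecided' (assumed rule)."""
    counts = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
    if counts["tabs"] > counts["spaces"]:
        return "tabs"
    if counts["spaces"] > counts["tabs"]:
        return "spaces"
    return "undecided"


def tally(files: Mapping[str, str]) -> Counter:
    """Count the votes over a mapping of file name -> file contents."""
    return Counter(indentation_vote(src) for src in files.values())


if __name__ == "__main__":
    demo = {
        "a.php": "<?php\n    echo 1;\n",
        "b.go": "func f() {\n\treturn\n}\n",
    }
    print(tally(demo))
```

This runs on one machine and a handful of files; the whole point of the BigQuery numbers above is that the same kind of counting is spread over terabytes of code.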
If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery”, which works with a similar-sized dataset, trying to figure out how to write bug reports and pull request comments so that they are acted upon faster.
Tabs. And tab stop set at 4 in editor.
And most people will hate you. And deny your pull requests :)
This world is an unfair place (especially democracy). Let them do it for their own good ;-)
Well, in Open Source software it’s mostly a meritocracy. And they did … :)
So nobody really cares about tab stops if the code works. And if they do – a middle finger is a smart reply :-)
That’s not what I said. :) What I meant was this – meritocracy rules in the majority of Open Source projects. So, people bringing more to the table have more weight when decisions are made. Whether you like it or not, if they decide to use tabs, or spaces, this is how it will be. No matter how stupid spaces are, if the core developers prefer them, that’s what you’ll get.
As far as the middle fingers are concerned – not sure I’m on your side here. Coding style is important, especially for any project that has more than one developer. Code review sucks when people use different formats and constantly send code-reformatting changes.
The better approach is the right toolkit. Proper editors (like Vim) can format and reformat source code the way it suits you. Fixing the code style before commits/PRs is almost trivial. If not in the editor, then with a specialized tool for the given language. Heck, some languages actually impose a specific coding style (*cough*Python*cough*) … :)
Thanks for taking time and explaining it! :-)