{"id":26683,"date":"2016-09-12T08:34:12","date_gmt":"2016-09-12T06:34:12","guid":{"rendered":"https:\/\/mamchenkov.net\/wordpress\/?p=26683"},"modified":"2016-09-12T08:34:12","modified_gmt":"2016-09-12T06:34:12","slug":"400000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs","status":"publish","type":"post","link":"https:\/\/mamchenkov.net\/wordpress\/2016\/09\/12\/400000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs\/","title":{"rendered":"400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?"},"content":{"rendered":"<!-- google_ad_section_start -->\n<p>Here is an <a href=\"https:\/\/medium.com\/@hoffa\/400-000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs-7cfe0b5dd7fd#.ebjyf05fq\">interesting bit of research<\/a> &#8211; do people prefer tabs or spaces when programming in the most popular languages?<\/p>\n<blockquote><p>Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.<\/p><\/blockquote>\n<p>The results are not very surprising and somewhat disappointing (for all of us tab fans):<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"26684\" data-permalink=\"https:\/\/mamchenkov.net\/wordpress\/2016\/09\/12\/400000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs\/tabs-vs-spaces\/\" data-orig-file=\"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces.png?fit=592%2C389&amp;ssl=1\" data-orig-size=\"592,389\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"tabs vs. spaces\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces.png?fit=592%2C389&amp;ssl=1\" class=\"aligncenter size-medium wp-image-26684\" src=\"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces-500x329.png?resize=500%2C329&#038;ssl=1\" alt=\"tabs vs. spaces\" width=\"500\" height=\"329\" srcset=\"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces.png?resize=500%2C329&amp;ssl=1 500w, https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2016\/09\/tabs-vs.-spaces.png?w=592&amp;ssl=1 592w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>As far as PHP goes, I&#8217;m sure the choice of spaces has to do with the <a href=\"http:\/\/www.php-fig.org\/psr\/psr-2\/\">PSR-2 coding style guide<\/a>, which states:<\/p>\n<blockquote><p>Code MUST use 4 spaces for indenting, not tabs.<\/p><\/blockquote>\n<p>On a more technical note, I think this is also related to the explosion of editors and IDEs in recent years, which, as good as they are, aren&#8217;t as good as <a href=\"http:\/\/www.vim.org\">Vim<\/a>. \u00a0Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue.<\/p>\n<p>Regardless of the results of the study, what&#8217;s more interesting is the method and tools used. 
\u00a0I&#8217;ve had my eye on <a href=\"https:\/\/cloud.google.com\/bigquery\/\">Google BigQuery<\/a> for a while now, but I&#8217;m too busy these days to give it a try. \u00a0The article gives a few insights into how awesome the tool is. \u00a01.6 terabytes of data processed in 864.6 seconds:<\/p>\n<blockquote><p>That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don\u2019t worry about having to run it, since I left the result publicly available at [<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/bigquery.cloud.google.com\/table\/fh-bigquery:github_extracts.contents_top_repos_top_langs\" target=\"_blank\" rel=\"nofollow\" data-href=\"https:\/\/bigquery.cloud.google.com\/table\/fh-bigquery:github_extracts.contents_top_repos_top_langs\">fh-bigquery:github_extracts.contents_top_repos_top_langs<\/a>].<\/p><\/blockquote>\n<p>and:<\/p>\n<blockquote><p>Analyzing each line of 133 GBs of code in 16 seconds? That\u2019s why I love BigQuery.<\/p><\/blockquote>\n<p>If you enjoyed this article, also have a look at &#8220;<a href=\"https:\/\/medium.com\/google-cloud\/analyzing-github-issues-and-comments-with-bigquery-c41410d3308#.dkyc3ksre\">Analyzing GitHub issues and comments with BigQuery<\/a>&#8221;, which works with a similarly sized data set to figure out how to write bug reports and pull request comments so that they get acted upon faster.<\/p>\n<!-- google_ad_section_end -->\n","protected":false},"excerpt":{"rendered":"<!-- google_ad_section_start -->\n<p>Here is an interesting bit of research &#8211; do people prefer tabs or spaces when programming in the most popular languages? Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top. 
The results are not very surprising and somewhat disappointing (for all of us, &hellip; <a href=\"https:\/\/mamchenkov.net\/wordpress\/2016\/09\/12\/400000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?<\/span><\/a><\/p>\n<!-- google_ad_section_end -->\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs? #GitHub #BigQuery #stats #WebDev #programming","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_links_to":"","_links_to_target":""},"categories":[1,18,62,1334],"tags":[3198,2243,2809,3444,38,1117,1041,1330],"keyring_services":[],"class_list":["post-26683","post","type-post","status-publish","format-standard","hentry","category-general","category-programming","category-technology","category-web-work","tag-big-data","tag-coding-style","tag-github","tag-google-bigquery","tag-php","tag-research","tag-statistics","tag-web-development"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":27727,"url":"https:\/\/mamchenkov.net\/wordpress\/2017\/06\/27\/using-non-breakable-spaces-in-test-method-names\/","url_meta":{"origin":26683,"position":0},"title":"Using non-breakable spaces in test method names","author":"Leonid 
Mamchenkov","date":"June 27, 2017","format":false,"excerpt":"Using non-breakable spaces in test method names is a great example of how something can start as a joke and quickly turn into something very practical and useful. if we decide to not follow PSR-2 naming for test methods because of readability, we might as well use non-breakable spaces since\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2017\/06\/nbsp-code-500x80.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":36247,"url":"https:\/\/mamchenkov.net\/wordpress\/2019\/02\/26\/refactoring-guru-design-patterns-php\/","url_meta":{"origin":26683,"position":1},"title":"Refactoring.Guru : Design Patterns + PHP","author":"Leonid Mamchenkov","date":"February 26, 2019","format":false,"excerpt":"Refactoring.Guru is a great resource for learning about refactoring best practices and design patterns. A lot of the website's content is also available as Dive into Design Patterns ebook. Today I came across this GitHub repository, which makes this resource even better specifically for PHP developers. 
Yup, that's right, the\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2019\/02\/refactoring.guru_.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2019\/02\/refactoring.guru_.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2019\/02\/refactoring.guru_.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2019\/02\/refactoring.guru_.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":28340,"url":"https:\/\/mamchenkov.net\/wordpress\/2018\/01\/22\/how-to-read-big-files-with-php-without-killing-your-server\/","url_meta":{"origin":26683,"position":2},"title":"How to Read Big Files with PHP (Without Killing Your Server)","author":"Leonid Mamchenkov","date":"January 22, 2018","format":false,"excerpt":"Here's an interesting article that was hanging around in my \"to blog\" tabs for a while now:\u00a0How to Read Big Files with PHP (Without Killing Your Server).\u00a0 I found the title to be slightly misleading, expecting the good old advice of reading and processing files line by line rather than\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11949,"url":"https:\/\/mamchenkov.net\/wordpress\/2009\/11\/29\/enforcing-coding-styles-in-php\/","url_meta":{"origin":26683,"position":3},"title":"Enforcing coding styles in PHP","author":"Leonid Mamchenkov","date":"November 29, 2009","format":false,"excerpt":"I came across a plugin for CakePHP which helps to check if the certain code follows CakePHP coding style.\u00a0 While I haven't tried it, I 
think the better way is to utilize CodeSniffer.\u00a0 As per PHP_CodeSniffer PEAR page: PHP_CodeSniffer tokenises PHP, JavaScript and CSS files and detects violations of a\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24612,"url":"https:\/\/mamchenkov.net\/wordpress\/2015\/08\/20\/rank-of-top-languages-on-github-com-over-time\/","url_meta":{"origin":26683,"position":4},"title":"Rank of top languages on GitHub.com over time","author":"Leonid Mamchenkov","date":"August 20, 2015","format":false,"excerpt":"GitHub blog shares some trends in regards to programming languages, which includes both public and private repositories: Interesting. \u00a0I haven't seen many Java and C# projects myself, but I'm in a very different bubble. \u00a0PHP stays on #4 for years. \u00a0VimL, the language in which most plugins for Vim editor\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"GitHub programming languages","src":"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2015\/08\/GitHub-programming-languages-500x288.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":23189,"url":"https:\/\/mamchenkov.net\/wordpress\/2014\/12\/22\/tab-snooze-snooze-tabs-for-later\/","url_meta":{"origin":26683,"position":5},"title":"Tab Snooze &#8211; snooze tabs for later","author":"Leonid Mamchenkov","date":"December 22, 2014","format":"link","excerpt":"Tab Snooze - snooze tabs for later. 
Also on GitHub.","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"tab snooze","src":"https:\/\/i0.wp.com\/mamchenkov.net\/wordpress\/wp-content\/uploads\/2014\/12\/tab-snooze-463x500.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts\/26683","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/comments?post=26683"}],"version-history":[{"count":0,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts\/26683\/revisions"}],"wp:attachment":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/media?parent=26683"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/categories?post=26683"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/tags?post=26683"},{"taxonomy":"keyring_services","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/keyring_services?post=26683"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}