400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?

Here is an interesting bit of research – do people prefer tabs or spaces when programming in the most popular languages?

Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.

The results are not very surprising and somewhat disappointing (for all of us tab fans):

[Image: tabs vs. spaces]

As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:

Code MUST use 4 spaces for indenting, not tabs.

On a more technical note, I think this is also related to the explosion of editors and IDEs in recent years, which, as good as they are, aren’t as good as Vim.  Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue.

Regardless of the results of the study, what’s more interesting is the method and tools used.  I’ve had my eye on Google BigQuery for a while now, but I’m too busy these days to give it a try.  The article gives a few insights into how awesome the tool is.  1.6 terabytes of data processed in 864.6 seconds:

That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].

and:

Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.
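To get a feel for what such a per-line analysis involves, here is a minimal local sketch in Python. It is not the query from the article: the rule used here (a file counts for whichever character starts more of its lines) and the function names are my own assumptions for illustration.

```python
from collections import Counter


def classify_indentation(source):
    """Classify one file's text as 'tabs', 'spaces', or 'neither'.

    Assumed rule: count the lines that start with a tab and the lines
    that start with a space; the file counts for the larger group.
    """
    counts = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
    if counts["tabs"] == counts["spaces"]:
        return "neither"  # no indented lines at all, or an exact tie
    return "tabs" if counts["tabs"] > counts["spaces"] else "spaces"


def tally(files):
    """Count how many files fall into each class.

    `files` is a hypothetical mapping of file name to file contents.
    """
    return Counter(classify_indentation(text) for text in files.values())
```

Running `tally` over a checkout of your own projects gives a small-scale version of the same question; BigQuery’s contribution is doing this over terabytes in seconds.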

If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery”, which works with similar-sized data, trying to figure out how to write bug reports and pull request comments so that they are acted upon faster.

Git 2.9

Git 2.9 was released a few days ago, bringing in some very useful functionality, such as showing renamed files in git diff and git log, forbidding the merge of two branches that have no common ancestors, configurable path to hooks, and more.  All are welcome changes, making the life of a developer easier.

But what I found interesting is how the two largest Git hosting companies – GitHub and BitBucket – reflect on it.  Surely, the new release is important to both, but it’s insightful to see which features each of them looks at first.  Have a look:

GitHub private repository contributions on your profile

The GitHub blog says that from now on your profile can include your private repository contributions.

[Image: GitHub private repo contributions]

When enabled, these can make quite a difference in the number of green boxes showing your GitHub activity.  Here’s an example from mine.  Before enabling those, showing only Open Source contributions:

[Image: GitHub mamchenkov before]

And here’s one after, including private repository contributions:

[Image: GitHub mamchenkov after]

Indeed, it is a more accurate representation of my GitHub activity.  Given that these days most of my private repository activity happens on BitBucket and not on GitHub, this is quite surprising.

GitHub unlimited private repositories – a better world or a perfect disaster?

[Image: GitHub unlimited repositories]

Today I was super excited to read the following in the GitHub blog:

We couldn’t be more excited to announce that all of our paid plans on GitHub.com now include unlimited private repositories. GitHub will always be free for public and open source projects, but starting today there are just two ways to pay for GitHub.com:

  • Personal: $7/month
  • Organization: $9/user/month, $25/month for your first five users

One of the very best things about Git and other distributed version control systems is the ability to create a new repository without asking permission or getting approval. While this has always been true for our public plans, it hasn’t been the case for individuals and teams working together in private. All that changes today.

After all, it was the pricing around private repositories that pushed me towards BitBucket.

I work for a small startup with a small development team and lots of client projects that require private repositories, so GitHub was too expensive an option.  So we’ve moved all private repositories to BitBucket, which charges by team size.  We still use GitHub for all of our Open Source work, and for the client projects where we need to work with external teams (usually, developers on the side of the client).

Can we move all our stuff back to GitHub and just use a single service for all our code, pull requests, code review, etc?  That would make the world a better place.  Let’s see …

[Image: github]

Wait, what?  Our GitHub organization has 5 members and 18 external collaborators.  And, well, another 5 pending invitations to external collaborators.  But all of these add up to 28 users (!!!).  Currently, we are on the Bronze $25/month plan, which comes to $300/year.  The new plan with unlimited repositories, as indicated on the screenshot above, will be $2,784/year.  That’s almost a 10 times increase!
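For anyone who wants to check the arithmetic or plug in their own numbers, here is a small Python sketch of the calculation.  It assumes the organization pricing quoted above ($25/month covering the first five users, $9/month for each additional user) and that every member, external collaborator and pending invitation counts as a user; the function name is made up for illustration.

```python
def organization_monthly_cost(users):
    """Monthly cost in dollars under the quoted organization pricing.

    Assumption: $25 covers the first five users, and each user beyond
    five costs an extra $9.
    """
    base_price, included_users, per_extra_user = 25, 5, 9
    extra_users = max(0, users - included_users)
    return base_price + extra_users * per_extra_user


# 5 members + 18 external collaborators + 5 pending invitations
users = 5 + 18 + 5
new_yearly = organization_monthly_cost(users) * 12
old_yearly = 25 * 12  # the current Bronze plan

print(users, new_yearly, old_yearly, round(new_yearly / old_yearly, 2))
# prints: 28 2784 300 9.28
```

So the 23 users beyond the first five account for $207 of the $232 monthly bill.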

Thanks, but no thanks.  Right?  Well, not really.  The GitHub blog post also says the following:

We want everyone to have a plan with unlimited private repositories, but don’t worry—you are welcome to stay on your current plan while you evaluate the new cost structure and understand how to best manage your organization members and their private repository access. And while we’re currently not enforcing a timeline to move, rest assured that you’ll have at least 12 months notice before any mandated change to your plan.

This is not very friendly.  This means that while the upgrade to the new plan is optional now, it might not be so in the future.  Sure, you’ll get a warning ahead of time.

Dear GitHub!

I understand that you are a profit-oriented business and you need to make money.  But I think you’ve made a mistake somewhere here.  I hope you’ll re-evaluate this thing.  Otherwise, I’ll have to move away – either to BitBucket or GitLab.  And it’ll be a sad day.  I know, I’m not your largest client, but I’m sure there are many like me.

Yours truly, Leonid.

Furthermore, thinking about this, I suspect that external collaborators are being charged twice.  Sure, they can have their own repositories as well, but collaboration often involves forks and merges between multiple repositories of the same project.  So, to support this collaboration, I need to pay for the external collaborator to have access to my private repositories, while he also needs to pay on his side to be able to fork the private repository into his organization.

I think organizations shouldn’t be charged for external collaborators.  Extra features for organization members – like team-mentions, finer access control, etc – can provide the incentive for the companies to pay.  But the way this looks now is just too much.

Share your public keys easily with GitHub

Here’s a handy thing that I didn’t know about – you can easily share your public keys by adding them to your GitHub account and then accessing the URL of the form https://github.com/YOUR_USERNAME.keys .  What you get is a plain text response with all your public keys, ready to be inserted into a .ssh/authorized_keys file or anywhere else you want them.

Here’s an example of mine – https://github.com/mamchenkov.keys .  Don’t forget to configure two factor authentication for your GitHub account for an extra layer of security.  You probably don’t want any bugger who got your password inserting his own public keys into your account.
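If you want to script this, here is a minimal Python sketch using only the standard library.  The function names are my own, and the parsing assumes the response is what is described above: plain text with one public key per line.

```python
from urllib.request import urlopen


def keys_url(username):
    """Build the public keys URL for a GitHub username."""
    return f"https://github.com/{username}.keys"


def parse_keys(body):
    """Split a plain text response into keys, skipping blank lines."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch_keys(username, timeout=10.0):
    """Download and parse the public keys for a GitHub username."""
    with urlopen(keys_url(username), timeout=timeout) as response:
        return parse_keys(response.read().decode("utf-8"))
```

Calling `fetch_keys("mamchenkov")` returns a list of key strings.  Review them before appending anything to an authorized_keys file, since whoever controls the account controls what that URL returns.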