400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?

Here is an interesting bit of research – do people prefer tabs or spaces when programming in the most popular languages?

Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.

The results are not very surprising, and somewhat disappointing for all of us tab fans:

tabs vs. spaces

As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:

Code MUST use 4 spaces for indenting, not tabs.

On a more technical note, I think this is also related to the explosion of editors and IDEs in recent years, which, as good as they are, aren’t as good as Vim.  Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue.

Regardless of the results of the study, what’s more interesting is the method and tools used.  I’ve had my eye on Google BigQuery for a while now, but I’m too busy these days to give it a try.  The article gives a few insights into how awesome the tool is.  1.6 terabytes of data processed in 864.6 seconds:

That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].

and:

Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.
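
The per-line check behind this kind of analysis is easy to try locally.  Here is a minimal Python sketch.  It is not the query from the article (that one runs on BigQuery); the function names and the rule "whichever character starts more lines wins" are my own assumptions for illustration:

```python
from collections import Counter


def classify_indentation(source: str) -> str:
    """Return 'tabs', 'spaces', or 'undecided' for one file's contents.

    Assumed rule: count the lines that start with a tab and the lines
    that start with a space; the larger count wins, a tie is undecided.
    """
    counts = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
    if counts["tabs"] == counts["spaces"]:
        return "undecided"
    return "tabs" if counts["tabs"] > counts["spaces"] else "spaces"


def tally(files: dict[str, str]) -> Counter:
    """Count verdicts over a mapping of file name -> file contents."""
    return Counter(classify_indentation(text) for text in files.values())
```

Feed `tally()` a dictionary of file names and contents, and it returns how many files lean each way.  Scaling that to a billion files is exactly the part where something like BigQuery earns its keep.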

If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery“, which works with a similarly sized dataset, trying to figure out how to write bug reports and pull request comments so that they get acted upon faster.

Page builders and multilingual WordPress websites

WPML.org, the web home of the WordPress Multilingual Plugin, runs this blog post about the upcoming support for WordPress page builders.  Apart from the good news itself, there are some insightful results from the survey that the team ran, trying to understand who uses page builders and how.  I found the stats on which page builder solutions people use the most interesting:

q2-which-page-builder

At work we are primarily using Divi (when we are not building our own themes), but we’ve also done a few sites with Enfold.  I’ve also seen Avada in the wild.  But I can’t tell you which ones are better, because when it comes to using page builders, I’m mostly not involved.  These tools are so awesome these days that they can be easily used by a non-technical person.  Which is exactly what we do ;)

Analyzing 2+ Million Travis Builds

TravisCI – a continuous integration service – shares some of the insights from over 2,000,000 builds they’ve run, in a blog post called “What We Learned about Continuous Integration from Analyzing 2+ Million Travis Builds“.  For me, the most valuable bit is about the reasons for failing builds, which clearly indicates the need for and the importance of unit, integration, and UI tests:

2016-07-28-analyzing-travis-builds-0

Around 20% of all builds fail.  There is a variation based on the language – for some programming languages, testing is part of the process and culture – for others it’s an acquired tool.  Once you do implement testing, most of your builds will pass.  You’ll cancel very few.  But about 20% will fail due to failed unit tests, configurations, or environment setups.  Catching these 20% before they hit production is super important.

GitHub private repository contributions on your profile

The GitHub blog says that from now on your profile can include private repository contributions.

github private repo contributions

When enabled, these can make quite a difference in the number of green boxes showing your GitHub activity.  Here’s an example from mine.  Before enabling those, showing only Open Source contributions:

GitHub mamchenkov before

And here’s one after, including private repository contributions:

GitHub mamchenkov after

Indeed, it is a more accurate representation of my GitHub activity.  Given that these days most of my private repository activity happens on BitBucket and not on GitHub, this is quite surprising.

Common files in PHP packages

Jordi Boggiano looks at some common files in PHP packages, using Packagist as a data source.  There are some interesting metrics in there.  For example:

  • 58% of packages include a src/ directory and 5% a lib/ one. That’s surprisingly low to me, that means a lot have the code simply in the root folder.
  • 4% have a bin/ directory, including some sort of CLI executables.
  • 55% have a LICENSE file, that’s.. pretty disastrous but hopefully a lot of those that don’t at least indicate in the README and composer.json
  • 49% have some file or directory indicating the presence of tests (phpunit.xml & co). I am not sure if this is good or bad news to be honest, that depends on your expectations.
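
If you are curious how your own packages would score, the same kind of check can be sketched in a few lines of Python.  This is not how the numbers above were produced.  The function name and the exact list of "test markers" are my assumptions; the source only says "phpunit.xml & co":

```python
from pathlib import Path

# Assumption: entries that hint at the presence of tests.
TEST_MARKERS = {"phpunit.xml", "phpunit.xml.dist", "tests", "test"}


def describe_package(root: Path) -> dict[str, bool]:
    """Report which of the common top-level files a package has."""
    # Lower-case the names so LICENSE, License.md and license.txt all match.
    names = {entry.name.lower() for entry in root.iterdir()}
    return {
        "src": (root / "src").is_dir(),
        "lib": (root / "lib").is_dir(),
        "bin": (root / "bin").is_dir(),
        "license": any(name.split(".")[0] == "license" for name in names),
        "tests": bool(names & TEST_MARKERS),
    }
```

Calling `describe_package(Path("path/to/package"))` returns one true/false flag per item, which is enough to build the same kind of percentages over a folder full of checkouts.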

Visualization of the European refugee crisis

refugees

The flow towards Europe project provides a vivid visualization of the refugee migration.  It is an interactive map with breakdowns by country, and with a timeline covering the years 2012-2015.

Europe is experiencing the biggest refugee crisis since World War II. Based on data from the United Nations, we clarify the scale of the crisis.

IPv6 20th birthday with 10% global penetration

Here’s some not so light coffee time reading on IPv6 – IPv6 non-alternatives: DJB’s article, 13 years later – an article that links, among other things, to this Ars Technica article, which features some IPv6 statistics.  Summary?  Sure.  The IPv6 RFC celebrates its 20th birthday this month with 10% global penetration.

ipv6

Exponential growth year-on-year is good.  But the absolute numbers aren’t so bright yet.  Especially considering some of the areas where it wasn’t so successful.

Jetpack annual report for mamchenkov.net in 2015

This year’s Jetpack annual report for this blog is ready – have a look.  Here’s a teaser:

blog stats 2015

It’s been a busy year, so I haven’t been blogging as much as I wanted to, but overall, I think I did well (have a look at 2014 and 2013).  Just to give you a quick comparison:

Metric      2013      2014      2015
Visitors    58,000    81,000    96,000
Posts       560       628       541

I blog mostly for myself, but it’s nice to see a slight growth in traffic.  The fact that the most popular post on this blog throughout the years is how to check Squid proxy version is a little concerning, yet funny.  Well, at least people still find my “Vim for Perl developers” useful, even though it’s been more than 10 years since I wrote that (and probably five years since I promised to update it soon).

But as I said, I’m quite satisfied with my blogging this year.  Hopefully I can continue to do the same in 2016.

How Far Can You Go With HAProxy and a t2.micro

Here’s an interesting set of experiments trying to answer the question of how far you can go with an HAProxy setup on the smallest of the Amazon EC2 instances – t2.micro (1 virtual CPU, 1 GB of RAM).  Here’s the summary.

460 requests/second

At 460 req/second response times are mostly a flat ~300 ms, except for two spikes. I attribute this to TCP congestion avoidance as the traffic approaches the limit and packets start to get dropped. After dropped packets are detected the clients reduce their transmission rate, but eventually the transmission rate stabilizes again just under the limit. Only 1739 requests timeout and 134918 succeed.

[…]

It seems that the limit of the t2.micro is around 500 req/second even for small responses.
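
To put the quoted 460 req/second numbers into perspective, here is a quick back-of-the-envelope calculation in Python.  The two counts come straight from the quote above; the helper name is mine:

```python
# Figures quoted for the 460 req/second run.
TIMED_OUT = 1739
SUCCEEDED = 134918


def timeout_rate(timed_out: int, succeeded: int) -> float:
    """Share of all requests that timed out, as a fraction between 0 and 1."""
    return timed_out / (timed_out + succeeded)


rate = timeout_rate(TIMED_OUT, SUCCEEDED)
print(f"{TIMED_OUT + SUCCEEDED} requests, {rate:.2%} timed out")
# prints: 136657 requests, 1.27% timed out
```

So even right below the apparent limit, only a bit over one percent of the requests time out.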