Every pub in the United Kingdom

This Reddit thread shares the map of all the pubs in the UK.  The Poke picked it up and wrapped it into some more links and quotes.  Apparently, not even all the pubs are covered:

“Nope. There’s at least 12 pubs missing from the north coast of Scotland. Thurso alone has more than 6, 2 in Bettyhill, Tongue and Melvich plus a few others all missing”, writes shaidy64

The source of the map is here referencing 24,727 UK pubs.  And I’ve only been to like, what, 3?  This situation urgently needs correction.

Database Engines Ranking

db-engines-ranking-table

DB-Engines.com provides some insight into some of the most popular database engines (312 of them to be precise).  Nothing too surprising there – Oracle and MySQL leading the charts, but it’s nice to have the numbers and trends.

db-engines-ranking

There are, of course, many different ways how the popularity can be calculated.  Their method is based on the popularity of each engine in a variety of online outlets, from Google Search to social networks.

  • Number of mentions of the system on websites, measured as number of results in search engines queries. At the moment, we use Google, Bing and Yandex for this measurement. In order to count only relevant results, we are searching for <system name> together with the term database, e.g. “Oracle” and “database”.
  • General interest in the system. For this measurement, we use the frequency of searches in Google Trends.
  • Frequency of technical discussions about the system. We use the number of related questions and the number of interested users on the well-known IT-related Q&A sites Stack Overflow and DBA Stack Exchange.
  • Number of job offers, in which the system is mentioned. We use the number of offers on the leading job search engines Indeed and Simply Hired.
  • Number of profiles in professional networks, in which the system is mentioned. We use the internationally most popular professional networks LinkedIn and Upwork.
  • Relevance in social networks. We count the number of Twitter tweets, in which the system is mentioned.

It seems objective and representative enough to me.

WordPress now powers 27.1% of all websites on the Internet

wordpress-market-share

WordPress Tavern states:

WordPress now powers 27.1% of all websites on the internet, up from 25% last year. While it may seem that WordPress is neatly adding 2% of the internet every year, its percentage increase fluctuates from year to year and the climb is getting more arduous with more weight to haul.

Linking to these statistics from W3Techs.  Impressive!

Those who think that WordPress is just a blogging system are far from the truth…

Top 29 books on Amazon from Hacker News comments

hacker-news-books

I came across this nice visualization of “Top 29 books ranked by unique users linking to Amazon in Hacker News comments“.

Amazon product links were extracted and counted from 8.3M comments posted on Hacker News from Oct 2006 to Oct 2015.

Most of these are, not surprisingly, on programming and design.  A few are on startups and business.  Some are on how to have a good life.  Which is a bit weird.

400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?

Here is an interesting bit of research – do people prefer tabs or spaces when programming the most popular languages?

Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.

The results are not very surprising and somewhat disappointing (for all of us, tab fans):

tabs vs. spaces

As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:

Code MUST use 4 spaces for indenting, not tabs.

On a more technical note, I think this is also related to the explosion of editors and IDEs in the recent years, which, as good as they are, aren’t as good as Vim.  Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue at all.

Regardless of the results of the study, what’s more interesting is the method and tools used.  I’ve had my eye on the Google Big Query for a while now, but I’m too busy these days to give it a try.  The article gives a few insights, into how awesome the tool is.  1.6 terabytes of data processed in 864.6 seconds:

That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].

and:

Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.

If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery“, which works with a similar-sized data, trying to figure out how to write bug reports and pull request comments, so that they would be acted upon faster.