tagbar-phpctags: Vim plugin for PHP developers


If you are using the Vim editor to write PHP code, you probably already know about the excellent tagbar plugin, which lists methods, variables, and the like in an optional window split.  Recently, I learned of the awesome tagbar-phpctags plugin, which extends and improves this functionality via the phpctags tool, a parser with much deeper knowledge of PHP than the classic ctags.

Once installed, you’ll get a better organized view of your code, with support for namespaces, classes, interfaces, constants, and variables.

PHP: array_merge_recursive() vs. array_replace_recursive()

Here is a nice blog post describing the important differences between the array_merge_recursive() and array_replace_recursive() functions in PHP.  The differences are easy to overlook when new code is only tested with simple data structures, and they are not obvious to troubleshoot later.
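To make the difference concrete, here is a minimal sketch of my own (the config arrays are made up purely for illustration):

```php
<?php

// Hypothetical config arrays, just to show the difference in behaviour.
$defaults = ['db' => ['host' => 'localhost', 'port' => 3306], 'debug' => false];
$custom   = ['db' => ['host' => 'db.example.com'],            'debug' => true];

// array_merge_recursive() collects values that share a string key into an array:
//   'host'  => ['localhost', 'db.example.com']
//   'debug' => [false, true]
print_r(array_merge_recursive($defaults, $custom));

// array_replace_recursive() overwrites earlier values with later ones,
// which is usually what you want for configuration:
//   'host' => 'db.example.com', 'port' => 3306, 'debug' => true
print_r(array_replace_recursive($defaults, $custom));
```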

BitBucket Pipelines and Docker for PHP Developers

I’ve been meaning to look into Docker for a long while now.  But, as always, time is the issue.  In the last couple of days, though, I’ve been integrating BitBucket Pipelines into our workflow.  BitBucket Pipelines is a continuous integration solution that runs your project’s tests in a Docker container.  So, naturally, I had to get a better idea of how the whole thing works.

The “Docker for PHP Developers” article was super useful, even though it wasn’t immediately applicable to BitBucket Pipelines, which don’t currently support multiple containers: everything has to run within a single container.

The default BitBucket Pipelines configuration suggests the phpunit/phpunit image.  If you only want to run PHPUnit tests, that works fine.  But if you want a full-blown Nginx and MySQL setup for the extra bits (UI tests, integration tests, etc.), then you might find the smartapps/bitbucket-pipelines-php-mysql image much more useful.  Here’s the full bitbucket-pipelines.yml file that I’ve ended up with.
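For a rough idea of the shape of such a file, here is a minimal sketch (not my actual configuration; it assumes a Composer-based project with PHPUnit as a dev dependency):

```yaml
# The whole pipeline runs inside this single Docker image.
image: smartapps/bitbucket-pipelines-php-mysql

pipelines:
  default:
    - step:
        script:
          # Install project dependencies.
          - composer install --no-interaction
          # Run the test suite.
          - vendor/bin/phpunit
```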

MySQL, PHP and “Integrity constraint violation: 1062 Duplicate entry”

Anna Filina blogs about an interesting problem she ran into while working on a PHP and MySQL project:

MySQL was complaining about “Integrity constraint violation: 1062 Duplicate entry”. I had all the necessary safeguards in my code to prevent duplicates in that column.

I gave up on logic and simply dumped the contents of the problematic column for every record. I found that there was a record with and without an accent on one of the characters. PHP saw each as a unique value, but MySQL did not make a distinction, which is why it complained about a duplicate value. It’s a good thing too, because based on my goal, these should have been treated as duplicates.

She also mentions two possible solutions to the problem:

My solution was to substitute accented characters before filtering duplicates in the code. This way, similar records were rejected before they were sent to the database.

and

As pointed out in the comments, a more robust and versatile solution would be to check the collation on the column.

I’m sure this will come in handy one day.
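To make the first approach a bit more concrete, here is a small sketch of my own (with made-up data and a deliberately tiny accent map) of filtering duplicates after substituting accented characters:

```php
<?php

// Hypothetical values destined for a column with a UNIQUE index.
$values = ['Montreal', 'Montréal', 'Québec'];

// A deliberately tiny substitution map; a real implementation would cover
// the full character set, or use iconv()/intl transliteration instead.
$accents = ['é' => 'e', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'ç' => 'c'];

$seen   = [];
$unique = [];
foreach ($values as $value) {
    // Normalize roughly the way an accent-insensitive MySQL collation
    // (e.g. utf8_general_ci) compares values.
    $key = strtolower(strtr($value, $accents));
    if (!isset($seen[$key])) {
        $seen[$key] = true;
        $unique[]   = $value; // keep the first spelling encountered
    }
}

print_r($unique); // Montreal, Québec -- 'Montréal' is dropped as a duplicate
```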

400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?

Here is an interesting bit of research: do people prefer tabs or spaces when programming in the most popular languages?

Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.

The results are not very surprising, and somewhat disappointing for all of us tab fans:

[Chart: tabs vs. spaces]

As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:

Code MUST use 4 spaces for indenting, not tabs.

On a more technical note, I think this is also related to the explosion of editors and IDEs in recent years, which, as good as they are, aren’t as good as Vim.  Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs versus spaces a non-issue.

Regardless of the results of the study, what’s more interesting is the method and the tools used.  I’ve had my eye on Google BigQuery for a while now, but I’m too busy these days to give it a try.  The article gives a few insights into how awesome the tool is.  1.6 terabytes of data processed in 864.6 seconds:

That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].

and:

Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.

If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery”, which works with a similarly sized dataset, trying to figure out how to write bug reports and pull request comments so that they get acted upon faster.