Dependency resolution with graphs in PHP

One of the projects I am working on at work presented an interesting problem.  I had a list of items with dependencies on one another and I needed to figure out the order in which to use those items, based on their dependencies.

For the sake of the example, think of a list of database tables, which reference each other.  I need a way to export those tables in such a way, that when imported back in, tables that have dependencies will be imported after all those tables on which they depend.  (It’s not the real task I’m working on, but close enough.)

Consider the following list as an example of input data:

// List of items with dependencies.  Order is not important
$tables = [ 
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

The result of the dependency resolution should give me the list like this (there are variations in order, of course, as long as the dependencies are satisfied):

categories
options
tags
users
users_meta
articles
articles_meta
comments

There are several ways to solve this problem.  My first attempt took about 50 lines of code and worked fine, but it lacked elegance.  It had too many nested loops and tricky conditions and was difficult to read.  My second attempt was slightly better, with a bit of a recursion, but still looked somewhat off.  It felt like there is a better way to do it, and that I’ve done something similar before, but I could put my finger on it.

I thought I’d take a look at something that solves a similar problem.  Composer, PHP package and dependency manager, surely had something to offer.  A brief check of the GitHub repository, and that idea is out of my hand.  Composer deals with much more complex issues, so its Dependency Resolver code is not something I can grasp in a few minutes.

It was time for some Googling.  Moments later, my deja vu feeling of “I’ve seen this before” was easily explained.  This problem fits into the graph theory, which I probably used last back in my college years.  Of course, I could have grabbed the book off the shelf and refresh my knowledge, practicing the sacred art of the Real Programming.  But time was an issue, so I cheated.

I found this “Dependency resolving algorithm” blog post by Ferry Boender over at Electric Monk (thanks man!).  He had exactly what I needed – simple and straight forward recursive algorithm for walking the graph, circular dependency detection, and even some performance optimization.

dep_graph1

The only problem was that his code is all in Python.  But that’s not really a problem.  So I’ve rewritten his code in PHP and got exactly what I needed.  Here it is:

// List of items with dependencies.  Order is not important
$tables = [
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

$resolved = []; 
$unresolved = []; 
// Resolve dependencies for each table
foreach (array_keys($tables) as $table) {
    try {
        list ($resolved, $unresolved) = dep_resolve($table, $tables, $resolved, $unresolved);
    } catch (\Exception $e) {
        die("Oops! " . $e->getMessage());
    }   
}

// Print out result
foreach ($resolved as $table) {
    $deps = empty($tables[$table]) ? 'none' : join(',', $tables[$table]);
    print "$table (deps: $deps)\n";
}

/**
 * Recursive dependency resolution
 * 
 * @param string $item Item to resolve dependencies for
 * @param array $items List of all items with dependencies
 * @param array $resolved List of resolved items
 * @param array $unresolved List of unresolved items
 * @return array
 */
function dep_resolve($item, array $items, array $resolved, array $unresolved) {
    array_push($unresolved, $item);
    foreach ($items[$item] as $dep) {
        if (!in_array($dep, $resolved)) {
            if (!in_array($dep, $unresolved)) {
                array_push($unresolved, $dep);
                list($resolved, $unresolved) = dep_resolve($dep, $items, $resolved, $unresolved);
            } else {
                throw new \RuntimeException("Circular dependency: $item -> $dep");
            }
        }
    }
    // Add $item to $resolved if it's not already there
    if (!in_array($item, $resolved)) {
        array_push($resolved, $item);
    }
    // Remove all occurrences of $item in $unresolved
    while (($index = array_search($item, $unresolved)) !== false) {
        unset($unresolved[$index]);
    }

    return [$resolved, $unresolved];
}

Running the above code produces the following result:

$ php dependecy.php 
users (deps: none)
categories (deps: none)
tags (deps: none)
articles (deps: users,categories,tags)
articles_meta (deps: articles)
comments (deps: users,articles)
options (deps: none)
users_meta (deps: users)

Which is exactly what I was looking for.  And now that I have it here, I’ll probably be needing it again and again.  It’s an elegant hammer to a lot of my nails.

How to Read and Improve the C.R.A.P Index of your code

crapclasscompletetest

Levi Hackwith has an excellent post explaining “How to Read and Improve the C.R.A.P Index of your code“:

The C.R.A.P. (Change Risk Analysis and Predictions) index is designed to analyze and predict the amount of effort, pain, and time required to maintain an existing body of code.

It iterates over the old bits of wisdom – write simpler code and cover it with unit tests – but it does so in a very simple and measurable way.

He also reminds us that:

…software metrics, in general, are just tools. No single metric can tell the whole story; it’s just one more data point. Metrics are meant to be used by developers, not the other way around – the metric should work for you, you should not have to work for the metric. Metrics should never be an end unto themselves. Metrics are meant to help you think, not to do the thinking for you. ~Alberto Savoia

Robo – Modern Task Runner for PHP

robo

There is a whole lot of ways to build and deploy web applications these days.  I’ve done my own circle of trials and errors and have some very strong opinions on which ones are good, which ones are bad, and which ones are ugly.

My most recent discovery was Robo – a modern task runner for PHP.  I’ve been pushing it to one of those weekends, where I have nothing better to do, to trying it out.  And today I did.  Not just tried it out, but replaced a part of our infrastructure, which was previously running on Laravel Envoy.

Robo is very nice.  On one hand, it’s simple and straight forward and is very easy to get you started with.  On the other hand, it provides quite a bit of functionality to help you with the build and deploy process.  Here are some of the things that I love about it:

  • It’s PHP!  Which makes it perfect for PHP projects, the kind I’m dealing with most of my time.  No need to translate your PHP into XML (hey Phing), or into a weird rake/capistrano like syntax.
  • Instant support for command line arguments and their validation, help screens, ANSI colors, and the like.  Honestly, I don’t know why we are still fighting these things in 2016.
  • Transparency.  You install robo with composer, create your RoboFile.php, and you are done.  All public methods of the class in the RoboFile.php will be available as robo commands.  All parameters of public methods are populated by the command line arguments.  There is no black magic to it.  All that is executed is whatever you write.
  • Extensions (or Tasks and Stacks).  There are a few to utilize already (SSH, git, and more).  And it is trivial to write your own and share them with composer/Packagist.  That’s one of the things that is difficult with our current build setup based on phake (hence our phake-builder).

Things I don’t like (remember I have been using Robo for just about 3 minutes):

  • Logging.  While it’s trivial to add logs into the RoboFiles using your choice of the logging library (monolog is awesome), I haven’t found a way to get all the output after the command execution.  For now, I’ve wrapped the Robo run into a shell script, which collects all outputs and sends it by email into our deployment archives storage.
  • Remote power.  Robo has some very cool tasks and stacks for local work.  But I haven’t found a way to utilize them when using Robo’s SSH task.  This slightly spoils the clean remote commands with sprinkles of bash, which I would love to avoid.

Overall, it looks like a very nice and elegant system and I’ll probably be using it for much more.  Once I get a bit more comfortable with it, I will probably replace our phake-builder setup with Robo.

If you are looking for a good tool to use for your build and deploy needs, give it a try.  You’ll probably like it a lot.

tagbar-phpctags : Vim plugin for PHP developeres

phpctags

If you are using Vim editor to write PHP code, you probably already know about the excellent tagbar plugin, which lists methods, variables and the like in an optional window split.  Recently, I’ve learned of an awesome phpctags-tagbar plugin, which extends and improves this functionality via a phpctags tool, which has a deeper knowledge of PHP than the classic ctags tool.

Once installed, you’ll have a more organized browser of your code, with support for namespaces, classes, interfaces, constants, and variables.

PHP: array_merge_recursive() vs. array_replace_recursive()

Here is a nice blog post describing the important differences between array_merge_recursive() and array_replace_recursive() functions in PHP.  These are often overlooked when testing new developments with simpler data structures.  Troubleshooting for it later is not too obvious.

BitBucket Pipelines and Docker for PHP Developers

I’ve been meaning to look into Docker for a long while now.  But, as always, time is the issue.  In the last couple of days though I’ve been integrating BitBucket Pipelines into our workflow.  BitBucket Pipelines is a continuous integration solution, which runs your project tests in a Docker container.  So, naturally, I had to get a better idea of how the whole thing works.

Docker for PHP Developers” article was super useful.  Even though it wasn’t immediately applicable to BitBucket Pipelines, as they don’t currently support multiple containers – everything has to run within a single container.

The default BitBucket Pipelines configuration suggests the phpunit/phpunit image.  If you want to run PHPUnit tests only, that works fine.  But if you want to have a full blown Nginx and MySQL setup for extra bits (UI tests, integration tests, etc), then you might find smartapps/bitbucket-pipelines-php-mysql image much more useful.  Here’s the full bitbucket-pipelines.yml file that I’ve ended up with.

MySQL, PHP and “Integrity constraint violation: 1062 Duplicate entry”

Anna Filina blogs about an interesting problem she encountered with when working on a PHP and MySQL project:

MySQL was complaining about “Integrity constraint violation: 1062 Duplicate entry”. I had all the necessary safeguards in my code to prevent duplicates in tha column.

I gave up on logic and simply dumped the contents of the problematic column for every record. I found that there was a record with and without an accent on one of the characters. PHP saw each as a unique value, but MySQL did not make a distinction, which is why it complained about a duplicate value. It’s a good thing too, because based on my goal, these should have been treated as duplicates.

She also mentions two possible solutions to the problem:

My solution was to substitute accented characters before filtering duplicates in the code. This way, similar records were rejected before they were sent to the database.

and

As pointed out in the comments, a more robust and versatile solution would be to check the collation on the column.

I’m sure this will come in handy one day.

400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?

Here is an interesting bit of research – do people prefer tabs or spaces when programming the most popular languages?

Tabs or spaces. We are going to parse a billion files among 14 programming languages to decide which one is on top.

The results are not very surprising and somewhat disappointing (for all of us, tab fans):

tabs vs. spaces

As far as PHP goes, I’m sure the choice of spaces has to do with the PSR-2 coding style guide, which states:

Code MUST use 4 spaces for indenting, not tabs.

On a more technical note, I think this is also related to the explosion of editors and IDEs in the recent years, which, as good as they are, aren’t as good as Vim.  Vim allows for a very flexible configuration, where your code can be formatted and re-formatted any way you like, making tabs or spaces a non-issue at all.

Regardless of the results of the study, what’s more interesting is the method and tools used.  I’ve had my eye on the Google Big Query for a while now, but I’m too busy these days to give it a try.  The article gives a few insights, into how awesome the tool is.  1.6 terabytes of data processed in 864.6 seconds:

That query took a relative long time since it involved joining a 190 million rows table with a 70 million rows one, and over 1.6 terabytes of contents. But don’t worry about having to run it, since I left the result publicly available at [fh-bigquery:github_extracts.contents_top_repos_top_langs].

and:

Analyzing each line of 133 GBs of code in 16 seconds? That’s why I love BigQuery.

If you enjoyed this article, also have a look at “Analyzing GitHub issues and comments with BigQuery“, which works with a similar-sized data, trying to figure out how to write bug reports and pull request comments, so that they would be acted upon faster.

PHP backdoors

PHP backdoors repository is a collection of obfuscated and deobfuscated PHP backdoors. (For educational or testing purposes only, obviously.)  These provide a great insight into what kind of functionality the attackers are looking for when they exploit your application.  Most of these rotate around file system operations, executing commands, and sending emails.

One of the things from those files that I haven’t seen before is FOPO – Free Online PHP Obfuscator tool.

The Twelve-Factor App

I first heard about the twelve-factor app a couple of years ago, in Berlin, during the International PHP conference.  It was the basis for David Zulke (of Heroku fame) talk on the best practices for the modern day PHP applications.

The twelve-factor app is a methodology for building software-as-a-service apps that:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

The twelve-factor methodology can be applied to apps written in any programming language, and which use any combination of backing services (database, queue, memory cache, etc).

Here are the 12 factors, each one covered in detail on the site:

  1. Codebase: one codebase tracked in revision control, many deploys.
  2. Dependencies: explicitly declare and isolate dependencies.
  3. Config: store config in the environment.
  4. Backing services: treat backing services as attached resources.
  5. Build, release, run: strictly separate build and run stages.
  6. Processes: execute the app as one or more stateless processes.
  7. Port binding: export services via port binding.
  8. Concurrency: scale out via the process model.
  9. Disposability: maximize robustness with fast startup and graceful shutdown.
  10. Dev/prod parity: keep development, staging, and production as similar as possible.
  11. Logs: treat logs as event streams.
  12. Admin processes: run admin/management tasks as one-off processes.

These seem simple and straightforward, but in reality not always as easy to follow.  Regardless, these are a good goal to aim at.