Dependency resolution with graphs in PHP

One of the projects I am working on at work presented an interesting problem.  I had a list of items with dependencies on one another and I needed to figure out the order in which to use those items, based on their dependencies.

For the sake of the example, think of a list of database tables, which reference each other.  I need a way to export those tables in such a way, that when imported back in, tables that have dependencies will be imported after all those tables on which they depend.  (It’s not the real task I’m working on, but close enough.)

Consider the following list as an example of input data:

// List of items with dependencies.  Order is not important
$tables = [ 
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

The result of the dependency resolution should give me the list like this (there are variations in order, of course, as long as the dependencies are satisfied):

categories
options
tags
users
users_meta
articles
articles_meta
comments

There are several ways to solve this problem.  My first attempt took about 50 lines of code and worked fine, but it lacked elegance.  It had too many nested loops and tricky conditions and was difficult to read.  My second attempt was slightly better, with a bit of a recursion, but still looked somewhat off.  It felt like there is a better way to do it, and that I’ve done something similar before, but I could put my finger on it.

I thought I’d take a look at something that solves a similar problem.  Composer, PHP package and dependency manager, surely had something to offer.  A brief check of the GitHub repository, and that idea is out of my hand.  Composer deals with much more complex issues, so its Dependency Resolver code is not something I can grasp in a few minutes.

It was time for some Googling.  Moments later, my deja vu feeling of “I’ve seen this before” was easily explained.  This problem fits into the graph theory, which I probably used last back in my college years.  Of course, I could have grabbed the book off the shelf and refresh my knowledge, practicing the sacred art of the Real Programming.  But time was an issue, so I cheated.

I found this “Dependency resolving algorithm” blog post by Ferry Boender over at Electric Monk (thanks man!).  He had exactly what I needed – simple and straight forward recursive algorithm for walking the graph, circular dependency detection, and even some performance optimization.

dep_graph1

The only problem was that his code is all in Python.  But that’s not really a problem.  So I’ve rewritten his code in PHP and got exactly what I needed.  Here it is:

// List of items with dependencies.  Order is not important
$tables = [
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

$resolved = []; 
$unresolved = []; 
// Resolve dependencies for each table
foreach (array_keys($tables) as $table) {
    try {
        list ($resolved, $unresolved) = dep_resolve($table, $tables, $resolved, $unresolved);
    } catch (\Exception $e) {
        die("Oops! " . $e->getMessage());
    }   
}

// Print out result
foreach ($resolved as $table) {
    $deps = empty($tables[$table]) ? 'none' : join(',', $tables[$table]);
    print "$table (deps: $deps)\n";
}

/**
 * Recursive dependency resolution
 * 
 * @param string $item Item to resolve dependencies for
 * @param array $items List of all items with dependencies
 * @param array $resolved List of resolved items
 * @param array $unresolved List of unresolved items
 * @return array
 */
function dep_resolve($item, array $items, array $resolved, array $unresolved) {
    array_push($unresolved, $item);
    foreach ($items[$item] as $dep) {
        if (!in_array($dep, $resolved)) {
            if (!in_array($dep, $unresolved)) {
                array_push($unresolved, $dep);
                list($resolved, $unresolved) = dep_resolve($dep, $items, $resolved, $unresolved);
            } else {
                throw new \RuntimeException("Circular dependency: $item -> $dep");
            }
        }
    }
    // Add $item to $resolved if it's not already there
    if (!in_array($item, $resolved)) {
        array_push($resolved, $item);
    }
    // Remove all occurrences of $item in $unresolved
    while (($index = array_search($item, $unresolved)) !== false) {
        unset($unresolved[$index]);
    }

    return [$resolved, $unresolved];
}

Running the above code produces the following result:

$ php dependecy.php 
users (deps: none)
categories (deps: none)
tags (deps: none)
articles (deps: users,categories,tags)
articles_meta (deps: articles)
comments (deps: users,articles)
options (deps: none)
users_meta (deps: users)

Which is exactly what I was looking for.  And now that I have it here, I’ll probably be needing it again and again.  It’s an elegant hammer to a lot of my nails.

Using Ansible to bootstrap an Amazon EC2 instance

This article – “Using Ansible to Bootstrap My Work Environment Part 4” is pure gold for anyone trying to figure out all the moving parts needed to automate the provisioning and configuration of the Amazon EC2 instance with Ansible.

Sure, some bits are easier than the other, but it takes time to go from one step to another.  In this article, you have everything you need, including the provisioning Ansible playbook and variables, cloud-init bits, and more.

I’ve printed and laminated my copy.  It’s on the wall now.  It will provide me with countless hours of joy during the upcoming Christmas season.

Monitoring the monitoring : keeping Zabbix server service up

After our recent MySQL migrations, I started getting a weird issue – Zabbix server process was crashing periodically (several times a day).

8395:20161109:175408.023 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
8395:20161109:175408.024 [Z3001] connection to database 'zabbix_database_name_here' failed: [2003] Can't connect to MySQL server on 'zabbix_database_host_here' (111)
8395:20161109:175408.024 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...

Digging around for a bit, it seems like a widely reported issue, related Zabbix server using the same database connection as one of its agents is monitoring (here is an example bug report).

Not having enough time to troubleshoot and fix it properly, I decided for the time being to use another monitoring tool – monit – to keep an eye on the Zabbix server process and restart it, if it’s down.  After “yum install monit“, the following was dropped into /etc/monit.d/zabbix:

check process zabbix_server with pidfile /var/run/zabbix/zabbix_server.pid
    start program = "/sbin/service zabbix-server start" with timeout 60 seconds
    stop program = "/sbin/service zabbix-server stop"

Start the monit service, make sure it also starts at boot, and watch it in action via the /var/log/monit:

[UTC Nov 20 20:49:18] error    : 'zabbix_server' process is not running
[UTC Nov 20 20:49:18] info     : 'zabbix_server' trying to restart
[UTC Nov 20 20:49:18] info     : 'zabbix_server' start: /sbin/service
[UTC Nov 20 20:50:19] info     : 'zabbix_server' process is running with pid 28941

The chances of both systems failing at once are slim, so I think this will buy me some time.

Fixing “InnoDB: Error: log file ./ib_logfile0 is of different size”

For the last few days I’ve been moving MySQL databases around at work.  Being a bit in a rush and overconfident (I have backups!),  I was simply detaching the /var/lib/mysql volume on one host (running Amazon AMI and MySQL) and attaching it to another host (running CentOS 7 and MariaDB).

It’s not surprising that I got this error: “InnoDB: Error: log file ./ib_logfile0 is of different size“.  Gladly, this ServerFault thread provided enough hints for me to solve the problem.  In a nutshell:

  1. Temporarily comment out the InnoDB log file size setting (e.g.: innodb_log_file_size = 64M) in /etc/my.cnf.
  2. Set innodb_fast_shutdown to 0 (read more).
  3. Restart the MySQL service once or twice.
  4. Uncomment the log file size setting.
  5. Set InnoDB fast shutdown back to default or remove it from your my.cnf altogether.
  6. Celebrate!

Knowing how little I learn from my own mistakes, I’m sure I’ll find this post useful in the future.