Top 29 books on Amazon from Hacker News comments

hacker-news-books

I came across this nice visualization of “Top 29 books ranked by unique users linking to Amazon in Hacker News comments“.

Amazon product links were extracted and counted from 8.3M comments posted on Hacker News from Oct 2006 to Oct 2015.

Most of these are, not surprisingly, on programming and design.  A few are on startups and business.  Some are on how to have a good life.  Which is a bit weird.

Dependency resolution with graphs in PHP

One of the projects I am working on at work presented an interesting problem.  I had a list of items with dependencies on one another and I needed to figure out the order in which to use those items, based on their dependencies.

For the sake of the example, think of a list of database tables, which reference each other.  I need a way to export those tables in such a way, that when imported back in, tables that have dependencies will be imported after all those tables on which they depend.  (It’s not the real task I’m working on, but close enough.)

Consider the following list as an example of input data:

// List of items with dependencies.  Order is not important
$tables = [ 
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

The result of the dependency resolution should give me the list like this (there are variations in order, of course, as long as the dependencies are satisfied):

categories
options
tags
users
users_meta
articles
articles_meta
comments

There are several ways to solve this problem.  My first attempt took about 50 lines of code and worked fine, but it lacked elegance.  It had too many nested loops and tricky conditions and was difficult to read.  My second attempt was slightly better, with a bit of a recursion, but still looked somewhat off.  It felt like there is a better way to do it, and that I’ve done something similar before, but I could put my finger on it.

I thought I’d take a look at something that solves a similar problem.  Composer, PHP package and dependency manager, surely had something to offer.  A brief check of the GitHub repository, and that idea is out of my hand.  Composer deals with much more complex issues, so its Dependency Resolver code is not something I can grasp in a few minutes.

It was time for some Googling.  Moments later, my deja vu feeling of “I’ve seen this before” was easily explained.  This problem fits into the graph theory, which I probably used last back in my college years.  Of course, I could have grabbed the book off the shelf and refresh my knowledge, practicing the sacred art of the Real Programming.  But time was an issue, so I cheated.

I found this “Dependency resolving algorithm” blog post by Ferry Boender over at Electric Monk (thanks man!).  He had exactly what I needed – simple and straight forward recursive algorithm for walking the graph, circular dependency detection, and even some performance optimization.

dep_graph1

The only problem was that his code is all in Python.  But that’s not really a problem.  So I’ve rewritten his code in PHP and got exactly what I needed.  Here it is:

// List of items with dependencies.  Order is not important
$tables = [
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

$resolved = []; 
$unresolved = []; 
// Resolve dependencies for each table
foreach (array_keys($tables) as $table) {
    try {
        list ($resolved, $unresolved) = dep_resolve($table, $tables, $resolved, $unresolved);
    } catch (\Exception $e) {
        die("Oops! " . $e->getMessage());
    }   
}

// Print out result
foreach ($resolved as $table) {
    $deps = empty($tables[$table]) ? 'none' : join(',', $tables[$table]);
    print "$table (deps: $deps)\n";
}

/**
 * Recursive dependency resolution
 * 
 * @param string $item Item to resolve dependencies for
 * @param array $items List of all items with dependencies
 * @param array $resolved List of resolved items
 * @param array $unresolved List of unresolved items
 * @return array
 */
function dep_resolve($item, array $items, array $resolved, array $unresolved) {
    array_push($unresolved, $item);
    foreach ($items[$item] as $dep) {
        if (!in_array($dep, $resolved)) {
            if (!in_array($dep, $unresolved)) {
                array_push($unresolved, $dep);
                list($resolved, $unresolved) = dep_resolve($dep, $items, $resolved, $unresolved);
            } else {
                throw new \RuntimeException("Circular dependency: $item -> $dep");
            }
        }
    }
    // Add $item to $resolved if it's not already there
    if (!in_array($item, $resolved)) {
        array_push($resolved, $item);
    }
    // Remove all occurrences of $item in $unresolved
    while (($index = array_search($item, $unresolved)) !== false) {
        unset($unresolved[$index]);
    }

    return [$resolved, $unresolved];
}

Running the above code produces the following result:

$ php dependecy.php 
users (deps: none)
categories (deps: none)
tags (deps: none)
articles (deps: users,categories,tags)
articles_meta (deps: articles)
comments (deps: users,articles)
options (deps: none)
users_meta (deps: users)

Which is exactly what I was looking for.  And now that I have it here, I’ll probably be needing it again and again.  It’s an elegant hammer to a lot of my nails.

Algorithm Economy and Containers

Containers (Docker, et al) have been getting all the hype recently.  I’ve played around with these a bit, but I’m not yet convinced this is the next greatest thing for projects that I am involved with currently.  However, it helps to look at these from different perspectives.  Here’s a blog post that ties containers to a new term that I haven’t heard before – algorithm economy.

The “algorithm economy” is a term established by Gartner to describe the next wave of innovation, where developers can produce, distribute, and commercialize their code. The algorithm economy is not about buying and selling complete apps, but rather functional, easy to integrate algorithms that enable developers to build smarter apps, quicker and cheaper than before.

CSS Colorguard – keep a watchful eye on your CSS colors

CSS Colorguard – keep a watchful eye on your CSS colors.

Here is a better description from the README:

Every CSS project starts out with good intentions, but inevitably, one too many people eye-dropper colors into nooks and crannies that you never knew existed. CSS Colorguard helps you maintain the color set that you want, and warns you when colors you’ve added are too similar to ones that already exist. Naturally, it’s all configurable to your tastes.

And here is my favorite part:

Colorguard uses the CIEDE2000 algorithm to determine the similarity of each of the colors in your CSS file. This algorithm is quite complex, but is used in the broadcasting community as the best approximation of human ability to discern differences in color. RGB on the other hand, is pretty bad at representing differences in color purely based on the numerical difference of the hex values.

Luckily, someone else already implemented CIEDE2000, so I didn’t have to. Tight. Cause this thing is mathy as hell.

CIEDE2000 math