Top 29 books on Amazon from Hacker News comments

hacker-news-books

I came across this nice visualization of “Top 29 books ranked by unique users linking to Amazon in Hacker News comments“.

Amazon product links were extracted and counted from 8.3M comments posted on Hacker News from Oct 2006 to Oct 2015.

Most of these are, not surprisingly, on programming and design.  A few are on startups and business.  Some are on how to have a good life.  Which is a bit weird.

Dependency resolution with graphs in PHP

One of the projects I am working on at work presented an interesting problem.  I had a list of items with dependencies on one another and I needed to figure out the order in which to use those items, based on their dependencies.

For the sake of the example, think of a list of database tables, which reference each other.  I need a way to export those tables in such a way, that when imported back in, tables that have dependencies will be imported after all those tables on which they depend.  (It’s not the real task I’m working on, but close enough.)

Consider the following list as an example of input data:

// List of items with dependencies.  Order is not important
$tables = [ 
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

The result of the dependency resolution should give me the list like this (there are variations in order, of course, as long as the dependencies are satisfied):

categories
options
tags
users
users_meta
articles
articles_meta
comments

There are several ways to solve this problem.  My first attempt took about 50 lines of code and worked fine, but it lacked elegance.  It had too many nested loops and tricky conditions and was difficult to read.  My second attempt was slightly better, with a bit of a recursion, but still looked somewhat off.  It felt like there is a better way to do it, and that I’ve done something similar before, but I could put my finger on it.

I thought I’d take a look at something that solves a similar problem.  Composer, PHP package and dependency manager, surely had something to offer.  A brief check of the GitHub repository, and that idea is out of my hand.  Composer deals with much more complex issues, so its Dependency Resolver code is not something I can grasp in a few minutes.

It was time for some Googling.  Moments later, my deja vu feeling of “I’ve seen this before” was easily explained.  This problem fits into the graph theory, which I probably used last back in my college years.  Of course, I could have grabbed the book off the shelf and refresh my knowledge, practicing the sacred art of the Real Programming.  But time was an issue, so I cheated.

I found this “Dependency resolving algorithm” blog post by Ferry Boender over at Electric Monk (thanks man!).  He had exactly what I needed – simple and straight forward recursive algorithm for walking the graph, circular dependency detection, and even some performance optimization.

dep_graph1

The only problem was that his code is all in Python.  But that’s not really a problem.  So I’ve rewritten his code in PHP and got exactly what I needed.  Here it is:

// List of items with dependencies.  Order is not important
$tables = [
    'articles_meta' => ['articles'],
    'articles' => ['users', 'categories', 'tags'],
    'categories' => [],
    'comments' => ['users', 'articles'],
    'options' => [],
    'tags' => [],
    'users' => [],
    'users_meta' => ['users'],
];

$resolved = []; 
$unresolved = []; 
// Resolve dependencies for each table
foreach (array_keys($tables) as $table) {
    try {
        list ($resolved, $unresolved) = dep_resolve($table, $tables, $resolved, $unresolved);
    } catch (\Exception $e) {
        die("Oops! " . $e->getMessage());
    }   
}

// Print out result
foreach ($resolved as $table) {
    $deps = empty($tables[$table]) ? 'none' : join(',', $tables[$table]);
    print "$table (deps: $deps)\n";
}

/**
 * Recursive dependency resolution
 * 
 * @param string $item Item to resolve dependencies for
 * @param array $items List of all items with dependencies
 * @param array $resolved List of resolved items
 * @param array $unresolved List of unresolved items
 * @return array
 */
function dep_resolve($item, array $items, array $resolved, array $unresolved) {
    array_push($unresolved, $item);
    foreach ($items[$item] as $dep) {
        if (!in_array($dep, $resolved)) {
            if (!in_array($dep, $unresolved)) {
                array_push($unresolved, $dep);
                list($resolved, $unresolved) = dep_resolve($dep, $items, $resolved, $unresolved);
            } else {
                throw new \RuntimeException("Circular dependency: $item -> $dep");
            }
        }
    }
    // Add $item to $resolved if it's not already there
    if (!in_array($item, $resolved)) {
        array_push($resolved, $item);
    }
    // Remove all occurrences of $item in $unresolved
    while (($index = array_search($item, $unresolved)) !== false) {
        unset($unresolved[$index]);
    }

    return [$resolved, $unresolved];
}

Running the above code produces the following result:

$ php dependecy.php 
users (deps: none)
categories (deps: none)
tags (deps: none)
articles (deps: users,categories,tags)
articles_meta (deps: articles)
comments (deps: users,articles)
options (deps: none)
users_meta (deps: users)

Which is exactly what I was looking for.  And now that I have it here, I’ll probably be needing it again and again.  It’s an elegant hammer to a lot of my nails.

Code is poetry. But not literature.

WordPress has been known, among other things, for coining the phrase “code is poetry”.  It’s used in the footer of their website as well as quite a bit around the web.  This goes along with Donald Knuth paradigm of Literate Programming.

Turns out, some people disagree.  Most recently, Peter Seibel, the author of “Coders at Work” book of interviews, wrote this blog post titled “Code is not literature”.

It was sometime after that presentation that I finally realized the obvious: code is not literature. We don’t read code, we decodeit. We examine it. A piece of code is not literature; it is a specimen.

It’s an interesting read, filled with quotes and references to some of the smartest people in IT and Computer Science.

The Definitive Guide to Natural Language Processing

The Definitive Guide to Natural Language Processing” is an easy to follow article on what a challanging task it is for machines to understand human language.  There’s also this cool video of two bots talking to each other.

27 languages to improve your Python

Nick Coghlan writes:

One of the things we do as part of the Python core development process is to look at features we appreciate having available in other languages we have experience with, and see whether or not there is a way to adapt them to be useful in making Python code easier to both read and write. This means that learning another programming language that focuses more specifically on a given style of software development can help improve anyone’s understanding of that style of programming in the context of Python.

To aid in such efforts, I’ve provided a list below of some possible areas for exploration, and other languages which may provide additional insight into those areas.

The languages and areas are:

  • Procedural programming: C, Rust, Cython
  • Object-oriented data modelling: Java, C#, Eiffel
  • Object-oriented C derivatives: C++, D
  • Array-oriented data processing: MATLAB/Octave, Julia
  • Statistical data analysis: R
  • Computational pipeline modelling: Haskell, Scala, Clojure, F#
  • Event driven programming: JavaScript, Go, Erlang, Elixir
  • Gradual typing: TypeScript
  • Dynamic metaprogramming: Hy, Ruby
  • Pragmatic problem solving: Lua, PHP, Perl
  • Computational thinking: Scratch, Logo

Free Data Science Books

I came across a collection of free data science books:

Pulled from the web, here is a great collection of eBooks (most of which have a physical version that you can purchase on Amazon) written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.

Most notably, there are introductory books, handbooks, Hadoop guide, SQL books, social media data mining stuff, and d3 tips and tricks.  There’s also plenty on artificial intelligence and machine learning, but that’s too far out for me.