Data Gravity

On the drive home today I was listening to episode 59 of the DevOps Cafe podcast.  I've recently subscribed to the show, and I think this was the first episode of it I've ever heard.  It's one of many tech talk podcasts where two or more people chat, for varying lengths of time, about a selection of topics, mostly related to technology.

In this particular episode, the hosts, John and Damon, were interviewing Dave McCrory, the CTO of Basho.  I wasn't familiar with either Basho or Dave prior to the episode.  Fortunately, Dave's somewhat lengthy introduction gave me a good idea of who he is.  What followed, though, was way more interesting: a discussion about data.

To be completely honest with you, I haven't even finished the episode yet (I got home right in the middle of it), but I feel like it's one worth blogging about.  For one, I've learned a new term: "data lake".  Apparently, that's a new and fancy way of branding "data warehousing".  Here is a bit from TechTarget, for example:

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.

While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.

The term data lake is often associated with Hadoop-oriented object storage.
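To make the TechTarget description a bit more concrete for myself, here is a toy sketch in Python of that flat layout: every element keeps its raw bytes untouched, gets a unique identifier, and carries a set of metadata tags that a query can filter on to pull out a smaller set of data for analysis.  The `DataLake` and `DataElement` names are my own hypothetical inventions for illustration, not any real product's API, and a real data lake (say, one built on Hadoop) would of course sit on distributed storage rather than an in-memory Python dict.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One item in the (hypothetical) lake: raw data plus metadata tags."""
    id: str
    raw: bytes  # kept in its native format, no schema imposed up front
    tags: dict[str, str] = field(default_factory=dict)


class DataLake:
    """Hypothetical toy data lake: a flat map from unique id to element.

    There are no folders or hierarchy; elements are found only through
    their metadata tags.
    """

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}

    def put(self, raw: bytes, **tags: str) -> str:
        # Assign a unique identifier and store the raw bytes with their tags.
        element_id = str(uuid.uuid4())
        self._elements[element_id] = DataElement(element_id, raw, dict(tags))
        return element_id

    def query(self, **criteria: str) -> list[DataElement]:
        # Return every element whose tags match all the given key/value pairs.
        return [
            element
            for element in self._elements.values()
            if all(element.tags.get(k) == v for k, v in criteria.items())
        ]


lake = DataLake()
lake.put(b"2015-03-01,click,42", source="web", format="csv")
lake.put(b'{"user": 7}', source="mobile", format="json")
web_rows = lake.query(source="web")  # the "smaller set of data" to analyze
print(len(web_rows))  # 1
```

The point of the flat structure is that nothing about the data has to be decided when it is stored; the question you ask later (here, `source="web"`) is what selects the relevant subset.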

But that was just the beginning.  What followed was a fascinating discussion of Data Gravity.  Obviously, this whole thing is too fresh in my mind for me to formulate it well yet, so I suggest you listen to the episode and read the intro on the Data Gravity site.  For the sake of brevity, here is the gist:

[…] it’s also a misleading term. Behind it all is the notion that data which is near other data is more useful, and the tendency of data to cling together comes from the usefulness of the resulting knowledge. […]

A lot of it seems obvious, but here it's all put into a nice thought framework, with references to other, more established fields, like math and physics.  Easily one of the most interesting technology-related discussions I've heard in a while!

awesome courses – list of awesome university courses for learning Computer Science!

While there have been quite a few lists like this before, this one is a really good selection.  I'm currently going through the slides of the Cloud Computing course from Cornell University.