data aggregation

Going Pro with Feedly

I’ve been a heavy user of RSS for years now. I’ve tried and used everything from custom-built applications and scripts, to browser add-ons, to third-party services. Even this very blog’s archives are full of migration and review articles about moving from one tool to another. Here are a few links, if you are interested:

October 2004: Signed up with BlogLines
November 2005: Google Reader vs. Bloglines
July 2006: Returned to Bloglines
October 2006: Good bye, Bloglines. Hello, Google Reader
September 2010: The end of Bloglines
August 2012: BazQux Reader – RSS reader that supports comments
March 2013: Google Reader alternative quest
June 2013: Goodbye Google Reader
July 2013: Aggregating feeds isn’t all that simple
March 2014: Moving the RSS to Feedly
July 2014: A year without Google Reader

For the last 3 years, I’ve been using Feedly, which I like a lot. I’ve been thinking about going Pro for about a year now. Last week, I made the switch. Here’s why:

I do love the service and want to support it! After all, I’m spending at least an hour every day going through my feeds. Sometimes even more.
The Pro version removes the limit on the number of feeds and items in each feed. Not that I don’t have enough to read, but I don’t like the idea that I might be missing something.
The Pro version provides integrations and easier sharing to a variety of third-party services. The one that is most important for me is WordPress integration.
Their blog post about the upcoming changes to feed organization was the final push: I WANT THAT!

Feedly constantly improves the user experience and brings new features. It is very stable too: I think I remember only one or two outages in the last three years. Their web interface is very handy and the mobile app works well too. They have plenty of browser add-ons to make things even easier.

All in all, it’s well worth $5 per month for me.

Data Gravity

On the drive back home today I was listening to the DevOps Cafe podcast, episode 59. I’ve recently subscribed to this show and I think this was the first episode of it I ever heard. It’s one of many tech talk podcasts, where two or more people chat for a varied period of time on a selection of topics, mostly related to technology.

In this particular episode, the hosts, John and Damon, were interviewing Dave McCrory, the CTO of Basho. I wasn’t familiar with either Basho or Dave prior to the episode. Luckily, a somewhat lengthy introduction by Dave gave me a good idea of who he is. What followed, though, was way more interesting: a discussion about data.

To be completely honest with you, I haven’t even finished the episode yet (got home right in the middle of it), but I feel like it’s one of those worth blogging about. For one, I’ve learned a new term – “data lake”. Apparently, that’s a new and fancy way of branding “data warehousing”. Here is a bit from TechTarget, for example:

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.

While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.

The term data lake is often associated with Hadoop-oriented object storage.
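To make the quoted description a bit more concrete, here is a minimal Python sketch of the idea: a flat store where every raw item gets a unique identifier and a set of metadata tags, and where questions are answered by filtering on those tags. This is only an illustration under my own assumptions. The `TinyDataLake` class, its method names, and the sample tags are all hypothetical, and a real data lake would sit on top of something like Hadoop or an object store rather than a dictionary in memory.

```python
import uuid


class TinyDataLake:
    """A flat store: no folders, just raw objects keyed by a unique identifier."""

    def __init__(self):
        # identifier -> (raw bytes exactly as received, metadata tags)
        self._objects = {}

    def put(self, raw, **tags):
        """Store raw data untouched and return its generated identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (raw, tags)
        return object_id

    def query(self, **wanted):
        """Return identifiers of objects whose tags match every wanted value."""
        return [
            object_id
            for object_id, (_, tags) in self._objects.items()
            if all(tags.get(key) == value for key, value in wanted.items())
        ]

    def get(self, object_id):
        """Fetch the raw data for one identifier."""
        return self._objects[object_id][0]


# Hypothetical sample data in three different native formats.
lake = TinyDataLake()
log_id = lake.put(b"GET /index.html 200", kind="weblog", year=2015)
csv_id = lake.put(b"id,total\n1,9.99\n", kind="sales", year=2015)
lake.put(b'{"user": 1}', kind="weblog", year=2014)

# "When a business question arises": pull only the relevant subset.
matches = lake.query(kind="weblog", year=2015)
```

Nothing is parsed or reshaped on the way in. The data stays in its native format, and only the smaller set returned by `query` would be handed over for analysis.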

But that was just the beginning. What followed was a fascinating discussion on Data Gravity. Obviously, this whole thing is too fresh in my mind and I can’t formulate it well yet, so I suggest you listen to the episode and read the intro on the Data Gravity site. For the sake of brevity:

[…] it’s also a misleading term. Behind it all is the notion that data which is near other data is more useful, and the tendency of data to cling together comes from the usefulness of the resulting knowledge. […]

A lot of it seems obvious, but here it’s all put into a nice thought framework, with references to other, more established fields, like math and physics. Easily one of the most interesting technology-related discussions I’ve heard in a while!

Extract, Transform, Load

I’ve been doing all kinds of data migrations and system integrations for years now. But only yesterday did I learn that there is a very specific term for the process.

In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that:

Extracts data from outside sources

Transforms it to fit operational needs, which can include quality levels

Loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse)

ETL systems commonly integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing.
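The three steps above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not a reference implementation: the two CSV snippets stand in for exports from separate payroll and purchasing systems, an in-memory SQLite database plays the role of the end target, and the function, table, and column names are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two separate source systems.
PAYROLL_CSV = "department,amount\nengineering,1200.50\nsales,800.00\n"
PURCHASING_CSV = "department,amount\nengineering, 300.25 \nsales,\n"


def extract(raw_csv, source):
    """Extract: read rows from one outside source, remembering where they came from."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        yield {"source": source, **row}


def transform(rows):
    """Transform: normalise values and drop rows that fail a basic quality check."""
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # quality rule: skip rows with no amount
        department = row["department"].strip().lower()
        yield (row["source"], department, round(float(amount) * 100))


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS costs "
        "(source TEXT, department TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO costs VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the whole pipeline once per source system."""
    for raw, source in ((PAYROLL_CSV, "payroll"), (PURCHASING_CSV, "purchasing")):
        load(conn, transform(extract(raw, source)))


conn = sqlite3.connect(":memory:")
run_etl(conn)
totals = dict(
    conn.execute("SELECT department, SUM(amount_cents) FROM costs GROUP BY department")
)
row_count = conn.execute("SELECT COUNT(*) FROM costs").fetchone()[0]
```

Keeping the three stages as separate functions mirrors the definition: each source only needs its own `extract`, while the quality rules in `transform` and the target schema in `load` are shared.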

scraper.js – a complete and versatile web scraper

Tek Security Group’s Password Repository

In this repository you will find helpful authentication brute forcing files. These files include known password defaults, usernames, common and specialized dictionaries, etc.