Extract, Transform, Load

I’ve been doing all kinds of data migrations and system integration for years now.  But only yesterday I’ve learned that there is a very specific term linked to the process.

In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that:

  • Extracts data from outside sources
  • Transforms it to fit operational needs, which can include quality levels
  • Loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse)

ETL systems commonly integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing.

Pulp – software repository management.

Pulp is a platform for managing repositories of content, such as software packages, and pushing that content out to large numbers of consumers. If you want to locally mirror all or part of a repository, host your own content in a new repository, manage content from multiple sources in one place, and push content you choose out to large numbers of clients in one simple operation, Pulp is for you!