Metabase – Open Source business intelligence and analytics

Metabase is an Open Source business intelligence and analytics tool.  It supports a variety of databases and services as sources for data, and provides a number of data querying and processing tools.  Have a look at the GitHub repository as well.

And if you want a few alternatives or complimenting tools, I found this list quite useful.

Why Uber Engineering Switched from Postgres to MySQL

Why Uber Engineering Switched from Postgres to MySQL” is an interesting study with plenty of technical detail of how MySQL was a better choice than PostgreSQL for the very demanding growth of Uber.  These kinds of issues are probably way out of scope for any “regular Joe” application, but the insight into the differences of MySQL and PostgreSQL architectures is still useful.

Main PostgreSQL limitations covered by the study are:

  • Inefficient architecture for writes
  • Inefficient data replication
  • Issues with table corruption
  • Poor replica MVCC support
  • Difficulty upgrading to newer releases

forget-db – a simple GDPR inspired tool to anonymise confidential database data

forget-db:

A simple(ish) command line tool written in PHP 7.1 using Laravel Zero and Faker to help you anonymise/pseudonymise data within your database to support protecting either sensitive information, or peoples right to be forgotten with GDPR compliance.

The tool allows you to connect to either mysql, postgres, sqlite or sqlserver and replace defined information with random data to allow you to keep statistics/relationships/audit of actions etc.

It uses a simple yaml configuration file to define the conditions for overwriting, which fields you want to overwrite, and what to overwrite them with.

When and where to determine the ID of an entity

It always amazes me when I randomly come across an article or a blog post precisely on the subject that I’m mulling over in my head – all without searching specifically for the solution or even researching the problem domain.  It’s almost like the universe knows what I’m thinking and sends help my way.

When and where to determine the ID of an entity” is an example of exactly that.  Lately, I’ve been working with events in CakePHP a lot.  And for one particular scenario, I was considering the beforeSave() event in the model layer, which would trigger some functionality that modifies data in other models.  So, having a reference of the current ID would be useful for debugging and logging purposes.  But since the current entity hasn’t been saved it, the ID is not there.   And that’s where I started thinking about this whole thing and considering where is the right place to generate the ID.

One thing that kind of bothered me on top of the theoretical discussion, was the practical implementation, especially in different frameworks.  If I remember correctly, the earlier version of CakePHP framwork, used the presence or absense of the ID in the entity to differentiate between insert and update operations.  It might still be true now, but at least there is a way to work around it, as CakePHP now has isNew() method to check if the entity needs to be inserted or updated.

 

Database Flow – modern, self-hosted web interface for SQL and GraphQL

Database Flow is a modern, Open Source, self-hosted, web-based tool for working with SQL databases and GraphQL APIs.  It supports a variety of the database engines: IBM DB2, Oracle, H2, PostgreSQL, MySQL, SQLite, Informix, and Microsoft SQL Server.  It features an advanced SQL editor, query plan analyzer, GraphQL client, schema explorer, charting, query history, and more.

The only visible downside so far is that it’s written in Java.

ORM: Active Record vs. Data Mapper

Everybody building a web application with a modern framework, is already probably using an ORM (Object-Relational Mapping).  Most frameworks include one out of the box.  But digging deeper into the subject, ORMs do vary from each other, and some cases, very significantly.

Most variations are coming from two main approaches: Active Record and Data Mapper.  I’ve heard the terms for a long time, but today decided to look into the meaning and the actual difference.

The two approaches seem very similar.  The difference is described in a multitude of articles online.  I particularly liked this one.  In essence, Active Record is a better choice for simpler, CRUD-based applications.  Data Mapper, on the other hand, is better for domain-specific applications, as it provides another level of abstraction between the domain objects and the persistence layer.

Most of my work these days is done with CakePHP framework, which I now thought uses the Active Record pattern.  But it turns out that CakePHP ORM so powerful, because it’s more than just one of those:

The CakePHP ORM borrows ideas and concepts from both ActiveRecord and Datamapper patterns. It aims to create a hybrid implementation that combines aspects of both patterns to create a fast, simple to use ORM.

It looks like I need to do some learning and dig deeper into the subject.  Pointers are welcome.

MySQL 8.0 release

MySQL 8.0 has been released and it brings the following new features, enhancements, and more:

  1. SQL Window functions, Common Table Expressions, NOWAIT and SKIP LOCKED, Descending Indexes, Grouping, Regular Expressions, Character Sets, Cost Model, and Histograms.
  2. JSON Extended syntax, new functions, improved sorting, and partial updates. With JSON table functions you can use the SQL machinery for JSON data.
  3. GIS Geography support. Spatial Reference Systems (SRS), as well as SRS aware spatial datatypes,  spatial indexes,  and spatial functions.
  4. Reliability DDL statements have become atomic and crash safe, meta-data is stored in a single, transactional data dictionary. Powered by InnoDB! 
  5. Observability Significant enhancements to Performance Schema, Information Schema, Configuration Variables, and Error Logging.
  6. Manageability Remote management, Undo tablespace management, and new instant DDL.
  7. Security OpenSSL improvements, new default authentication, SQL Roles, breaking up the super privilege, password strength, and more.
  8. Performance InnoDB is significantly better at Read/Write workloads, IO bound workloads, and high contention “hot spot” workloads. Added Resource Group feature to give users an option optimize for specific workloads on specific hardware by mapping user threads to CPUs.

Distributed architecture concepts I learned while building a large payments system

Gergely Orosz, an engineer who worked at Uber on the large scale payments system used by the company, shares some of the distributed architecture concepts he had to learn in the blog post titled “Distributed architecture concepts I learned while building a large payments system“.

The article is very well written and easy to follow. But it’s also a goldmine of links to other resources on the subject.  Here’s a list links and concepts for a quick research and/or click-through later:

Registry of Open Data on AWS

AWS News Blog covers the Registry of Open Data on AWS:

Almost a decade ago, my colleague Deepak Singh introduced the AWS Public Datasets in his post Paging Researchers, Analysts, and Developers. I’m happy to report that Deepak is still an important part of the AWS team and that the Public Datasets program is still going strong!

Today we are announcing a new take on open and public data, the Registry of Open Data on AWS, or RODA. This registry includes existing Public Datasets and allows anyone to add their own datasets so that they can be accessed and analyzed on AWS.

Currently, there are 53 data sets in the registry.  Each provides a tonne of data.  Subjects vary from satellite imagery and weather monitoring to political and financial information.

Hopefully, this will grow and expand with time.