Metabase is an Open Source business intelligence and analytics tool. It supports a variety of databases and services as sources for data, and provides a number of data querying and processing tools. Have a look at the GitHub repository as well.
And if you want a few alternatives or complimenting tools, I found this list quite useful.
“Why Uber Engineering Switched from Postgres to MySQL” is an interesting study with plenty of technical detail of how MySQL was a better choice than PostgreSQL for the very demanding growth of Uber. These kinds of issues are probably way out of scope for any “regular Joe” application, but the insight into the differences of MySQL and PostgreSQL architectures is still useful.
Main PostgreSQL limitations covered by the study are:
- Inefficient architecture for writes
- Inefficient data replication
- Issues with table corruption
- Poor replica MVCC support
- Difficulty upgrading to newer releases
It always amazes me when I randomly come across an article or a blog post precisely on the subject that I’m mulling over in my head – all without searching specifically for the solution or even researching the problem domain. It’s almost like the universe knows what I’m thinking and sends help my way.
“When and where to determine the ID of an entity” is an example of exactly that. Lately, I’ve been working with events in CakePHP a lot. And for one particular scenario, I was considering the beforeSave() event in the model layer, which would trigger some functionality that modifies data in other models. So, having a reference of the current ID would be useful for debugging and logging purposes. But since the current entity hasn’t been saved it, the ID is not there. And that’s where I started thinking about this whole thing and considering where is the right place to generate the ID.
One thing that kind of bothered me on top of the theoretical discussion, was the practical implementation, especially in different frameworks. If I remember correctly, the earlier version of CakePHP framwork, used the presence or absense of the ID in the entity to differentiate between insert and update operations. It might still be true now, but at least there is a way to work around it, as CakePHP now has isNew() method to check if the entity needs to be inserted or updated.
ipstack looks like an excellent IP geolocation service with a beautiful API. If you haven’t used anything except for the MaxMind GeoIP, give it a try. Their pricing is quite good, with 10,000 lookups per month going for free.
Database Flow is a modern, Open Source, self-hosted, web-based tool for working with SQL databases and GraphQL APIs. It supports a variety of the database engines: IBM DB2, Oracle, H2, PostgreSQL, MySQL, SQLite, Informix, and Microsoft SQL Server. It features an advanced SQL editor, query plan analyzer, GraphQL client, schema explorer, charting, query history, and more.
The only visible downside so far is that it’s written in Java.
Everybody building a web application with a modern framework, is already probably using an ORM (Object-Relational Mapping). Most frameworks include one out of the box. But digging deeper into the subject, ORMs do vary from each other, and some cases, very significantly.
Most variations are coming from two main approaches: Active Record and Data Mapper. I’ve heard the terms for a long time, but today decided to look into the meaning and the actual difference.
The two approaches seem very similar. The difference is described in a multitude of articles online. I particularly liked this one. In essence, Active Record is a better choice for simpler, CRUD-based applications. Data Mapper, on the other hand, is better for domain-specific applications, as it provides another level of abstraction between the domain objects and the persistence layer.
Most of my work these days is done with CakePHP framework, which I now thought uses the Active Record pattern. But it turns out that CakePHP ORM so powerful, because it’s more than just one of those:
The CakePHP ORM borrows ideas and concepts from both ActiveRecord and Datamapper patterns. It aims to create a hybrid implementation that combines aspects of both patterns to create a fast, simple to use ORM.
It looks like I need to do some learning and dig deeper into the subject. Pointers are welcome.
MySQL 8.0 has been released and it brings the following new features, enhancements, and more:
- SQL Window functions, Common Table Expressions, NOWAIT and SKIP LOCKED, Descending Indexes, Grouping, Regular Expressions, Character Sets, Cost Model, and Histograms.
- JSON Extended syntax, new functions, improved sorting, and partial updates. With JSON table functions you can use the SQL machinery for JSON data.
- GIS Geography support. Spatial Reference Systems (SRS), as well as SRS aware spatial datatypes, spatial indexes, and spatial functions.
- Reliability DDL statements have become atomic and crash safe, meta-data is stored in a single, transactional data dictionary. Powered by InnoDB!
- Observability Significant enhancements to Performance Schema, Information Schema, Configuration Variables, and Error Logging.
- Manageability Remote management, Undo tablespace management, and new instant DDL.
- Security OpenSSL improvements, new default authentication, SQL Roles, breaking up the super privilege, password strength, and more.
- Performance InnoDB is significantly better at Read/Write workloads, IO bound workloads, and high contention “hot spot” workloads. Added Resource Group feature to give users an option optimize for specific workloads on specific hardware by mapping user threads to CPUs.
Gergely Orosz, an engineer who worked at Uber on the large scale payments system used by the company, shares some of the distributed architecture concepts he had to learn in the blog post titled “Distributed architecture concepts I learned while building a large payments system“.
The article is very well written and easy to follow. But it’s also a goldmine of links to other resources on the subject. Here’s a list links and concepts for a quick research and/or click-through later:
- Service Level Agreements (SLAs).
- Availability / service uptime (in percentage of time a year)
- Accuracy (in percentage)
- Capacity (in requests per second)
- Latency (95% and 99%)
- Horizontal vs. vertical scaling
- Horizontal scaling is adding more machines, much preferred for distributed systems.
- Vertical scaling is upgrading machines to the more powerful ones.
- Data Durability (here‘s some more on the subject)
- Message Persistence and Durability
- Idempotency (here‘s some more on the different strategies)
- Sharding and Quorum
- The Actor Model
- Reactive Architecture
AWS News Blog covers the Registry of Open Data on AWS:
Almost a decade ago, my colleague Deepak Singh introduced the AWS Public Datasets in his post Paging Researchers, Analysts, and Developers. I’m happy to report that Deepak is still an important part of the AWS team and that the Public Datasets program is still going strong!
Today we are announcing a new take on open and public data, the Registry of Open Data on AWS, or RODA. This registry includes existing Public Datasets and allows anyone to add their own datasets so that they can be accessed and analyzed on AWS.
Currently, there are 53 data sets in the registry. Each provides a tonne of data. Subjects vary from satellite imagery and weather monitoring to political and financial information.
Hopefully, this will grow and expand with time.