MySQL, JSON, indexing and generated columns

For quite some time now I wanted to play around with the recently added JSON type in MySQL.  Finally, I have a project where MySQL version is high enough to support it, and the requirements are such that this choice makes sense.

The first impression was great – JSON type is basically LONGTEXT type with a bunch of added functionality to manipulate JSON data.  It took no time to setup tables and necessary queries to work with it.

The second iteration though raised a few questions.  Large tables, with complex JSON structures were rather slow in some of the more complex queries.  The first solution to look at was obviously indexes.  Turns out, MySQL does not support indexing of the JSON fields. Bummer.

But there is a rather elegant work around.  It involves another recently added feature, of which I haven’t heard about until today – GENERATED columns.  Think of table views, but on the column level, not table level.  And generated columns can be indexed.

In fact, there’s a whole lot that you can do with GENERATED columns in general, and JSON data in particular.  This blog post – “MySQL for JSON: Generated Columns and Indexing” – provides a great starting point with examples and explanations, including a scenario with the primary key of the table being a generated column, with the data from the JSON-typed column.


Calculating distance using MySQL

Calculating distance using MySQL” is a very useful blog post for everyone who works with geographical location data and MySQL. It shows a simple example of how to calculate the distance between two coordinates on a sphere (Earth in particular) within the MySQL itself.

SELECT ST_Distance_Sphere(
    point(-87.6770458, 41.9631174),
    point(-73.9898293, 40.7628267)

The above will return 1148978.6738241839, which is the distance between the two points in meters.

This functionality is available since MySQL 5.7. Have a look at the documentation of the spacial convenience functions.

MariaDB has similar functionality, but with a slightly different function names. Use ST_DISTANCE() instead of ST_Distance_Sphere(). Have a look at this blog post for more details.

Searching DynamoDB: An indexer sidecar for Elasticsearch

One thing that I like about the modern world is that large technology companies are a lot more open than they were in the previous century. Many of them contribute to the Open Source ecosystem and frequently share their wisdom on how to use and not to use a particular technology.

Have a look at the recent post from Bitbucket blog: Searching DynamoDB: An indexer sidecar for Elasticsearch, for example.

It’s not your usual marketing nonsense about introducing a new needless service or self-praising review of a product. It’s a rather deep dive into a technical topic that has been getting a lot of attention for the last few years – NoSQL databases. Not only the blog post itself is interesting, but it provides plenty of useful links to other resources. Like this one, which covers database partitioning in depth. Or this one, which lists some of the best practices for designing and using partition keys effectively.

I wish more companies shared their technical insights like this.

Dgraph – fast, transactional, distributed graph database

Dgraph is a fast, transactional distributed graph database, written in Go. It’s Open Source too.

If you need a quick introduction to graph databases or if you are wondering whether you need to use one, here’s a good video to get you started.

For even more insight, read “Why Google Needed a Graph Serving System“. There are some interesting examples of problems, solutions, and data discovery. For example:

Cerebro would often reveal very interesting facts that one didn’t originally search for. When you’d run a query like [us presidents], Cerebro would understand that presidents are humans, and humans have height. Therefore, it would allow you to sort presidents by height and show that Abraham Lincoln is the tallest US president. It would also let people be filtered by nationality. In this case, it showed America and Britain in the list, because US had one British president, namely George Washington. (Disclaimer: Results based on the state of KG at the time; can’t vouch for the correctness of these results.)