MySQL 8.0 release

MySQL 8.0 has been released and it brings the following new features, enhancements, and more:

  1. SQL Window functions, Common Table Expressions, NOWAIT and SKIP LOCKED, Descending Indexes, Grouping, Regular Expressions, Character Sets, Cost Model, and Histograms.
  2. JSON Extended syntax, new functions, improved sorting, and partial updates. With JSON table functions you can use the SQL machinery for JSON data.
  3. GIS Geography support. Spatial Reference Systems (SRS), as well as SRS aware spatial datatypes,  spatial indexes,  and spatial functions.
  4. Reliability DDL statements have become atomic and crash safe, meta-data is stored in a single, transactional data dictionary. Powered by InnoDB! 
  5. Observability Significant enhancements to Performance Schema, Information Schema, Configuration Variables, and Error Logging.
  6. Manageability Remote management, Undo tablespace management, and new instant DDL.
  7. Security OpenSSL improvements, new default authentication, SQL Roles, breaking up the super privilege, password strength, and more.
  8. Performance InnoDB is significantly better at Read/Write workloads, IO bound workloads, and high contention “hot spot” workloads. Added Resource Group feature to give users an option optimize for specific workloads on specific hardware by mapping user threads to CPUs.

Distributed architecture concepts I learned while building a large payments system

Gergely Orosz, an engineer who worked at Uber on the large scale payments system used by the company, shares some of the distributed architecture concepts he had to learn in the blog post titled “Distributed architecture concepts I learned while building a large payments system“.

The article is very well written and easy to follow. But it’s also a goldmine of links to other resources on the subject.  Here’s a list links and concepts for a quick research and/or click-through later:

Registry of Open Data on AWS

AWS News Blog covers the Registry of Open Data on AWS:

Almost a decade ago, my colleague Deepak Singh introduced the AWS Public Datasets in his post Paging Researchers, Analysts, and Developers. I’m happy to report that Deepak is still an important part of the AWS team and that the Public Datasets program is still going strong!

Today we are announcing a new take on open and public data, the Registry of Open Data on AWS, or RODA. This registry includes existing Public Datasets and allows anyone to add their own datasets so that they can be accessed and analyzed on AWS.

Currently, there are 53 data sets in the registry.  Each provides a tonne of data.  Subjects vary from satellite imagery and weather monitoring to political and financial information.

Hopefully, this will grow and expand with time.

SQLBolt – Learn SQL with simple, interactive exercises

SQLBolt is by far the best SQL tutorial that I’ve ever seen!  Yes, I know, it’s a very bold statement.  But I promise that it’s true.

With hundreds of books, videos, and other tutorials around, the problem of delivering the understanding of data management, databases, and SQL to regular people still hasn’t been sold.  But SQLBolt provides a giant leap forward in this area.

The tutorial starts from the very basics and gets progressively more and more advanced.  But this progression is divided into small, very focused chapters.  Each chapter provides a brief description of the concept, an example query for the concept, and a set of exercises.  The exercises are all interactive, so that you don’t have to install a database or get access to a real one, and you don’t have to trust yourself on correctly solving the tasks.  The interactive exercises system marks the problem as solved the moment you type in the correct query.

If you get stuck at any point with any particular exercises, just click on the Solution link nearby, and the tutorial will show you the correct answer.  I found this to be a perfect balance between forcing the reader to try things out, but without the annoying delays for those of us who like to skip ahead.

There is really no reason now for anybody at all to learn SQL.  SQLBolt is brilliant!

TOP 10 MySQL 8.0 features for DBAs & OPS

Here’s a list of the TOP 10 MySQL 8.0 features for DBAs and OPS, with some detailed explanations of what they are and links to more information.  The features covered are:

  1. Temporary Tables Improvements
  2. Persistent global variables
  3. No more MyISAM System Tables
  4. Reclaim UNDO space from large transactions
  5. UTF8 performance
  6. Removing Query Cache
  7. Atomic DDLs
  8. Faster & More Complete Performance Schema (Histograms, Indexes, …) and Information Schema
  9. ROLES
  10. REDO & UNDO logs encrypted if tablespace is encrypted

Encrypt MySQL data using AES technique

I came across this blog post from a while back, which demonstrates how to use AES encryption for the data in MySQL database.

INSERT into user (first_name, address) VALUES (AES_ENCRYPT('Obama', 'usa2010'),AES_ENCRYPT('Obama', 'usa2010'));
SELECT AES_DECRYPT(first_name, 'usa2010'), AES_DECRYPT(address, 'usa2010') from user;

This seems rather easy and straightforward (apart from a little calculation one needs to do for the VARBINARY field types).  The only thing that I’m concerned about is whether the encryption keys will be visible in the MySQL process list (as in “SHOW FULL PROCESSLIST“).

SQL Keys in Depth

SQL Keys in Depth is an excellent read if you want to brush up on your knowledge of database keys and how they affect the performance of your application.  For the laziest among you, here are the summary points, based on an extensive research of 60+ articles, StackOverflow questions and IRC discussions:

For each table:

  1. Identify and declare all natural keys.
  2. Create a <table_name>_id surrogate key of type uuid with default value uuid_generate_v1(). You can even mark it as a primary key if you like. Including the table name in this id makes joins simpler. It’s JOIN foo USING (bar_id) vs JOIN foo ON (foo.bar_id = bar.id). Do not expose this key to clients or anywhere outside the database.
  3. For “join tables” declare all foreign key columns as a single composite primary key.
  4. Add an artificial key if desired for use in a URL or anywhere else you want to share a reference to a row. Use a Feistel cipher or pg_hashids to conceal auto-incrementing integers.
  5. Mark foreign keys to surrogate UUIDs as ON UPDATE RESTRICT and to external artificial keys as ON UPDATE CASCADE. Use your own judgement for natural keys.

MySQL optimize, repair, and analyze

Years ago I had the following script running as a cron job, but then I lost it somewhere.  It took me a few minutes to find it again, but just in case I need it in the future, I’m saving it here.

#!/bin/bash
mysqlcheck --all-databases
mysqlcheck --all-databases -o
mysqlcheck --all-databases --auto-repair
mysqlcheck --all-databases --analyze

Found it here this time.

Change SQL mode for MariaDB in Fedora 27

After I upgraded my laptop to Fedora 27, I started experiencing some weird issues with most of the projects I am developing locally.  Trying to save anything into the database that involves dates, started throwing the following errors:

Error: SQLSTATE[22007]: Invalid datetime format: 1292 Incorrect datetime value: '2017-11-30T13:30:48+02:00' for column 'timestamp' at row 1

A quick look around showed that Fedora 27 ships MariaDB v10.2, while the previous distribution version shipped MariaDB v10.1. Digging through the changes between the two versions didn’t help much, even though there is slight mention of the related change.

SQL_MODE has been changed; in particular, NOT NULL fields with no default will no longer fall back to a dummy value for inserts which do not specify a value for that field.

StackOverflow is much more helpful, as always.  These two threads – one and two – in particular, explained the changes and suggested the fix.  I had to either fix the projects I was working on, or modify my local configuration to use the old SQL mode.  This thread provided some more details, so the final solution was adding the following to the /etc/my.cnf and restarting the MariaDB service:

[mysqld]
sql-mode="NO_ZERO_IN_DATE,NO_ZERO_DATE"

And now we are back to normal.

The Global Airport Database

Global Airport Database – Arash Partow

The Global Airport Database is a collection of data about all (???) airports in the world, big and small.  It covers a total of 9,300 airports worldwide.

That’s pretty interesting.  For example, how many airports do you think there are in Cyprus?  Obviously, there is the Larnaca International Airport and Paphos International Airport.  Then, there is one in Nicosia, which is not functioning since the island was divided.  And there is one in Acrotiri on the British military base.  Four, right?   Well, the Global Airport Database has a total of 7 (!!!) entries for the country of Cyprus:

LCCC:N/A:N/A:NICOSIA ACC/FIC:CYPRUS:000:000:000:U:000:000:000:U:00000:0.000:0.000
LCLK:LCA:LARNACA:LARNACA:CYPRUS:034:052:030:N:033:037:029:E:00003:34.875:33.625
LCNC:N/A:N/A:NICOSIA:CYPRUS:000:000:000:U:000:000:000:U:00000:0.000:0.000
LCPH:PFO:PAPHOS INTERNATIONAL:PAPHOS:CYPRUS:034:043:004:N:032:029:008:E:00013:34.718:32.486
LCRA:AKT:AKROTIRI:AKROTIRI:CYPRUS:034:035:025:N:032:059:016:E:00024:34.590:32.988
LCRO:N/A:N/A:EPISKOPI:CYPRUS:000:000:000:U:000:000:000:U:00000:0.000:0.000
LCRR:N/A:N/A:NICOSIA:CYPRUS:000:000:000:U:000:000:000:U:00000:0.000:0.000

Larnaca and Paphos are there. The Akrotiri one is there too. Then we have 3 records for Nicosia. And one more for Episkopi. Hmm …