Validating JSON against schema in PHP

GitHub was rather slow yesterday, which affected the speed of installing composer dependencies (since most of them are hosted on GitHub anyway).  Staring at a slowly scrolling list of installed dependencies, I noticed something interesting.

...
  - Installing seld/jsonlint (1.6.0)
  - Installing justinrainbow/json-schema (5.1.0)
...

Of course, I’ve heard of the seld/jsonlint before.  It’s a port of zaach/jsonlint JavaScript tool to PHP, written by Jordi Boggiano, aka Seldaek, the genius who brought us composer dependency manager and packagist.org repository.

But JSON schema? What’s that?

The last time I heard the word “schema” in a non-database context, it was in the XML domain.  And I hate XML with passion.  It’s ugly and horrible and should die a quick death.  The sooner, the better.

But with all its ugliness, XML has does something right – it allows the schema definition, against which the XML file can be validated later.

Can I have the same with JSON?  Well, apparently, yes!

justinrainbow/json-schema package allows one to define a schema for what’s allowed in the JSON file, and than validate against it.  And even more than that – it supports both required values and default values too.

Seeing the package being installed right next to something by Seldaek, I figured, composer might be using it already.  A quick look in the repository confirmed my suspicion.  Composer documentation provides more information, and links to an even more helpful JSON-Schema.org.

Mind.  Officially.  Blown.

At work, we use a whole lot of configuration files for many of our projects.  Those files which are intended for tech-savvy users, are usually in JSON or PHP format, without much validation attached to them.   Those files which are for non-technical users, usually rely on even simpler formats like INI and CSV.  I see this all changing and improving soon.

But before any of that happens, I need to play around with these amazing tools.  Here’s a quick first look that I did:

  1. Install the JSON validator: composer require justinrainbow/json-schema
  2. Create an example config.json file that I will be validating.
  3. Create a simple schema.json file that defines what is valid.
  4. Create a simple index.php file to tie it altogether, mostly just coping code from the documentation.

My config.json file looks like this:

{
	"blah": "foobar",
	"foo": "bar"
}

My schema.json file looks like this:

{
	"type": "object",
	"properties": {
		"blah": {
			"type": "string"
		},
		"version": {
			"type": "string",
			"default": "v1.0.0"
		}
	}
}

And, finally, my index.php file looks like this:

<?php
require_once 'vendor/autoload.php';

use JsonSchema\Validator;
use JsonSchema\Constraints\Constraint;

$config = json_decode(file_get_contents('config.json'));
$validator = new Validator; $validator->validate(
	$config,
	(object)['$ref' => 'file://' . realpath('schema.json')],
	Constraint::CHECK_MODE_APPLY_DEFAULTS
);

if ($validator->isValid()) {
	echo "JSON validates OK\n";
} else {
	echo "JSON validation errors:\n";
	foreach ($validator->getErrors() as $error) {
		print_r($error);
	}
}

print "\nResulting config:\n";
print_r($config);

When I run it, I get the following output:

$ php index.php 
JSON validates OK

Resulting config:
stdClass Object
(
    [blah] => foobar
    [foo] => bar
    [version] => v1.0.0
)

What if I change my config.json to have something invalid, like an integer instead of a string?

{
	"blah": 1,
	"foo": "bar"
}

The validation fails with a helpful information:

$ php index.php 
JSON validation errors:
Array
(
    [property] => blah
    [pointer] => /blah
    [message] => Integer value found, but a string is required
    [constraint] => type
)

Resulting config:
stdClass Object
(
    [blah] => 1
    [foo] => bar
    [version] => v1.0.0
)

This is great. Maybe even beyond great!

The possibilities here are endless.  First of all, we can obviously validate the configuration files.  Secondly, we can automatically generate the documentation for the supported configuration options and values.  It’s probably not going to be super fantastic at first, but it will cover ALL supported cases and will always be up-to-date.  Thirdly, this whole thing can be taken to the next level very easily, since the schema files are JSON themselves, which means schema’s can be generated on the fly.

For example, in our projects, we allow the admin/developer to specify which database field of a table is used as display field (in links and such).  Only existing database fields should be allowed.  So, we can generate the schema with available fields on project deployment, and then validate the user configuration against his particular database setup.

There are probably even better ways to utilize all this, but I’ll have to think about it, which is not easy with the mind blown…

Update (March 16, 2017): also have a look at some alternative JSON Schema validators.  JSON Guard might be a slightly better option.

JSON API? No … HAL!

Wait, what?  That’s exactly what I said when I read this blog post.  I am still making my way through the JSON API specification.  And now it seems I might be wasting my time, as I should be learning HAL.

Whereas JSON API is almost like an “ORM over HTTP”, HAL does a lot less for you though, so it’s not really an apples-to-apples type of comparison.

HAL really is just a document format for a hypermedia API, like HTML is for hypertext. It doesn’t tell you how to express your domain model, and doesn’t really tell you how to use HAL to submit changes.

Sometime I think that I should just stop learning.  What’s the point?  By the time you learn a thing or two, it’s already obsolete and somebody somewhere has created something better, or wiser, or cheaper.

Meh…

WTF : The Inner JSON Effect

I’ve seen my share of horrible systems, but I haven’t seen anything this bad:

“So you have ‘customers.json’ and ‘customers.js’. The JSON file is the metadata and the JS file has all the code. So the list of functions in the JSON file tells JDSL to look up those revisions of the JS file to find what functions are available. In this case the actual code is in revisions 568, 899, 900, 901, and so on.”

Although I’ve seen a system before that breaks when adding code comments to certain files (as it was parsing source code with regular expressions, rather then with the language parser):

“Well, yes. I added a few code comments, trying to–”

“You can’t use comments in JDSL!” Tom shouted. “THAT’S WHAT BROKE IT!!”

Jake stayed silent, trying to process how code comments could wipe out a customer database. Tom continued after a pause. “I haven’t added comment support to JDSL, so the runtime executes comments like normal code! You must have had database updates in some comments?!”

“Well, yeah, I put a couple short syntax examples in a comment to clarify–”

Tom burst to his feet. “I knew it! You BROKE IT!” He turned to face the VPs. “I can’t deal with coders who don’t understand the system! You will either fire Jake…or I quit!” And he stormed out of the room.

Json Résumé – a community driven open source initiative to create a JSON based standard for résumés

Json Résumé – a community driven open source initiative to create a JSON based standard for résumés.

It’d be awesome to see LinkedIn integration with this.

jq – a lightweight and flexible command-line JSON processor

jq – a lightweight and flexible command-line JSON processor.

jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.