Validating CSV schema

CSV, or comma-separated values, is a very common format for managing all kinds of configurations, as well data manipulation.  As the linked Wikipedia page mentions, there are a few RFCs that try to standardize the format.  However, I thought, there is still a lack of schema-type standard that would allow one to define a format for particular file.

Today I came across an effort that attempts to do just that – CSV Schema Language v1.1 – an unofficial draft of the language for defining and validating CSV data.  This is work in progress by the Digital Preservation team at The National Archives.

Apart from the unofficial draft of the language, there is also an Open Source CSV Validator v1.1 application, written in Scala.

Why Configuration Management and Provisioning are Different

In “Why Configuration Management and Provisioning are Different” Carlos Nuñez advocates for the use of specialized infrastructure provisioning tools, like Terraform, Heat, and CloudFormation, instead of relying on the configuration management tools, like Ansible or Puppet.

I agree with his argument for the rollbacks, but not so much for the maintaining state and complexity.  However I’m not yet comfortable to word my disagreement – my head is all over the place with clouds, and I’m still weak on the terminology.

The article is nice regardless, and made me look at the provisioning tools once again.

Validating JSON against schema in PHP

GitHub was rather slow yesterday, which affected the speed of installing composer dependencies (since most of them are hosted on GitHub anyway).  Staring at a slowly scrolling list of installed dependencies, I noticed something interesting.

...
  - Installing seld/jsonlint (1.6.0)
  - Installing justinrainbow/json-schema (5.1.0)
...

Of course, I’ve heard of the seld/jsonlint before.  It’s a port of zaach/jsonlint JavaScript tool to PHP, written by Jordi Boggiano, aka Seldaek, the genius who brought us composer dependency manager and packagist.org repository.

But JSON schema? What’s that?

The last time I heard the word “schema” in a non-database context, it was in the XML domain.  And I hate XML with passion.  It’s ugly and horrible and should die a quick death.  The sooner, the better.

But with all its ugliness, XML has does something right – it allows the schema definition, against which the XML file can be validated later.

Can I have the same with JSON?  Well, apparently, yes!

justinrainbow/json-schema package allows one to define a schema for what’s allowed in the JSON file, and than validate against it.  And even more than that – it supports both required values and default values too.

Seeing the package being installed right next to something by Seldaek, I figured, composer might be using it already.  A quick look in the repository confirmed my suspicion.  Composer documentation provides more information, and links to an even more helpful JSON-Schema.org.

Mind.  Officially.  Blown.

At work, we use a whole lot of configuration files for many of our projects.  Those files which are intended for tech-savvy users, are usually in JSON or PHP format, without much validation attached to them.   Those files which are for non-technical users, usually rely on even simpler formats like INI and CSV.  I see this all changing and improving soon.

But before any of that happens, I need to play around with these amazing tools.  Here’s a quick first look that I did:

  1. Install the JSON validator: composer require justinrainbow/json-schema
  2. Create an example config.json file that I will be validating.
  3. Create a simple schema.json file that defines what is valid.
  4. Create a simple index.php file to tie it altogether, mostly just coping code from the documentation.

My config.json file looks like this:

{
	"blah": "foobar",
	"foo": "bar"
}

My schema.json file looks like this:

{
	"type": "object",
	"properties": {
		"blah": {
			"type": "string"
		},
		"version": {
			"type": "string",
			"default": "v1.0.0"
		}
	}
}

And, finally, my index.php file looks like this:

<?php
require_once 'vendor/autoload.php';

use JsonSchema\Validator;
use JsonSchema\Constraints\Constraint;

$config = json_decode(file_get_contents('config.json'));
$validator = new Validator; $validator->validate(
	$config,
	(object)['$ref' => 'file://' . realpath('schema.json')],
	Constraint::CHECK_MODE_APPLY_DEFAULTS
);

if ($validator->isValid()) {
	echo "JSON validates OK\n";
} else {
	echo "JSON validation errors:\n";
	foreach ($validator->getErrors() as $error) {
		print_r($error);
	}
}

print "\nResulting config:\n";
print_r($config);

When I run it, I get the following output:

$ php index.php 
JSON validates OK

Resulting config:
stdClass Object
(
    [blah] => foobar
    [foo] => bar
    [version] => v1.0.0
)

What if I change my config.json to have something invalid, like an integer instead of a string?

{
	"blah": 1,
	"foo": "bar"
}

The validation fails with a helpful information:

$ php index.php 
JSON validation errors:
Array
(
    [property] => blah
    [pointer] => /blah
    [message] => Integer value found, but a string is required
    [constraint] => type
)

Resulting config:
stdClass Object
(
    [blah] => 1
    [foo] => bar
    [version] => v1.0.0
)

This is great. Maybe even beyond great!

The possibilities here are endless.  First of all, we can obviously validate the configuration files.  Secondly, we can automatically generate the documentation for the supported configuration options and values.  It’s probably not going to be super fantastic at first, but it will cover ALL supported cases and will always be up-to-date.  Thirdly, this whole thing can be taken to the next level very easily, since the schema files are JSON themselves, which means schema’s can be generated on the fly.

For example, in our projects, we allow the admin/developer to specify which database field of a table is used as display field (in links and such).  Only existing database fields should be allowed.  So, we can generate the schema with available fields on project deployment, and then validate the user configuration against his particular database setup.

There are probably even better ways to utilize all this, but I’ll have to think about it, which is not easy with the mind blown…

Update (March 16, 2017): also have a look at some alternative JSON Schema validators.  JSON Guard might be a slightly better option.

How To Use Git to Manage your User Configuration Files

There is probably a gadzillion different ways that you can manage and synchronize you configuration files (aka dotfiles) between different Linux/UNIX boxes – anything from custom symlink scripts, all the way to configuration management tools like Puppet and Ansible.  Here are a few options to look at if you are not doing it already.

Personally, I’m using Ansible and I’m quite happy with it, as it allows me to have multiple playbooks (base configuration, desktop configuration, development setup, etc), and do more things than just manage my configuration files (install packages and tools that I often need, setup correct permissions, and more).

Recently, I came across this tutorial from Digital Ocean on how to manage your configuration files with git.  Again, there are a few options discussed in there, as even with git, there’s more than one way to do it (TMTOWTDI).

The one that I’ve heard about a long time ago, but completely forgot, and which I think is quite elegant is the approach of separating the working directory from the git repository:

Now, we do things a bit differently. We will start by specifying a different working directory using the core.worktree git configuration option:

git config core.worktree "../../"

What this does is establish the working directory relative to the path of the .git directory. The first ../refers to the ~/configs directory, and the second one points us one step beyond that to our home directory.

Basically, we’ve told git “keep the repository here, but the files you are managing are two levels above the repo”.

I guess, if you stick purely to git, you can offload some of the additional processing, such as permission changes and package installation, into one of the git hooks.  Something like post-checkout or post-merge.

WordPress Plugin : WP-CFM – manage and deploy WordPress configuration changes

WP-CFM is a WordPress plugin which helps to manage and deploy WordPress configuration changes between different sites.  I haven’t tried it myself yet, but it looks super useful as it allows to separate the configuration options from the content, both of which are stored in the database.  The cherry on top here is the support for WP-CLI, command line interface to WordPress, which is frequently employed for automatically deploying WordPress to different servers and environments.

I have a feeling this plugin will be making its way into our project-template-wordpress setup pretty soon.

How to handle configuration in PHP

Kevin Schroeder has a blog post about the tool that he is building for configuration management in PHP.  The library is still in the early pre-release stage, but it looks like it solves quite a few problems related to configuration, like nesting, inheritance, and environment/context variation.

Here’s the YouTube video that provides a bit of introduction into how to use the tool, and what to expect of it.

The only thing that dials down my excitement in this implementation is the use of XML, even though I understand why he opted for this choice.

I will need a PHP configuration management solution soon, but the priority hasn’t been raised high enough yet for me to jump into the research.  If you know of any other similar tools, please let me know – it all will come handy pretty soon.

Checking out Ansible. Sorry Puppet

It’s Thursday evening of a particularly difficult week at work.  Tomorrow is a public holiday, effectively making this – a Friday.  My brain is blank and exhausted, so I can’t do anything productive.  And I’m too tired to go out.  But I can still learn a thing or two.

First things first – cancel the external noise.  I want something loud, but not too intensive, and with no words in it.  So this 2 hour blues instrumental collection comes in handy.  Start the playback, put the headphones on, and push the volume up.

Now.  Here’s something I wanted to look into for quite some time – Ansible configuration manager.

Continue reading “Checking out Ansible. Sorry Puppet”

Red Hat acquires Ansible

Linux Weekly News reports that Red Hat acquires Ansible.  There are quite a few configuration management tools around, and it was only the matter of time until Red Hat, with all its corporate client base, would buy one.  Or pledge allegiance.  My personal preference would be in Puppet, but Puppet comes from the Ruby world, where’s Red Hat is more of a Python shop.

Ansible’s simple and agentless approach, unlike competing solutions, does not require any special coding skills, removing some of the most significant barriers to automation across IT. From deployment and configuration to rolling upgrades, by adding Ansible to its hybrid management portfolio, Red Hat will help customers to:

  • Deploy and manage applications across private and public clouds.
  • Speed service delivery through DevOps initiatives.
  • Streamline OpenStack installations and upgrades.
  • Accelerate container adoption by simplifying orchestration and configuration.

The upstream Ansible project is one of the most popular open source automation projects on GitHub with an active and highly engaged community, encompassing nearly 1,200 contributors. Ansible automation is being used by a growing number of Fortune 100 companies, powering large and complex private cloud environments, and the company has received several notable accolades, including a 2015 InfoWorld Bossie Award, recognizing the best open source datacenter and cloud software.

Regardless, though, of my personal preferences, these are good news for configuration management and automation.

Fedora for short-lifespan server instances

Fedora for short-lifespan server instances

Let’s come back to the odd fact that Fedora is both a precursor to RHEL, and yet almost never used in production as a server OS. I think this is going to change. In a world where instances are deployed constantly, instances are born and die but the herd lives on. Once everyone has their infrastructure encoded into a configuration management system, Fedora’s short release cycle becomes much less of a burden. If I have service foo deployed on a Fedora X instance, I willnever be upgrading that instance. Instead I’ll be provisioning a new Fedora X+1 instance to run the foo service, start it, and throw the old instance in the proverbial bitbucket once the new one works.

Via LWN.