Portability and flexibility win over performance

I noticed the ticket "Change enum to varchar" in WordPress Trac and went in to see if there was any heated discussion.  The issue is about the field types used in the SQL schema for WordPress tables.  Certain fields, such as post status, use the ENUM type with a fixed set of allowed values.  The change proposed in the ticket is to convert them to the VARCHAR type.

Why the change?  Well, VARCHAR is just a text field.  Anyone can put pretty much any string into it.  That gives more flexibility to plugin developers and to future changes, since there is no need to tweak the SQL schema.  ENUM, on the other hand, works a little bit faster.

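Side note: I also thought that ENUM provides some extra data validation, assuming the ENUM field is set to NOT NULL, but it turns out this is not quite the case.  If you insert a record with a value which is not in the list, MySQL in its non-strict SQL mode does not reject it: it stores an empty string as a special error value and only issues a warning.  Only strict SQL mode turns this into an error.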

The change has been approved, the patch has been attached, and the world will see it in the next WordPress release.  Once again, human time has turned out to be much more valuable than machine time.  Making it easier for plugin developers to extend and change the system is worth more than the few extra CPU cycles spent comparing strings instead of numbers.

RegExp reminder

I was just reminded of a small thing that is so easy to forget: regular expressions that have markers of line start (^) and/or line end ($) can be much faster than regexps that don’t have these markers. The thing is that with a line start marker the regexp engine needs to attempt the match/substitution only once, whereas when there are no such markers, it has to retry the match/substitution at every character of the string. How much a line end marker helps on its own depends on the engine.

In practice, it’s unbelievable how much difference this can make, especially when using complex regular expressions over large data sets.
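To get a feel for the difference, here is a minimal sketch in Python that times the same pattern with and without the line start marker. The sample text, the pattern and the `time_search` helper are all made up for illustration, and the actual numbers will vary with the regexp engine and the input.

```python
import re
import timeit

# Hypothetical input: a long string that the pattern never matches,
# so the unanchored search has to try every position before giving up.
text = "a" * 20_000

anchored = re.compile(r"^\d+-\w+")    # may only match at the start
unanchored = re.compile(r"\d+-\w+")   # may match anywhere in the string


def time_search(pattern, subject, repeats=200):
    """Return total seconds spent calling pattern.search(subject) `repeats` times."""
    return timeit.timeit(lambda: pattern.search(subject), number=repeats)


anchored_seconds = time_search(anchored, text)
unanchored_seconds = time_search(unanchored, text)

print(f"anchored:   {anchored_seconds:.6f}s")
print(f"unanchored: {unanchored_seconds:.6f}s")
```

Keep in mind that the two patterns do not mean the same thing: the anchored one accepts a match only at the start of the string, so the marker is an option only when that is what you actually want.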

P.S.: I understand that it is not always possible to use these markers, but I think that they can be used much more often than they are. Everywhere.