InnoDB is better for SugarCRM

After SugarCRM was deployed, we were experiencing some lock ups.  Not frequent or dangerous, but annoying.  About once a week or every ten days or so, SugarCRM would lock up and won’t answer any queries at all.  Not even the login was possible.  A brief investigation showed that somehow it was locking up the MySQL database – about 15 processes (using “show full processlist”) in Locked state, with no data being sent back or forth.  All locked queries were rather complex, with several JOINs.  The load on the system was somewhat high, since we have about a few dozen operators working on it at the same time.

A similarly brief Google search suggested (see here and here) and explained converting MySQL tables from InnoDB to InnoDB.  A test has been performed and everything went OK.  Our SugarCRM database is about 600 MBytes and it was converted from MyISAM to InnoDB in under 20 minutes.  The best part is that it takes even less to convert back to MyISAM, in case you change your mind.

It’s been a few days now since we did the conversion and it looks OK.  Also, the CRM itself feels a bit faster.

BLOB is bad for your (mental) health

If you ever mention that your web application uses database to store files, you risk being flamed into oblivion.  Indeed, in most cases, it is a bad idea, since file system is more effecient when it comes to files.  However, there are cases when it makes sense to have files saved in the database.

Maybe I am doing something wrong, but in the last six month, I had to develop at least three systems that used MySQL for file storage (uploaded files that have to be synchronized across several hosts, etc).  Yesterday, for the third time I stumbled across the same problem, that almost drove me insane.

MySQL has four data types for storing binary data – TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB.  Somehow I always forget about these and use BLOB.  BLOB works just fine, but it has a limit on size, which is rather low – 64 KBytes.  The mean thing here is that it will work just fine with most of the test data – text files, short PDFs, and small pictures.  Once the application is tested and put into production, the corrupted files will start coming in.  Re-writing all parts that deal with uploading, moving, cleaning, escaping, and encrypting binary data takes time.  Going through file reading and writing routines is boring too, and it won’t help either.

By the time, the issue is discovered and all fields are changed to LONGBLOB, it is often very late, and you’ve lost your weekend, as well as a lot of large files. This post is an attempt to save my (and your) sanity.

Reminder: use LONGBLOB instead of BLOB for file storage, unless you are absolutely sure about the maximum size of incoming data.

Portability and flexibility win over performance

I noticed this ticket in WordPress TracChange enum to varchar and went in to see if there is any heated discussion.  The issue is around field types used in SQL scheme for WordPress tables.  Certain fields, such post status employed ENUM type with a set of allowed values.  The proposed change in the ticket is to convert them to VARCHAR type.

Why the change?  Well, VARCHAR is just a text field.  Anyone can put pretty much any string into it.  It has more flexibility for plugin developers and future changes – no need to tweak the SQL scheme.  ENUM on the other hand works a little bit faster.

Side note: I also thought that ENUM provides some extra data validation, assuming the ENUM field is set to NOT NULL, but it turns out this is not the case.  If you insert a record with a value which is not in the list, the NULL is used. 

The change has been approved, the patch was attached, and the world will see it in the next WordPress release.  Once again, it has been proven that human time is much more valuable than machine time.  Making it easier for plugin developers to extend and change the system has more value than that of a few extra CPU cycles to lookup in strings instead of numbers.

Open Source Contributors

With the recent news of Sun Microsystems buying MySQL AB for one billion dollars (insert Dr.Evil’s evil laugh here), I hear plenty calling Sun the largest contributor to open source. I beg to differ.

Sun is doing a lot for open source, there is no argument about it, and whatever they do is much appreciated. But calling them the largest contributor to open source, is a little bit too far fetched, I think. First of all, let’s see what we are talking about. Here is the list of open source stuff from Sun (according to their open source initiative page):

  • Solaris Enterprise System / OpenSolaris
  • Linux from Sun
  • StarOffice / OpenOffice.org
  • NetBeans
  • OpenSPARC
  • Java

That’s something, but doesn’t qualify for the number one contributor. First of all, these are mostly Sun’s own offerings. Secondly, some of these (Java and OpenSolaris) have been opened to be saved. They were open when it was pretty much obvious to everyone that if they are not, they aren’t going to last very long. Or, at least, they won’t prosper as they should. Thirdly, the effort that was put in some of these (StarOffice / OpenOffice.org) by Sun isn’t all that impressive. I mean, yeah, they bought and opened StarOffice. People jumped on it and started to improve it. And it improved a lot. But it’s still huge, bloated, and clunky, after all these years…

As I said, it’s still appreciated. There is plenty of good in Sun’s open source initiative. But I think there are companies that have done more good to open source than Sun did. I think that IBM did a great deal more. And it did it before anyone else, when open source needed help the most. Then, I think Google has done plenty and is still doing a lot. And, I think it’s not fair at all to forget Red Hat. These guys made a lot of money on open source software, but they were more than willing to share and invest those money back into the community.

The future of SQL

Slashdot lets us know that Google contributes code to MySQL.  Among the comments to that post, there is this one, which is while being rather funny holds some truth to it:

They need to add a GOOGLE function to allow queries to be searched nicer.

SELECT * FROM articles WHERE GOOGLE(‘boobies’);

something similar might be available but it is a PITA to list the fields to search and specify the operators etc

I think here lies the future of SQL…