BLOB is bad for your (mental) health

If you ever mention that your web application uses a database to store files, you risk being flamed into oblivion.  Indeed, in most cases, it is a bad idea, since the file system is more efficient when it comes to files.  However, there are cases when it makes sense to have files saved in the database.

Maybe I am doing something wrong, but in the last six months, I had to develop at least three systems that used MySQL for file storage (uploaded files that have to be synchronized across several hosts, etc).  Yesterday, for the third time, I stumbled across the same problem, one that almost drove me insane.

MySQL has four data types for storing binary data – TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB.  Somehow I always forget about these and use BLOB.  BLOB works just fine, but it has a limit on size, which is rather low – 64 KBytes.  The mean thing here is that it will work just fine with most of the test data – text files, short PDFs, and small pictures.  Once the application is tested and put into production, the corrupted files will start coming in.  Re-writing all parts that deal with uploading, moving, cleaning, escaping, and encrypting binary data takes time.  Going through file reading and writing routines is boring too, and it won’t help either.

By the time the issue is discovered and all fields are changed to LONGBLOB, it is often very late, and you’ve lost your weekend, as well as a lot of large files.  This post is an attempt to save my (and your) sanity.

Reminder: use LONGBLOB instead of BLOB for file storage, unless you are absolutely sure about the maximum size of incoming data.
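To make the reminder concrete, here is a minimal Python sketch of the size limits involved.  The byte limits are the documented maximums for MySQL's four binary types; the helper names (smallest_blob_type, fits) are made up for this illustration and are not part of any library.

```python
# Maximum payload, in bytes, of each MySQL binary column type.
BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),     # 255 bytes
    ("BLOB", 2**16 - 1),        # 65,535 bytes (the "64 KBytes" trap)
    ("MEDIUMBLOB", 2**24 - 1),  # roughly 16 MB
    ("LONGBLOB", 2**32 - 1),    # roughly 4 GB
]


def smallest_blob_type(size_in_bytes):
    """Hypothetical helper: smallest column type able to hold the payload."""
    if size_in_bytes < 0:
        raise ValueError("size must be non-negative")
    for name, limit in BLOB_LIMITS:
        if size_in_bytes <= limit:
            return name
    raise ValueError("payload is too large for any MySQL BLOB type")


def fits(column_type, size_in_bytes):
    """Hypothetical helper: would the payload survive in this column type?"""
    return size_in_bytes <= dict(BLOB_LIMITS)[column_type]


# A short text file fits into BLOB; a 200 KB picture does not.
print(fits("BLOB", 12 * 1024))           # small test data
print(fits("BLOB", 200 * 1024))          # production-sized upload
print(smallest_blob_type(200 * 1024))    # what the column should have been
```

A check like fits() before the INSERT would at least turn silent corruption into a loud error.  Other limits, such as the server's max_allowed_packet setting, may still apply to large uploads.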

Whiteboard wins

The other day we ordered a large whiteboard for our office.  The board arrived some time later, complete with a bunch of whiteboard markers and a whiteboard eraser.  We gladly put it up and started writing our plan for world domination.

A few moments later, when we tried to make some corrections, we realized that we couldn’t really erase much from the whiteboard.  Hmmm.

Marker theory check.  Are all of them marked as “whiteboard markers”?  Yes.

Marker practice check.  We tried to write something with each one of them and then tried to erase it.  Only green could be erased easily.  It turned out that the four markers (black, blue, red, and green) were from a total of three different brands.  Red and blue were from the same maker.

Because we were rather pressed for time, we covered the whole whiteboard with green text and diagrams.  Then we called the bookshop and asked them to bring us more markers of the same brand as the green one.  People in the bookshop were rather puzzled by the request, but confirmed that we would receive more markers the next day.

The guy who brought the markers tested them on the board and saw that they could be erased easily.  Then he tried the other ones and saw that it was almost impossible to erase them.  Then he asked for a knife.

It was our turn to feel puzzled and confused, but we found a knife for him.

… five seconds later, it was our turn to feel really stupid.  Apparently, the whiteboard was covered with a transparent plastic film to protect its surface.  It was absolutely invisible and looked and felt exactly like the whiteboard surface itself.  Once the film was peeled off, the new shiny surface of the whiteboard was revealed.  And, of course, all whiteboard markers – old and new – could be used normally.  We tested them all and we could erase everything easily.  The magic moment!

I would like to take this opportunity to thank the guy from the bookshop, who solved a big problem of ours, and … didn’t laugh in our faces, like many would in a similar situation (tech support stories anyone?).  As a matter of fact, he didn’t even smile.  I bet he had a blast once he left our offices, but that doesn’t matter, because it was, indeed, funny.

Programming language barrier

One of the frequent things that I hear about programmers is that it doesn’t matter which language the person is using and which language you need him to use, because if he is any good he’ll learn and catch up pretty fast.  In other words, if you take a decent Java programmer and push him to write PHP code for you, you’ll only have issues for a few days.  Or weeks, at most.

I understand the reasons for this statement, but I don’t agree with it.  At least not completely.

Firstly, the reasons.  They are rather obvious, but I’d rather state them anyway.  Computer Science is not specific to any programming language.  The concepts and approaches are more or less the same everywhere.  Flow control, data structures, and algorithms are not language specific.  Each language has its own best practices and recommended variations, but a bubble sort in PHP will be very similar to a bubble sort in Java.  Then you need some common sense, which is also not language bound at all.

Secondly, the disagreement.  I think that Computer Science theory and common sense aren’t the only things that make up a programmer.  What makes a lot of difference is experience.  Programming languages, in their practical application, are just collections of software – compilers, linkers, debuggers, libraries, IDEs, etc.  Like any other software, programming language software has bugs, undocumented features, and Days When Things Just Don’t Work.  It’s the experience with the language that teaches the programmer how to handle the issues of each software piece.  And that experience is priceless (almost).

Even if you managed to push a Java programmer into writing PHP code, that would be a waste of resources.  A Java programmer is a Java programmer, not a PHP programmer.  He will, of course, learn PHP nuances with time, but he’ll probably lose a part of his priceless (almost) baggage.  Sounds a lot like misuse of resources.

Another part of my disagreement is not so much reasoned as emotional.  I’ve seen a few C and Java developers switch to Perl and PHP for their new positions.  Not that I was forcing them to or anything, but they did.  And the switch was mostly painful, to say the least.  Here are some of the areas that I noticed as being hard to comprehend.

Compiling vs. interpreting.  Those people who were used to their compilation process were missing something for the first few days.  Some needed as much as a week to adapt, even though the write-save-reload-browser cycle was done a few hundred times a day.

Debugging.  There are two major camps here.  In the first one are all those people who live in the debugger.  They know all the keyboard shortcuts and they have their highlighting customized.  In the other camp are people of a simpler nature, those who use print() and die() for most of their debugging needs.  It seems that most people coming from C and Java prefer the debugger way.  Most of the interpreted languages do have either a standalone debugger or a built-in debugging tool, but it seems that the majority of the interpreted language crowd use the print() and die() approach.

Sigils.  If you don’t know what a sigil is, read this Wikipedia page.  Because you do know what it is.  Many strongly typed languages don’t use any sigils.  Most of the loosely typed languages do.  Furthermore, when both the language from which you are changing and the language to which you are changing use sigils, chances are there will still be a difference.  PHP, for example, uses $ for both scalars and arrays.  In Perl though, you’ll get a $ for a scalar, @ for an array, and % for a hash.  Perl’s sigils are extremely helpful when figuring out someone else’s code.  I remember the pain of having just a $ in PHP when I was learning it.  And I can’t even imagine how confusing it is for people who are used to non-sigilized programming languages.

Types.  As already mentioned above, programmers coming from strongly typed languages are often confused by the fact that variables can change their type on the fly, and that they don’t even need to be declared before use.  Programmers coming from loosely typed languages will often complain about the requirement to define their types.  Three of the most common questions that I’ve heard regarding this matter were:

  • “How do I define an array of elements of a certain type of a certain length?”
  • “Is this line a piece of nonsense or does it really do something:   $sum += 0; ?”
  • “What’s wrong with writing:  int amount; amount = 2.5; ?”
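The gap behind those questions can be sketched in a few lines.  The example below is in Python rather than PHP, so it is only an analogy: Python also lets a name change type without declaration, but unlike PHP it will not quietly turn a numeric string into a number.  The to_number helper is made up for this illustration, and reading $sum += 0 as "force this value to be numeric" is an assumption about how that idiom was being used.

```python
# No declaration and no fixed type: the name is simply bound to a value.
amount = 2        # an int
amount = 2.5      # now a float; there is no "int amount;" to contradict
amount = "2.5"    # and now a str


def to_number(value):
    """Hypothetical helper: explicit conversion of a numeric string or number."""
    number = float(value)
    # Hand back an int when nothing is lost, a float otherwise.
    return int(number) if number.is_integer() else number


print(type(amount).__name__)
print(to_number(amount) + 1)
```

In a language with declared types, the third assignment would be rejected at compile time, which is exactly the habit the newcomers had to unlearn.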

There are, of course, more areas than just those – include paths, include files, OOP, database abstraction, loops (“What the heck is foreach?”), memory management, libraries, and so on and so forth.

Even the list of resources for each programming language takes time to build.  Yes, time.  And time is the one thing that’s always against us.  Everything else we can handle.

Can you handle the popularity?

Looking around the blogosphere, I see more and more bloggers who work really hard on promoting their sites.  They optimize their themes for Google, submit their blogs to all sorts of directories, share links to their best content via social networks, microblog, and comment all around the web.
Well, that’s all fine.  But here is the question – can they handle the popularity?

I’ve been thinking about it before, but it all came to me suddenly yesterday and today.  One of my recent posts got submitted to reddit.com and somehow it made it to the main page of the site, and from there got aggregated via RSS to a lot of other places.  Within 24 hours, my blog received more than 20,000 views.  Compared to an average day, which brings well under a thousand, that’s a lot.

This sounds like a dream come true for any blogger, no?  Well, it is, sort of.  But.  Consider the other side of the story, which is not so obvious at first glance:

  • My hosting company handled the spike really well – no complaints or disconnections.  Not all hosting companies are created equal.
  • The commenting form on my blog was broken at the time of the spike.  It was down for the whole duration of the spike.
  • There were about 500 comments posted in the reddit.com thread.
  • I’ve received almost 100 emails.
  • When the commenting form got fixed, I got another dozen or so comments, plus another SPAM wave along with it.

If you imagine for a moment all that coming upon you in the middle of the working week, you’ll see a problem.  Who should respond to all that, and how?

I’ve spent half a day today talking with my hoster about the commenting form.  Thankfully, it got fixed (the problem was a session misconfiguration on the hosting company side).  Then I needed some time to respond to all those emails.  In the meantime I quickly reviewed and approved all comments in the moderation queue.  That pretty much ate my day, together with some things I managed to slide in at work.

Later in the evening, when my family went to sleep, I actually read all the comments and responded to a few.  I also read through most comments at reddit.com.  Can I reply to any of those?  Nope.  That’s beyond my resources.  I can’t handle all the traffic that came in.

Can you?  What will happen to your server if you get dugg or slashdotted?  How can you moderate all the comments?  How can you handle replies?  What about comments at other places – blogs, forums, and social networks that brought you the traffic?  Do you have any moderators on standby?  Do you have any monitoring set up (Google Alerts, coComment, etc) for remote discussions about your content?

If you aren’t thinking about those things while promoting your blog, you are in for a big surprise…

Follow-up to “Where did all the PHP programmers go?”

This is a quick follow-up to yesterday’s post – “Where did all the PHP programmers go?”.

First of all, let me take a moment to say “Wow!”.  Somebody submitted the post to Reddit and it made it to the front page and got an unbelievable number of comments.  Almost 500, and still coming.  Thank you all.

Secondly, the comments on this blog are finally fixed.  Murphy’s Law in action – they broke just before the wave came in and got fixed shortly after.

Thirdly, I should clear up a few things.  My apologies for getting you guys confused.  I never asked any candidate to compare sorting algorithms, much less to implement them.  I asked them to sort an array.  I was expecting one of those PHP function calls in return.  But I only got it a few times.  Many candidates didn’t know how to sort an array (apparently they use MySQL to sort an array).  A few suggested “bubble sort”, probably thinking that the task was meant to test sorting algorithms.  One even went as far as implementing a bubble sort in PHP.  With pen and paper.  This one was the toughest to decide about, by the way.
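To show the difference between the two kinds of answers, here is a small sketch.  It is written in Python rather than PHP, purely as an illustration: sorted() stands in for the kind of built-in call I was hoping for (PHP's sort() being one example), and the hand-written bubble_sort is roughly what the pen-and-paper answer amounts to.  The sample numbers are made up.

```python
def bubble_sort(items):
    """Hand-rolled bubble sort: returns a new sorted list, input untouched."""
    result = list(items)
    # After each pass the largest remaining element has bubbled up to `end`.
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is already sorted
    return result


numbers = [5, 2, 9, 1, 5, 6]

# The answer I was expecting: one call to something the language already has.
print(sorted(numbers))

# The answer a few candidates reached for instead.
print(bubble_sort(numbers))
```

Both produce the same ordering.  The first takes one line and is usually faster, since the built-in uses a more efficient algorithm than bubble sort.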

Fourthly, the correction.  The language is indeed called Ruby, not Ruby on Rails. I am aware of that.  I was just trying to catch a thought.  Thanks for pointing it out though.

Fifthly, an explanation for the pen and paper.  Yes, I know that programmers are used to typing code.  I know that they are used to their tools and online references.  But.  This is an interview.  My time is limited and I have to make a decision.  If I give all the tools and references to my mother, she will be able to solve the problem I am giving in reasonable time.  She is not a PHP developer.  She has no experience with PHP.  But she has enough common sense to do it.  If I take everything away – she won’t be able to do that.  But any semi-decent programmer will.  Further on, I am not feeding the resulting paper into the machine.  The only parser that sees that code is the one embedded in my brain.  And I assure you it is very tolerant of minor syntax errors and missing parameters.  I want to see the process.  The approach.  Some data structures and algorithms.  A bit of style in variable names, indentation, and empty lines, if I am lucky.  That’s all.

Sixthly, on the exercise itself.  I like to think that I am pretty flexible with answers.  For this particular exercise, the Perl programmer inside me thinks an associative array is the best data structure.  (And yes, before you start bashing further, I know that associative arrays in PHP aren’t the same as hashes in Perl.)  I can accept an OOP solution just fine.  What I find hard to accept is a single dimensional array with hopping over a pre-defined number of fields per record.
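To illustrate what I mean by "hopping", here is a sketch of the two layouts.  It is in Python rather than PHP, and the records and field names are invented for the illustration; they are not the data from the actual exercise.

```python
# Layout 1: one flat list, with a fixed number of fields per record.
FIELDS_PER_RECORD = 3  # name, language, years (hypothetical fields)
flat = ["Alice", "PHP", 4, "Bob", "Perl", 7]


def names_from_flat(flat_records):
    """Hop over the list in steps of FIELDS_PER_RECORD to reach each name."""
    return [flat_records[i] for i in range(0, len(flat_records), FIELDS_PER_RECORD)]


# Layout 2: one associative array (a dict here) per record.
records = [
    {"name": "Alice", "language": "PHP", "years": 4},
    {"name": "Bob", "language": "Perl", "years": 7},
]


def names_from_records(recs):
    """Each field is addressed by its key, so no index arithmetic is needed."""
    return [record["name"] for record in recs]


print(names_from_flat(flat))
print(names_from_records(records))
```

Both give the same list of names, but the flat version breaks silently the moment someone adds a fourth field and forgets to update the constant, while the keyed version keeps working.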

Seventhly, this post, once it got to reddit and then further on to other news streams, generated more candidates, and hints on where to find them, than all of my previous efforts.  Thanks to all of you who sent me resumes, links, and pointers.  My inbox is a bit overwhelmed right now, but I’ll reply to everyone over the next few days.

Thanks a lot to all of you.