Blog of Leonid Mamchenkov

You just stepped in a pile of posts.

Entries Tagged ‘Programming’

Software engineering is like cooking

During the last few month I’ve been explaining software engineering to management types quite a bit.  Most of the “bosses” that I talked to weren’t technical at all, so I was trying to stay away from famous concepts, examples, and terminology as much as I could.  Of course, that required some sort of substitute for concepts, examples, and terminology.  I’ve tried analogies from different unrelated areas, and was surprised as how good cooking was fitting the purpose.

Before I go any further, I have to say that I am not a cook and that I don’t know much about cooking.  But.  I know just about the same as any other average human being.  Which, sort of, moves me into the same category with my targets, or “bosses”, as I called them before.

Here are a few examples that worked well.

[Read the rest of this entry...]

Programming religions

I’m slowly catching up with the news stream and all the jokes of the last few weeks.  “If programming languages were religions” is a nice one.  Here is PHP, which I spent the most time with now:

PHP would be Cafeteria Christianity – Fights with Java for the web market. It draws a few concepts from C and Java, but only those that it really likes. Maybe it’s not as coherent as other languages, but at least it leaves you with much more freedom and ostensibly keeps the core idea of the whole thing. Also, the whole concept of “goto hell” was abandoned.

And here is Perl, which is my favourite programming language so far:

Perl would be Voodoo – An incomprehensible series of arcane incantations that involve the blood of goats and permanently corrupt your soul. Often used when your boss requires you to do an urgent task at 21:00 on friday night.

Check the rest of them for fun and profit.

Perl vs. PHP : variable scoping

I’ve mentioned quite a few times that I am a big fan of Perl programming languge.  However, most of my programming time these days is spent in PHP.  The languages are often similar, with PHP having its roots in Perl, and Perl being such a influence in the world of programming languages.  This similarity is often very helpful.  However there are a few difference, some of which are obvious and others are not.

One such difference that I came up recently (in someone else’s code though), was about variable scoping.  Consider an example in Perl:

#!/usr/bin/perl -w
use strict;
my @values = qw(foo bar hello world);
foreach my $value (@values) {
    print "Inside loop value = $value\n";
}
print "Outside loop value = $value\n";

The above script will generate a compilation error due to undefined variable $value.  The one outside the loop.

A very similar code in PHP though:

#!/usr/bin/php
<?php
$values = array('foo','bar','hello','world');
foreach ($values as $value) {
    print "Inside loop value = $value\n";
}
print "Outside loop value = $value\n";
?>

Will output the following:

Inside loop value = foo
Inside loop value = bar
Inside loop value = hello
Inside loop value = world
Outside loop value = world

In Perl, variable $value is scoped inside the loop.  Once the execution is out of the loop, there is no such thing as $value anymore, hence the compilation error (due to the use of strict and warnings).  In PHP, $value is in global scope, so the last value “world” is carried further down the road.  In case you reuse variable names in different places of your program, counting on scope to be different, you might get some really interesting and totally unexpected results.  And they won’t be too easy to track down too.  Be warned.

On remote logging with syslog

We’ve been doing some interesting things at work, as always, with yet more people and Linux boxes.  And of the side effects of mixing people, Linux boxes, and several locations is this need for some sort of centralized logging.  Luckily we have either syslog-ng or rsyslog daemons installed on each machine, so the only two issues seemed to be reconfiguration of syslog services for remote logging and setup of some log reading/searching tool for everyone to enjoy.

As for log reading and searching, there seems to be no end of tools.  We picked php-syslog-ng, which has web interface, MySQL back-end, access control, and more.  There were a few minor issues during setup and configuration, but overall it seemed to be OK.  I also patched the source code a bit in a few places, just to make it work nicer with our setup and our needs  (both numerical and symbolic priorities, preference for include masks over excludes, and full functionality with disabled caching).  In case you are interested, here is a patch against php-syslog-ng 2.9.8f tarball.

Once everything was up and running and we started looking through logs from all our hosts in the same place, there was one thing that surprised me a lot.  Either I don’t understand the syslog facilities and priorites fully (and I don’t claim that I do), or there is just too many software authors who don’t care much.  Most of our logs are coming in at priority critical.  Even if there isn’t much critical about them.  Emergency is also used way too much.  And there is hardly anything at debug or info or notice levels.  (RT, SpamAssassin, and many other applications seem to be using critical as their default log level).  Luckily, that  almost always is trivial to fix using either the configuration files or applications’ source code directly.

On software testing

The software is checked very carefully in a bottom-up fashion. First, each new line of code is checked, then sections of code or modules with special functions are verified. The scope is increased step by step until the new changes are incorporated into a complete system and checked. This complete output is considered the final product, newly released. But completely independently there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product. There is additional verification in using the new programs in simulators, etc. A discovery of an error during verification testing is considered very serious, and its origin studied very carefully to avoid such mistakes in the future. Such unexpected errors have been found only about six times in all the programming and program changing (for new or altered payloads) that has been done. The principle that is followed is that all the verification is not an aspect of program safety, it is merely a test of that safety, in a non-catastrophic verification. Flight safety is to be judged solely on how well the programs do in the verification tests. A failure here generates considerable concern.

The above was written by R. P. Feynman, in Feynman’s Appendix to the Rogers Commission Report on the Space Shuttle Challenger Accident, 1986. More than 20 years ago. Much recommended reading.

Found via Richard Feynman, the Challenger Disaster, and Software Engineering.

Oracle and PHP – the deadly mix

WI’ve spent most of the last week getting into, around, and out of the issues related to interoperability of Oracle and PHP.  Before you start laughing, cursing, and blaming, Oracle wasn’t my choice of the database for this specific project.  It’s just the company already had it installed and working for the background, and there needed to be some integration with the front, which is of course MySQL and PHP based.

First thing I do, obviously, is visit PHP.net to check for the prefix of the functions that I need for Oracle.  Through out my experience with PHP, that’s about the only thing I need to know to start working with the new database.  Oh, and the PHP module installed to provide those functions. Oracle interface for PHP is called is called OCI8.  All you need to do now is install the oci8 module.

Here comes the first trouble.  oci8 is not provided as a pre-compiled package for Fedora Linux.  There is an alternative yum repository – Remi, which has oci8 RPMs, but first of all, the oci8 module is compiled against somewhat outdated Oracle headers (version 10.2.0.4 instead of the latest 11.1.0.1), and it also needs to replace your native PHP and MySQL packages.  I tried that, and it sort of worked, but I wasn’t happy.  So I got my Fedora packages back and decided that I need to compile oci8 myself.

In order to compile oci8, one needs to download Oracle InstantClient (basic package) and some header files (devel package).  These can be downloaded from the Oracle web site, for free, minus the time for the registration.  The little trick here is that during oci8 compilation process, the includes are searched from locations which do not include the one from Oracle RPM.  I did a simple symlink of the includes folder to where Oracle headers were, and compilation went on just fine.  (Hint: otherwise you’ll get a whole lot of Zend related messages and a fatal error).  Gladly, I only had to do this path correction on the Fedora 9 machine.  My production server with Red Hat Enterprise Linux 5 compiled oci8 without any problems all by itself.

Update: more detailed instructions on the actual installation can be found here and here.

Now that oci8 installed and configured, I spent some time figuring the correct way to specify the DSN.   Oracle uses some weirdly name file (tnsnames.ora) in some weird location, but luckily there is a way to go around it.  More so, I recommend that you remove tnsnames.ora file altogether, since it can add to your troubles.  For example, if you mix spaces and tabs as whitespaces in that file, you are screwed.  So, just get rid of it.  The way you specify DSN is directly in the PHP script, and you use the syntax like so:  “//hostname.or.ip:port/dbname“.  Intuitive, I know.

Once you’ll get connected to the server, you have a whole bag of surprises waiting for you.  That is if you are too used to working with MySQL.  First is the syntax.  Oracle is using PL/SQL, so you wipe the dust of from that really old Pascal textbook that you have somewhere.  “begin :result := some.procedure.call(:param1, :param2); end;” – that sort of thing.  Secondly, you’ll be happy to know that prepared queries are supported.  So your workflow will slightly change.  Perl programmers will feel more at home here.  oci_bind_by_name() and oci_execute() are your friends here.  Oh, and while you are at, get familiar with the types of the parameters, because they are important.  And don’t forget that you’ll have to bind each and every variable in the query, or get a fatal error. And since you are learning something here, get ready for the oracle errors.  The most frequent one you’ll get would be something like “Failed to retreive the error message for ORA-12345″, where 12345 would be a number of the error.  So you’ll google for ORA-12345 and ORA-54321 and ORA-XYZZZ a lot.  But than you’ll have a wrapper library and you’ll be OK.

Update: as was noted in the comments, PL/SQL is just an option, not a requirement.  Also, most of the headaches of the above paragraph could be avoided by using one of the PHP frameworks.  I personally haven’t yet tried the framework yet, since I’d like to see things working directly first.  Especially since we are not in the test mode only.

The bigger surprise is still waiting for you though.  You are very likely to discover that OCI8 implementation for PHP is very slow.  And I do mean extremely very slow.  I couldn’t believe that it could be slow, so I went into the source code and OMG!  It is really slow.  The slow part is around fetch_all() against fetch_row().  Basically, it’s always row by row and never all, even if you tell it how many rows you need fetched.

In my case, I have the server a bit far away, and there is a possibility to get many rows back.  So even for a simple query with 140 rows in results I was getting 20 seconds execution time.  Oracle was serving results fast, the network was OK, machines on both sides were powerful and all, but it was still taking 20 seconds or more.

I am still trying to find the solution to this issue, but so far it seems that the current way I do it will be the way to do it.  And the way I do it now is the following.  Never ever run direct SQL queries.  Everything goes through a stored procedure.  The results are returned all in a single row.  And that single row has the BLOB (CLOB actually) with all results in one single XML.  Fetching works good enough to get it, and then parsing is done with one of the billion XML parsers for PHP.

In my case MiniXML worked pretty good until bigger results started coming in.  That’s when I learned an important lesson.  MiniXML parses XML with a regular expression.  PHP has a couple of settings in the configuration file that limits the size of the memory and recursion during regex parsing – pcre.backtrack_limit and pcre.recursion_limit.  If you really want to kill your server, set these to -1 (instead of default 100000) and try a regex against a 1 MB XML file.  Enjoy, cause it won’t be long before everything goes down. I didn’t feel like changing from MiniXML so we just implemented some limits in the queries and stored procedures on the Oracle side, and add a few checks in PHP fail rather than crash the system.

So, to some it up, here is my experience with Oracle and PHP from the last week:

  • I had to register on Oracle web site to download packages
  • I had to re-learn my long forgotten compilation skills
  • I had to go read some C
  • I had to step on the “re-inventing the wheel” path more than once
  • I am parsing XML when working with the database
  • I had a head ache more than twice
  • I didn’t have much fun
  • After all, it works.  Sort of.

One last point in this saga is about Googling.  Ask me any question, and I do mean any question, about MySQL.  Heck, even PostgreSQL.  And the answer is just there, on the first page of Google results.  In any human or programming language.  For any operating system.  You’ll be sorted out and working in less then a minute.   Then, try asking even the simplest of the simplest questions about Oracle and PHP.  Sometimes you’ll find something.  Some other times, you won’t.  The overall feeling I have is that not a lot of people are using Oracle with PHP, and those of them who do are in their majority not very happy.

Now I’ve joined the army.

BLOB is bad for your (mental) health

If you ever mention that your web application uses database to store files, you risk being flamed into oblivion.  Indeed, in most cases, it is a bad idea, since file system is more effecient when it comes to files.  However, there are cases when it makes sense to have files saved in the database.

Maybe I am doing something wrong, but in the last six month, I had to develop at least three systems that used MySQL for file storage (uploaded files that have to be synchronized across several hosts, etc).  Yesterday, for the third time I stumbled across the same problem, that almost drove me insane.

MySQL has four data types for storing binary data – TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB.  Somehow I always forget about these and use BLOB.  BLOB works just fine, but it has a limit on size, which is rather low – 64 KBytes.  The mean thing here is that it will work just fine with most of the test data – text files, short PDFs, and small pictures.  Once the application is tested and put into production, the corrupted files will start coming in.  Re-writing all parts that deal with uploading, moving, cleaning, escaping, and encrypting binary data takes time.  Going through file reading and writing routines is boring too, and it won’t help either.

By the time, the issue is discovered and all fields are changed to LONGBLOB, it is often very late, and you’ve lost your weekend, as well as a lot of large files. This post is an attempt to save my (and your) sanity.

Reminder: use LONGBLOB instead of BLOB for file storage, unless you are absolutely sure about the maximum size of incoming data.

Follow-up to “Where did all the PHP programmers go?”

This is a quick follow-up to yesterday’s post – “Where did all the PHP programmers go?“.

First of all, let me take the moment and say “Wow!”.  Somebody submitted the post to Reddit and it made it to the front page and got an unbelievable amount of comments.  Almost 500, and still coming.  Thank you all.

Secondly, the comments on this blog are fixed finally.  Murphy’s Law in action – they got broken just before the wave came in and they got fixed shortly after.

Thirdly, I should clear up a few things.  My apologies for getting you guys confused.  I never asked any candidate to compare sorting algorithms, much less to implement them.  I asked to sort an array.  I was expecting one of those PHP function calls in return.  But I only got it a few times.  Many candidates didn’t know how to sort an array (apparently they use MySQL to sort an array).  A few suggested “bubble sort”.  Probably thinking that the tasks for testing sorting algorithms.  One even went as far as implementing a bubble sort in PHP.  With pen and paper.  This one was the toughest to decide about, by the way.

Fourthly, the correction.  The language is indeed called Ruby, not Ruby on Rails. I am aware of that.  I was just trying to catch a thought.  Thanks for pointing it out though.

Fifthly, explanation for the pen and paper.  Yes, I know that programmers are used to typing code.  I know that they are used to their tools and online references.  But.  This is an interview.  My time is limited and I have to make a decision.  If I give all the tools and references to my mother, she will be able to solve the problem I am giving in reasonable time.  She is not a PHP developer.  She has no experience with PHP.  But she has enough of common sense to do it.  If I take everything away – she won’t be able to do that.  But any semi-decent programmer will do.  Further on, I am not feeding the resulting paper into the machine.  The only parser that sees that code is the one embedded in my brain.  And I assure you it is very tolerant to minor syntax errors and missing parameters.  I want to see the process.  The approach. Some data structures and algorithms.  A bit of style in variable names, indentation, and empty lines, if I am lucky.  That’s all.

Sixthly, on the exercise itself.  I like to think that I am pretty flexible with answers.  For this particular exercise, a Perl programmer inside me thinks associative array is the best data structre.  (And yes, before you start bashing further, I know that associative arrays in PHP aren’t the same as hashes in Perl.)  I can accept an OOP solution just fine.  What I find hard to accept is a single dimensional array with hopping over a pre-defined number of fields per record.

Seventhly, this post, once it got to reddit and then furthermore to other news streams, generated more candidates and hints to where to find them, then all of my prevoius efforts.  Thanks to all of you who sent me resumes, links, and pointers.  My inbox is a bit overwhelmed right now, but I’ll reply to everyone over the next few days.

Thanks a lot to all of you.

More on PHP

I’ve recently posted some of my thoughts on PHP.  If you enjoy the topic, here is an excellent post over at Coding Horror, which reminds why PHP sucks so badly and also why it will stick around for some time to come.

Annoying software

Slashdot is running the post about annoying software.  The fact that Slashdot crowd mostly consists of computer geeks is sort of a guarantee for some interesting comments.

With my Fedora 9 saga I had to review and try a lot of new software.  Needless to say, I found quite a few annoying bits.  Here is a brief list, just to give you an idea:

  • Clock applet in Gnome. It shows calendar with Sunday being first day of the week.  If you don’t like it, you’ll have to recompile your locale to change it. This one is cancelled out though by an excellent support of Google Calendar (or, for that matter, any other web published calendar).
  • Metacity window manager in Gnome. Window titles are displayed in the middle.  This is really annoying for those of us who are used to seeing them on the left.  There is no option to change this setting either in GUI or in GConf.
  • Pidgin new message notification. I once had it popping up nice looking bubbles, but I don’t remember how I managed to do it.  I also don’t remember how I managed to break it.  And I have no idea to bring them back.  I really miss them though.
  • WordPress 2.5 post editing screen. It has been much reworked in the latest version and looks and feels so much better. However, the list of categories was moved from a really convenient location on the right of the screen to a really inconvenient location at the bottom of the screen.
  • FileZilla FTP manager. This one drives me nuts with server connections.  It either disconnects every 40 seconds when being idle.  Or it keeps multiple connections open forever and most FTP servers block me out temporary.
  • Request Tracker (RT3). Works perfectly with queues and tickets, but annoys the heck out of me when I need to do something with users.  Users aren’t first level citizens, like tickets.
  • SugarCRM. Excellent business tool, with lots of small annoyances, like not being able to set default user role, disable theme selector everywhere, change logos to company ones, lock down the functionality, etc.  Most of these are easily fixable.  But some aren’t as trivial as they may sound or seem.
  • Google Reader. This one annoys me a bit (but often) when I want to leave a few items in the feed unread and go deeper into archives.  Somehow it keeps marking everything I passed as read.

Now, what piece of software were you annoyed with recently?