CakePHP : Building factories with models and behaviors

CakePHP is a wonderful framework.   Recently I proved it to myself once again (not that I need much of that proof anyway).  The problem that we had at work was a whole lot of code in once place and no obvious way of how to break that code into smaller, more manageable pieces.  MVC is a fine architecture, but it wasn’t obvious to me how to apply it to larger projects.

In our particular case, we happen to have several data types, which are very similar to each other, yet should be treated differently.  Two examples are:

  1. Client account registrations.   Our application supports different types of accounts and each type has its own processing, forms, validations, etc.  However, each type of account is still an account registration.
  2. Financial transactions.  Our clients can send us money via a number of payment methods – credit cards, PayPal, bank wires, etc.  Each type of the transaction has its own processing, forms, validations, etc.  However, each type of the transaction is still a financial transaction of money coming in.

Having a separate model for each type of account or for each type of transaction seems excessive.  There are differences between each type, but not enough to justify a separate model.  Having a single model though means that it’ll continue to grow with each and every difference that needs to be coded in.  Something like a class factory design pattern would solve the problem nicely, but the question is how to fit it into an existing MVC architecture.  Read the rest of this post for a demonstration.

Continue reading CakePHP : Building factories with models and behaviors

PHP variables, strings, and curly braces

For the last couple of days we had a number arguments at work about what is the best way to surround a complex PHP variable inside a double-quoted string.  More specifically, should the sigil ($, dollar sign) be on the inside of the braces or on the outside.  Consider an example:

# my way
echo "Result: ${blah['something']}\n";
# the highway
echo "Result: {$blah['something']}\n";

While considering a number of examples, there seems to be no difference – both ways work.  We’d still need to pick one for consistency reasons though.  And I, as an ex-Perl programmer, was suggesting that we should use the dollar sign on the outside of the expression.  This how I remember it being in Perl (and PHP originated from Perl) .  This is how I am used to it.  And this is how makes most sense to me – a dollar sign immediately warns the programmer that the variable is ahead.

However, after consulting PHP documentation, I was proved wrong.  It is said that both ways often work, but it is much safer to use the dollar sign on the inside.  The manual page even provides a few examples where the dollar on the outside won’t work (such in case with objects).

While this is just a small thing to know and get used to, it still looks annoying to me.

PHP date() and 53 weeks

Let’s say you have a bunch of statistical data.  And all that data is date-related.  And let’s say want to display that data on a chart, a weekly average or something along those lines.  One of the ways for you to place the value into the proper week would be something like this:

$week = date('W', strtotime($stats_date));
$values[$week][] = $stats_value;

And if you did it this way, sooner or later, you’d notice that something is not quite working right at the edges of your chart.  With code as simple and straight-forward as this, you’d probably look for the problem elsewhere.  Maybe it’s your statistical data which is wrong, or the graph is not generated properly.  But the problem is here.

How many weeks do you think there are in a year?  A common knowledge says 52.  However, if you think for a moment about how the weeks are related to the year, you’ll realize that the first and last weeks don’t necessary start and end at the edge of the year.  If you play around with 1st of January and 31st of December across several years, you’ll notice that sometimes they fall into the 53rd week.  (As do a few more days, not just these two).

And here, the problem with the “W” date() format starts to emerge.   When scattering your data across a single year, you’d most often expect January to start with the first week of the year.  But it doesn’t. Sometimes the start of it falls into the 53rd week of the previous year.  And date(‘W’, $your_time) will happily return 53.  What will this do to your chart?  Two things are most likely.  The first week’s values would get reduced, and the last week’s values would get increased.  Or those values of the first week would altogether vanish from the graph.  That is unless you are careful.  Which I hope you’ll now be.

See comments to PHP date() manual for several solutions of this problem.

Web statistics and visitor tracking : things you need to know

First of all, just to make it clear, I don’t recommend writing your own web statistics / analytics / tracking application.  Google Analytics can track and report pretty much everything you will ever need. Period. If you think it can’t do it, chances are you just don’t know how.  That’s much easier to correct than to write your own tracking / reporting application.  I promise.  In case though, Google Analytics doesn’t do something that you need, grab one of those Open Source applications and modify it to suit.  While not as easy as learning Google Analytics, that would still be much easier than doing your own thing from scratch.

However, if you still decide to roll out your own tracker, here are a few things that you need to know.

  • Use the bicycle, don’t reinvent it. Most of the tracking applications that I’ve seen use some form of JavaScript, which is appended right before the end of the page markup.  Said JavaScript collects as much statistics as you need and generates a request to an image on the remote server (your tracking application), passing gathered statistics as parameters to the image.  On the server side, your tracking application gathers sent parameters, merges them with whatever else you can get from the server side, and saves in the database or in your data storage of choice.
  • Keep ad blocking applications in mind. Many ad blocking plugins for different browsers block 1×1 pixel images from remote servers.  Be a bit more creative – use a 2×1 or a 1×2 pixel image.  If it is a transparent GIF at the bottom of the page, nobody will notice it anyway.
  • Gather as much as you can from the server side. It’s simpler, and you minimize the chances of breaking things with an URL which is too long (your GET request for the image with all parameters can run pretty long, especially if you pass current page and referring page URLs).
  • Minimize the length of your parameter names and values when you pass them to image GET request. Again, this is to avoid extremely long URLs.  You can sacrifice readability in your JavaScript and instead document parameters in the server side tracker application.
  • Record both client’s IP address and possible proxy server’s IP address. That is available for you in the request headers ($_SERVER[‘HTTP_X_FORWARDED_FOR’] in PHP for example).  Once you got the IP addresses, use GeoIP to lookup the country, region, city, coordinates, etc.  It’s better to do so at the time you record the data.  There is a free GeoIP service as well, but it will give you much less information.  The commercial one is not that expensive.
  • Record client’s browser information. Browsercap is very useful for that.  However, it’s better to parse user agent string with browsercap at the report / export time, not at the request recording time.  This will guarantee that you always have the most correct information about the browser in your report.  Browsercap gets updated with new signatures pretty often.
  • If you are tracking a secure site (HTTPS), chances are you won’t have referrer information available to you.  Apparently, that’s a security feature.
  • If you use both JavaScript and PHP to figure out the referrer, keep in mind that JavaScript uses document.referrer, while PHP uses $_SERVER[‘HTTP_REFERER’].  Notice that one is spelled with two Rs, while the other – with one.  That might save you some troubleshooting time.
  • It’s better to use the same JavaScript code snippet across all your sites.  To avoid SSL-related security warnings, your JavaScript need to figure out if it’s in HTTPS web site or in plain HTTP one. See Google Analytics example on how to actually do that.   It doesn’t hurt to have a signed SSL certificate for the HTTPS hosting of your tracker application.
  • Don’t forget about HTML and URL escaping / encoding. Check that everything works properly for you in different browsers.  JavaScript is still hard to nail right sometimes.
  • Keep the version of tracker application in every request log entry. This will much simplify your migrations later.  One of the ways to keep this automated is to use tags / keyword substitutions in your version control software (here is how to do this in Subversion).
  • Make sure your tracker spits out that transparent image no matter what. Broken image icons are very visible and you don’t want those on your site just because your tracker database went down temporarily.
  • For the best cross-site tracking, start tracker session, which will remain the same when visitor will go from one of your tracked web sites to another.  If your tracked web sites use sessions, pass their IDs to tracker, so that both tracked and tracker session IDs could be logged in the same request. This will help you link stats from several sites together, as well as do all sorts of drill-downs into site-specific stats straight from the bird-view reports.
  • Don’t be evil! There is a lot that you can collect about your visitors.  Make sure that you tell them exactly what you are collecting and how you are using it.  Aggregate and anonymize your logs to prevent negative consequences.  I’m sure you know what I mean.

Once again, think really good before you decide to do one yourself.  It’s not an easy job.  And even if you grab all the data you want and save it in your database, there is an incomparably bigger issue to solve yet – reports, graphs, export, and overall visualization and analytics part of that data.  Why would you even want to go into that?

Enforcing coding styles in PHP

I came across a plugin for CakePHP which helps to check if the certain code follows CakePHP coding style.  While I haven’t tried it, I think the better way is to utilize CodeSniffer.  As per PHP_CodeSniffer PEAR page:

PHP_CodeSniffer tokenises PHP, JavaScript and CSS files and detects violations of a defined set of coding standards.

Which basically means that PHP_CodeSniffer is a generic tool for validating your code.  You can use for CakePHP, WordPress, or any other PHP project that you are working on.  The best part is that you can create your own set of rules regarding coding style and then make sure that your team follows it. If you don’t care that much for your own rules, then you can use one of the many existing rulesets.  Some of these come together with CodeSniffer package, others are available on the Web.

Setting up CodeSniffer for my team at work has been a long lasting TODO item, however it looks like I will be able to start working on this next week.  Once it created, tested, and everyone is happy with it, we’ll have it in the pre-commit hook in our Subversion repository.  This way, we will prevent commits of any code that does not follow our rules.  Of course, I plan to only run CodeSniffer against the code that we wrote in-house.  There is no need to re-format all the third-party code just for the sake of it.  Plus, we are rarely doing any modifications of the third-party code at all.