Web statistics and visitor tracking: things you need to know

First of all, just to make it clear, I don’t recommend writing your own web statistics / analytics / tracking application.  Google Analytics can track and report pretty much everything you will ever need. Period. If you think it can’t do it, chances are you just don’t know how.  That’s much easier to correct than to write your own tracking / reporting application.  I promise.  If Google Analytics really doesn’t do something that you need, though, grab one of the Open Source applications and modify it to suit.  While not as easy as learning Google Analytics, that would still be much easier than doing your own thing from scratch.

However, if you still decide to roll your own tracker, here are a few things that you need to know.

  • Use the bicycle, don’t reinvent it. Most of the tracking applications that I’ve seen use some form of JavaScript, which is placed right before the end of the page markup.  This JavaScript collects as many statistics as you need and generates a request for an image on the remote server (your tracking application), passing the gathered statistics as parameters in the image URL.  On the server side, your tracking application gathers the sent parameters, merges them with whatever else you can get from the server side, and saves everything in the database or in your data storage of choice.
  • Keep ad blocking applications in mind. Many ad blocking plugins for different browsers block 1×1 pixel images from remote servers.  Be a bit more creative – use a 2×1 or a 1×2 pixel image.  If it is a transparent GIF at the bottom of the page, nobody will notice it anyway.
  • Gather as much as you can from the server side. It’s simpler, and you minimize the chances of breaking things with a URL which is too long (your GET request for the image with all parameters can run pretty long, especially if you pass current page and referring page URLs).
  • Minimize the length of your parameter names and values when you pass them in the image GET request. Again, this is to avoid extremely long URLs.  You can sacrifice readability in your JavaScript and instead document the parameters in the server side tracker application.
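  • Record both the client’s IP address and the possible proxy server’s IP address. When a request comes through a proxy, the connecting address ($_SERVER['REMOTE_ADDR'] in PHP, for example) belongs to the proxy, while the original client address, if the proxy passes it on, is in the X-Forwarded-For request header ($_SERVER['HTTP_X_FORWARDED_FOR']).  Keep in mind that this header comes with the request and can be forged.  Once you have the IP addresses, use GeoIP to look up the country, region, city, coordinates, etc.  It’s better to do so at the time you record the data.  There is a free GeoIP service as well, but it will give you much less information.  The commercial one is not that expensive.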
  • Record the client’s browser information. Browsercap is very useful for that.  However, it’s better to store the raw user agent string with each request and parse it with Browsercap at report / export time, not at request recording time.  This will guarantee that you always have the most correct information about the browser in your report.  Browsercap gets updated with new signatures pretty often.
  • If you are tracking a secure site (HTTPS), chances are you won’t always have referrer information available to you.  Browsers leave out the referrer when a request goes from an HTTPS page to a plain HTTP address.  That’s a security feature.
  • If you use both JavaScript and PHP to figure out the referrer, keep in mind that JavaScript uses document.referrer, while PHP uses $_SERVER['HTTP_REFERER'].  Notice that one is spelled with two Rs, while the other – with one.  That might save you some troubleshooting time.
  • It’s better to use the same JavaScript code snippet across all your sites.  To avoid SSL-related security warnings, your JavaScript needs to figure out whether it’s running on an HTTPS web site or on a plain HTTP one. See the Google Analytics example on how to actually do that.   It doesn’t hurt to have a signed SSL certificate for the HTTPS hosting of your tracker application.
  • Don’t forget about HTML and URL escaping / encoding. Check that everything works properly for you in different browsers.  JavaScript is still hard to get right sometimes.
  • Keep the version of the tracker application in every request log entry. This will greatly simplify your migrations later.  One of the ways to keep this automated is to use tags / keyword substitutions in your version control software (here is how to do this in Subversion).
  • Make sure your tracker spits out that transparent image no matter what. Broken image icons are very visible and you don’t want those on your site just because your tracker database went down temporarily.
  • For the best cross-site tracking, start a tracker session, which will remain the same when a visitor goes from one of your tracked web sites to another.  If your tracked web sites use sessions, pass their IDs to the tracker, so that both tracked and tracker session IDs can be logged in the same request. This will help you link stats from several sites together, as well as do all sorts of drill-downs into site-specific stats straight from the bird’s-eye view reports.
  • Don’t be evil! There is a lot that you can collect about your visitors.  Make sure that you tell them exactly what you are collecting and how you are using it.  Aggregate and anonymize your logs to prevent negative consequences.  I’m sure you know what I mean.
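Several of the points above (short parameter names, URL encoding, matching the page protocol, and always returning the image) can be sketched in a few lines of JavaScript.  This is only an illustration under assumptions: the parameter names, the "/t.gif" path, and the functions buildBeaconUrl and handleHit are all made up for this example, and saveHit and pixelBytes are whatever storage routine and image bytes your own tracker provides.

```javascript
// Hypothetical short parameter names; document them in the server-side tracker.
const PARAM_NAMES = { page: "p", referrer: "r", screen: "s", version: "v" };

// Client side: build the URL of the tracking image from gathered statistics.
// `trackerHost` and the "/t.gif" path are placeholders, not a real service.
function buildBeaconUrl(trackerHost, pageProtocol, stats) {
  // Match the protocol of the tracked page to avoid SSL-related warnings.
  const scheme = pageProtocol === "https:" ? "https://" : "http://";
  const pairs = [];
  for (const key of Object.keys(PARAM_NAMES)) {
    const value = stats[key];
    // Skip empty values to keep the URL as short as possible.
    if (value === undefined || value === null || value === "") continue;
    pairs.push(PARAM_NAMES[key] + "=" + encodeURIComponent(String(value)));
  }
  return scheme + trackerHost + "/t.gif?" + pairs.join("&");
}

// Server side: always answer with the image, even if storage fails.
// `saveHit` and `pixelBytes` are supplied by the caller (assumptions).
function handleHit(params, saveHit, pixelBytes) {
  let stored = true;
  try {
    saveHit(params);
  } catch (err) {
    // Report the failure elsewhere; the visitor must never see a broken image.
    stored = false;
  }
  return {
    status: 200,
    headers: { "Content-Type": "image/gif", "Cache-Control": "no-store" },
    body: pixelBytes,
    stored: stored,
  };
}
```

In a browser, the client part could be used by assigning the result of buildBeaconUrl("stats.example.com", location.protocol, { page: location.href, referrer: document.referrer }) to the src of a new Image object.  The host name here is, again, just a placeholder.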

Once again, think really hard before you decide to do one yourself.  It’s not an easy job.  And even if you grab all the data you want and save it in your database, there is an incomparably bigger issue to solve yet – reports, graphs, export, and the overall visualization and analysis of that data.  Why would you even want to go into that?

Build for the mobile

I just had a revelation. An enlightenment, if you will.  You know how it happens – you think about a solution to a problem for a really long time.  Then you don’t think about it anymore. At least not consciously.  But your brain is still crunching.  You can feel it.  But if the solution is still not found, then you get used to that constant crunching and ignore it.  And then you even forget it. And then, some time after, there is a Big Bang.  A huge flash in your head.  And it’s not the solution to the problem yet.  But it’s a sign and a reminder that your brain is still working on something you have long forgotten you had to solve.  That’s what I just had.

Being involved with a lot of web development, I was trying to figure out how to deal with all those mobile devices.  The mobile Internet user base is growing fast and even today it is so big that it can’t be ignored anymore.  Luckily, most mobile devices run full blown web browsers with CSS and JavaScript support.  Some can even do Flash.  So it’s not like web development for mobile devices is something completely different from web development for desktops.

And yet, there are differences.  For the near future, these are the differences that I can think about:

  • Mobile devices have smaller screens and that’s not going anywhere.  Even if supported resolutions get higher and higher, the physical size of the screen won’t match the desktop screen any time soon.
  • Mobile devices have handicapped input.  Flip-out QWERTY keyboards are quite usable now and handwriting recognition is getting better by the day.  But a mobile device is not and probably will not be as convenient for input as a desktop computer.
  • Mobile devices have less processing power.  They get more power, but while they do so, desktop clients do as well.  And so the difference is maintained.  With more and more functionality being pushed out to the client side, processing power is an important issue.
  • Mobile devices have unstable connectivity and higher bandwidth costs.  Again, with all 3G networks expanding globally and more and more free WiFi hot-spots installed everywhere, the connectivity problem is getting partially solved.  But it’s not going to be solved completely any time soon (coverage, higher costs, battery life are just some of the reasons).

While there are probably other things you can put on that list above, even the ones I have there are enough to consider a different approach when developing for mobiles.  And why should we consider them at all?  Well, here is an image that actually triggered that big flash in my mind that I spoke about earlier (shamelessly borrowed from Paul Kedrosky’s blog post).

[Image: Mobile Internet Graph]

You (of course I mean “I”, “we”, “they”, and “you”) cannot ignore mobile devices anymore when building web sites and applications.  So, how should this problem be approached?  And now for that revelation, enlightenment that I mentioned earlier in the post:

Build for the mobile device first, then extend for the rest.

That’s not a new approach.  It’s something that has been used and recommended before.  It was just phrased differently, along the lines of: limit resources in your development environment and you’ll get a much more efficient and resource-aware application.  If a developer has only 512 MB of RAM on the machine he uses to write and test his code, the chances of that application being efficient on a 4 GB server are higher than for an application written on a 4 GB machine. ([*] citation needed)

If you build your web site or application for the mobile device, you’ll ensure most of these:

  • It works well with small screen sizes and lower resolutions.
  • It requires the minimum of input from the user.
  • It has exactly the right balance between client-side and server-side processing.
  • It supports a whole lot of browsers, even though most of those browsers don’t exist on the desktop.
  • It has at least some optimization in terms of download size, client-side caching, etc.

And when your web project works on the mobile devices, it will be much easier for you to check for extra resources in the client’s browser (higher resolution, better browser, etc) and enhance behavior with more bells and whistles.  You probably won’t want to do this yourself anyway.

I think adding bells and whistles would be much simpler and faster than removing and reorganizing things in an application that has been built for the desktop browser and now needs to support, or at least behave nicely with, mobile browsers.
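To make the “mobile first, then extend” idea a bit more concrete, here is a minimal sketch in JavaScript.  Everything in it is an assumption made up for illustration: the feature names, the 1024 pixel threshold, and the planEnhancements function itself.  The point is only the shape of the logic, where the mobile build is the default and extras are added when the client turns out to have more resources.

```javascript
// Baseline features that every client gets; the mobile build is the default.
const BASELINE = ["core-content", "basic-navigation"];

// Decide which optional extras to switch on, given detected capabilities.
// All feature names and thresholds here are illustrative assumptions.
function planEnhancements(client) {
  const features = BASELINE.slice(); // copy, so the baseline is never modified
  if (client.screenWidth >= 1024) features.push("multi-column-layout");
  if (client.hasPointer) features.push("hover-menus");
  if (client.fastConnection) features.push("high-res-images");
  return features;
}
```

In a browser, screenWidth could come from window.innerWidth, for example.  Note that nothing is ever taken away from the baseline, which is exactly why this direction is easier than stripping down a desktop application.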

I would be very surprised if you actually read the post all the way down to here.  And just to thank you, I thought I should surprise you.  Most of the above post just came off the top of my head, with no research, measurements, or supporting data.  It’s not even something I have discussed with someone else yet.  So, I suggest you take it with a jar of salt, a jar of pepper, and a pint-sized bottle of red hot chili sauce.

Enforcing coding styles in PHP

I came across a plugin for CakePHP which helps to check whether a given piece of code follows the CakePHP coding style.  While I haven’t tried it, I think the better way is to utilize CodeSniffer.  As per the PHP_CodeSniffer PEAR page:

PHP_CodeSniffer tokenises PHP, JavaScript and CSS files and detects violations of a defined set of coding standards.

Which basically means that PHP_CodeSniffer is a generic tool for validating your code.  You can use it for CakePHP, WordPress, or any other PHP project that you are working on.  The best part is that you can create your own set of rules regarding coding style and then make sure that your team follows it. If you don’t care that much for your own rules, then you can use one of the many existing rulesets.  Some of these come together with the CodeSniffer package, others are available on the Web.

Setting up CodeSniffer for my team at work has been a long-standing TODO item, however it looks like I will be able to start working on this next week.  Once the ruleset is created, tested, and everyone is happy with it, we’ll have it in the pre-commit hook in our Subversion repository.  This way, we will prevent commits of any code that does not follow our rules.  Of course, I plan to only run CodeSniffer against the code that we wrote in-house.  There is no need to re-format all the third-party code just for the sake of it.  Plus, we are rarely doing any modifications of the third-party code at all.
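The “only check in-house code” part of such a hook could look roughly like the sketch below.  It is written in JavaScript purely for illustration, and it makes several assumptions: the directory names are invented, the filesToSniff function is hypothetical, and the input is expected to be lines in the style of `svnlook changed` output (a status column followed by a path).  The actual hook would then pass the resulting files to CodeSniffer and reject the commit if any violations are reported.

```javascript
// Directories holding third-party code; hypothetical names for illustration.
const THIRD_PARTY_PREFIXES = ["trunk/vendors/", "trunk/cake/"];

// Given lines in the style of `svnlook changed` output, return the in-house
// PHP files that should be passed to the style checker.
function filesToSniff(changedLines, skipPrefixes) {
  const result = [];
  for (const line of changedLines) {
    // Status flags first, then whitespace, then the path.
    const match = /^(\S+)\s+(.+)$/.exec(line);
    if (!match) continue;
    const status = match[1];
    const path = match[2];
    if (status.startsWith("D")) continue; // deleted files need no check
    if (!path.endsWith(".php")) continue; // only PHP sources
    if (skipPrefixes.some((prefix) => path.startsWith(prefix))) continue;
    result.push(path);
  }
  return result;
}
```

Keeping the third-party prefixes in one list means that adding a new vendor directory later is a one-line change in the hook.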

Attending PHP UK Conference 2009

Security centered design

The conference day.  We woke up early to get in the queue at registration, which opened at 08:30.  When we got to the Olympia Conference Center, which was about a 5-minute walk from our hotel, it was full of people.   More than a hundred people already, and we were early.  Got our badges and notepads, grabbed a coffee, and started wandering around.  There were a few sponsor stands, so we had something to do.

Honestly, I thought there would be more stands, and from companies which are more closely related to web development.  We got to O’Reilly to buy some books at a 35% discount (I was the first customer of the day, beta-testing the receipt issuing procedure, hehe).  Looked at the iBuildings stand briefly.  Looked at Sun MySQL something to do with reporting tool something.  It was crowded over there and I had a cup of coffee in my hands, so didn’t get too close.  Saw a few people playing with Wii and some more with MS Xbox 360.  Seemed like fun.

The conference itself featured a few talks, and it was a double track, so each attendee had to choose which of the two concurrent talks to attend.  Here are the ones that I went to:

  • Keynote talk: The future’s so bright, I gotta wear shades by Aral Balkan. It was a bit too lengthy for the points it made, but inspirational nonetheless.
  • Sharding Architectures by David Soria Parra.  Very interesting discussion on scaling a database across several servers. The sharding technique described can be applied to much more than just that.
  • Of Lambda Functions, Closures and Traits by Sebastian Bergmann.  A look into some advanced features of PHP 5.3.  These will make writing PHP code a bit more fun, and the result a bit more pleasant to look at.
  • Living with Frameworks by Stuart Herbert.  Nice, balanced look at why frameworks are important.  It was a bit misplaced though, since it was more for people who don’t yet use frameworks, while most of the audience was from the frameworks camp.
  • Myphp-busters: symfony framework by Stefan Koopmanschap.  An overview of Symfony framework, which made me love CakePHP even more.
  • Security-Centered Design: exploring the impact of human behavior by Chris Shiflett. Interesting discussion (with cool examples) of the social part in security approaches.

Sharding Architectures and Lambda Functions were two of my favourite talks for technical insight.  Security-Centered Design and Living with Frameworks were the two favourites for non-technical inspiration.

After the last talk there were a few free beers at the venue, and after that there was another beer session at Brook Green Hotel.  Quite a few people, quite a few pints, quite a few interesting conversations and contacts made, excellent buffet, and overall a time well spent.

A note to conference organizers: I know you guys worked hard to make this happen, and that you are a bunch of hobbyists who are not getting paid to do this, so, first of all, thank you.  I really enjoyed the event.  Here are a few things that I think could be improved, just in case you have control over them the next time:

  • WiFi coverage.  Yes, it was there and it was sort of working, but it was also slow and unstable.  At the beginning I thought that was just me for some reason, but then heard a few more people complain.
  • Power sockets.  I remember seeing only 3.   Maybe I just didn’t find them, of course, but they are sort of important.
  • Beer is the ultimate conversation maker.  Have it nearby from lunch on and more magic would happen.  (It doesn’t have to be free)
  • Merchandise.  Stickers, t-shirts, badges, etc to help remember and promote the event.
  • More stands.  I wanted to see people who do hosting, consulting, training, build tools, and more of the related.

As I said, I had an excellent time, learned a few new things, got inspired, met interesting people, etc.  The event was definitely a success and I’d gladly attend the future ones as well.  Oh, and I took a few pictures, which are available in my PHP UK Conference 2009 Flickr set.

Software engineering is like cooking

During the last few months I’ve been explaining software engineering to management types quite a bit.  Most of the “bosses” that I talked to weren’t technical at all, so I was trying to stay away from famous concepts, examples, and terminology as much as I could.  Of course, that required some sort of substitute for concepts, examples, and terminology.  I’ve tried analogies from different unrelated areas, and was surprised at how well cooking fit the purpose.

Before I go any further, I have to say that I am not a cook and that I don’t know much about cooking.  But.  I know just about the same as any other average human being.  Which, sort of, puts me in the same category as my targets, or “bosses”, as I called them before.

Here are a few examples that worked well.
