On public transport in Cyprus

Often, when I talk to my friends abroad, I hear that we have it too good here, in Cyprus.  As one example, they say that everyone has a car.  And while I don’t disagree – life in Cyprus is good indeed – I often find it hard to explain that a car here is more than just a convenience.  It’s a necessity.

I also understand why the idea is difficult to grasp for those who’ve never been to Cyprus.  Many of them can’t imagine a city with no public transport at all.  Public transport is the norm pretty much everywhere you go.  But not in Cyprus.

Finally, I now have a link to send to those friends of mine who find it difficult to believe me.  Cyprus Mail runs an article with some statistics.  These are Nicosia-based, but I don’t think Limassol or any other city on the island would be much different.

Nicosia also stood out with 84 per cent of respondents saying they never used public transport. Only a minority – four per cent – used public transport to commute in Nicosia with 91 per cent travelling by car or motorcycle. Just five per cent walked or cycled to work.

How accurate is Google Analytics?

That’s the question that I was asked recently by one of my co-workers.  It is simple and not so simple at the same time.  It really depends on what you are looking for, what accuracy is acceptable, and what it is that you are comparing Google Analytics with.

For example, if you compare the numbers from your Google Analytics reports to the summaries of the web server logs, you’ll probably find that Google Analytics reports lower numbers.  Almost like not everything is recorded.  Which is true, because Google Analytics uses JavaScript to track your visitors, so any client that doesn’t run that JavaScript is never counted.  Server logs record all hits to your web server, but the information in logs is very limited – it won’t be enough for anything but very basic tracking.

How much will the numbers differ?  Here is what the Google Analytics blog has to say:

Google Analytics uses JavaScript tags to collect data. This industry-standard method yields reliable trends and a high degree of precision, but it’s not perfect. Most of the time, if you are noticing data discrepancies greater than 10%, it’s due to an installation issue. Common problems include JavaScript errors, redirects, untagged pages and slow client-side load times.

Having used Google Analytics on a number of sites over a number of years, I’d say that that is just about right.
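
To put a number on that comparison, here is a minimal sketch of the arithmetic.  It assumes you already have two counts for the same period: page requests from your server logs (filtered down to real page views, which is an assumption on my part) and pageviews from Google Analytics.  The function names and the sample numbers are made up for illustration; only the 10% threshold comes from the quote above.

```javascript
// Relative gap between server-log page requests and Google Analytics pageviews.
// Both inputs are hypothetical counts for the same period.
function discrepancy(serverLogHits, analyticsPageviews) {
  if (serverLogHits <= 0) {
    throw new RangeError("serverLogHits must be positive");
  }
  return (serverLogHits - analyticsPageviews) / serverLogHits;
}

// True when the gap exceeds the threshold (10% by default, per the quote above).
function looksLikeInstallIssue(serverLogHits, analyticsPageviews, threshold = 0.1) {
  return Math.abs(discrepancy(serverLogHits, analyticsPageviews)) > threshold;
}

// Made-up numbers, for illustration only.
console.log(discrepancy(10000, 9200));           // 0.08
console.log(looksLikeInstallIssue(10000, 9200)); // false
console.log(looksLikeInstallIssue(10000, 8500)); // true
```

A gap under the threshold is probably just the normal cost of JavaScript-based tracking; a bigger one is a hint to go looking for untagged pages or script errors.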

Web statistics and visitor tracking: things you need to know

First of all, just to make it clear, I don’t recommend writing your own web statistics / analytics / tracking application.  Google Analytics can track and report pretty much everything you will ever need. Period. If you think it can’t do it, chances are you just don’t know how.  That’s much easier to correct than to write your own tracking / reporting application.  I promise.  If, however, Google Analytics doesn’t do something that you need, grab one of those Open Source applications and modify it to suit.  While not as easy as learning Google Analytics, that would still be much easier than doing your own thing from scratch.

However, if you still decide to roll out your own tracker, here are a few things that you need to know.

  • Use the bicycle, don’t reinvent it. Most of the tracking applications that I’ve seen use some form of JavaScript, which is appended right before the end of the page markup.  Said JavaScript collects as many statistics as you need and generates a request to an image on the remote server (your tracking application), passing the gathered statistics as parameters to the image.  On the server side, your tracking application gathers the sent parameters, merges them with whatever else you can get from the server side, and saves them in the database or in your data storage of choice.
  • Keep ad blocking applications in mind. Many ad blocking plugins for different browsers block 1×1 pixel images from remote servers.  Be a bit more creative – use a 2×1 or a 1×2 pixel image.  If it is a transparent GIF at the bottom of the page, nobody will notice it anyway.
  • Gather as much as you can from the server side. It’s simpler, and you minimize the chances of breaking things with a URL which is too long (your GET request for the image with all parameters can run pretty long, especially if you pass the current page and referring page URLs).
  • Minimize the length of your parameter names and values when you pass them to the image GET request. Again, this is to avoid extremely long URLs.  You can sacrifice readability in your JavaScript and instead document the parameters in the server side tracker application.
  • Record both the client’s IP address and a possible proxy server’s IP address. Both are available to you in the request: the connecting address is in $_SERVER['REMOTE_ADDR'], and a proxy usually passes the original client address in the X-Forwarded-For header ($_SERVER['HTTP_X_FORWARDED_FOR'] in PHP, for example).  Once you have the IP addresses, use GeoIP to look up the country, region, city, coordinates, etc.  It’s better to do so at the time you record the data.  There is a free GeoIP service as well, but it will give you much less information.  The commercial one is not that expensive.
  • Record the client’s browser information. Browsercap is very useful for that.  However, it’s better to parse the user agent string with browsercap at the report / export time, not at the request recording time.  This will guarantee that you always have the most correct information about the browser in your report.  Browsercap gets updated with new signatures pretty often.
  • If you are tracking a secure site (HTTPS), chances are you won’t have referrer information available to you: browsers leave out the referrer when a request goes from an HTTPS page to a plain HTTP URL.  Apparently, that’s a security feature.
  • If you use both JavaScript and PHP to figure out the referrer, keep in mind that JavaScript uses document.referrer, while PHP uses $_SERVER['HTTP_REFERER'].  Notice that one is spelled with a double R in the middle, while the other has a single one.  That might save you some troubleshooting time.
  • It’s better to use the same JavaScript code snippet across all your sites.  To avoid SSL-related security warnings, your JavaScript needs to figure out whether it’s running on an HTTPS web site or on a plain HTTP one. See the Google Analytics example on how to actually do that.  It doesn’t hurt to have a signed SSL certificate for the HTTPS hosting of your tracker application.
  • Don’t forget about HTML and URL escaping / encoding. Check that everything works properly for you in different browsers.  JavaScript is still hard to get right sometimes.
  • Keep the version of the tracker application in every request log entry. This will greatly simplify your migrations later.  One of the ways to keep this automated is to use tags / keyword substitutions in your version control software (here is how to do this in Subversion).
  • Make sure your tracker spits out that transparent image no matter what. Broken image icons are very visible and you don’t want those on your site just because your tracker database went down temporarily.
  • For the best cross-site tracking, start a tracker session that stays the same when a visitor goes from one of your tracked web sites to another.  If your tracked web sites use sessions, pass their IDs to the tracker, so that both the tracked and the tracker session IDs can be logged in the same request. This will help you link stats from several sites together, as well as do all sorts of drill-downs into site-specific stats straight from the bird’s-eye-view reports.
  • Don’t be evil! There is a lot that you can collect about your visitors.  Make sure that you tell them exactly what you are collecting and how you are using it.  Aggregate and anonymize your logs to prevent negative consequences.  I’m sure you know what I mean.
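
To make the client side of the list above a bit more concrete, here is a minimal sketch of such a snippet.  It is not what any particular tracker does, just one way to combine the points about short parameter names, URL encoding, HTTP / HTTPS detection, and keeping the tracker version in every request.  The host name tracker.example.com, the /t.gif path, the version string, and the v / p / r parameter names are all made up for illustration.

```javascript
// Hypothetical tracker host and version; neither comes from a real tracker.
const TRACKER_HOST = "tracker.example.com";
const TRACKER_VERSION = "1.0";

// Build the GET URL for the tracking image.
// Short parameter names keep the URL compact (document them server side):
//   v = tracker version, p = current page URL, r = referring page URL
function buildBeaconUrl(pageProtocol, pageUrl, referrerUrl) {
  // Match the scheme of the tracked page to avoid SSL-related warnings.
  const scheme = pageProtocol === "https:" ? "https" : "http";
  const params = [
    ["v", TRACKER_VERSION],
    ["p", pageUrl],
    ["r", referrerUrl || ""],
  ];
  // encodeURIComponent escapes every value so it survives inside a query string.
  const query = params
    .map(([name, value]) => name + "=" + encodeURIComponent(value))
    .join("&");
  return scheme + "://" + TRACKER_HOST + "/t.gif?" + query;
}

// In a browser, request the image; the rest is gathered on the server side.
if (typeof document !== "undefined") {
  const beacon = new Image(1, 2); // 1x2 instead of 1x1, as suggested above
  beacon.src = buildBeaconUrl(
    document.location.protocol,
    document.location.href,
    document.referrer
  );
}
```

Keeping the URL building in a separate function means it can be checked outside of a browser, which helps with the "works in different browsers" part: only the last few lines touch the page itself.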

Once again, think really hard before you decide to do one yourself.  It’s not an easy job.  And even if you grab all the data you want and save it in your database, there is still an incomparably bigger issue to solve: the reports, graphs, export, and the overall visualization and analysis of that data.  Why would you even want to go into that?

Internet users in Cyprus

Blogoscoped reports that Google has expanded their Public Data Onebox functionality and now you can see the Internet penetration rate for the population of any country.  All you need to do is search for “internet users in cyprus” (or use your favourite country).  Currently, it reports Cyprus having 38% of the population connected to the Internet.  And that sounds just about right.

Internet in Cyprus

The country with the highest Internet penetration rate that I know of is the Netherlands.  Google reports it having 86.8% of the population connected.  Which also sounds just about right.

Statistics and perceptions

While catching up with recent Cyprus Mail articles, I came across one about the involvement of foreigners in serious crimes in Cyprus.  Quote:

FOREIGNERS are involved in 40 per cent of all serious crimes, and 30 per cent of road deaths in Cyprus, the House Human Rights Committee head [said] yesterday.

Being both a foreigner and a local (after almost 14 years here), I know how concerned a lot of Cypriots are about crime being linked to foreigners.  A quote like the above would be music to their ears.  On the other hand, that quote could easily be turned inside out.  For example, like so:

CYPRIOTS are involved in 60 per cent (or a majority) of all serious crimes, and 70 per cent (an absolute majority) of road deaths in Cyprus […]

This now would be music to the ears of many foreigners who think that Cypriots are too crooked, with all the cabaret, real estate, and gambling activity going on, and who are also extremely inconsiderate and undereducated while driving on public roads.

Now, which one sounds worse?