I've written previously about the nightmares of using Google Analytics to gather insights into how visitors interact with your websites. Anyone who has run a website recently and paid attention to referrals will have seen bogus data filling the stats screen. This used to be an issue that affected all statistics-gathering platforms equally, because crawlers (like the one operated by Semalt) loaded websites including the JavaScript that handles analytics tracking. Most web crawlers, like the one operated by Google's search engine, don't load and execute JavaScript, as they don't need to index the interactive functions it provides. The spam crawlers, by contrast, intentionally load the tracking scripts and feed them forged data so that their referral information shows up in your analytics, in the hope that you will share your referral data publicly and give them even more links back.
There are many methods of blocking or hiding these bogus hits in your analytics data, although recently the spammers seem to have found a way to circumvent the protections I have in place. After some investigation I found that many of the bogus referral hits in my Google Analytics account were NEVER actually making a request to my webserver. It appears that they were making calls directly to the Google Analytics tracking script, completely bypassing my webserver and my referral spam protection measures. This practice is commonly referred to as "Ghost Referral Traffic". There are many discussions out there about removing these bogus entries from your analytics data. You will likely find that most of the invalid traffic in your analytics account now has either no hostname listed or the hostname of a site you don't host. As you can see in the example to the left (which is a listing from Google Analytics of referral traffic), all the bogus referral entries show the hostname as "not set", and the only legitimate entry is the 4 sessions coming from blocksemalt.org to bentownsend.com (which would be clicks to a previous article I wrote about dealing with referral spam). In other analytics accounts I've seen bogus hostnames listed in addition to "not set". I've had some limited success setting up filters in Google Analytics to remove these "ghost referrals". The only problem is that nearly every time I sign into my analytics account a new bogus entry or two has shown up and I need to modify the filters. It seems I spend more time filtering bogus data out of my analytics than I do actually looking at the data and building goals.
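To make the hostname idea concrete, here is a minimal Python sketch of how exported referral rows could be sorted into real and ghost traffic. It is only an illustration of the logic, not how Google Analytics filters work internally. The names `VALID_HOSTNAMES`, `is_ghost` and `split_hits`, the row layout, and the two spam referrers in the sample are all made up for this example; only the blocksemalt.org row mirrors the listing described above.

```python
import re

# Assumption: the only hostnames this site is legitimately served from.
# Adjust the pattern to match your own domains.
VALID_HOSTNAMES = re.compile(r"(www\.)?bentownsend\.com", re.IGNORECASE)


def is_ghost(hit):
    """Return True when the hit's hostname is missing or not one of ours."""
    hostname = (hit.get("hostname") or "").strip()
    return VALID_HOSTNAMES.fullmatch(hostname) is None


def split_hits(hits):
    """Separate referral rows into (real, ghosts) based on hostname."""
    real, ghosts = [], []
    for hit in hits:
        (ghosts if is_ghost(hit) else real).append(hit)
    return real, ghosts


# Hypothetical rows; the two spam referrers and their counts are invented.
sample = [
    {"referrer": "blocksemalt.org", "hostname": "bentownsend.com", "sessions": 4},
    {"referrer": "spam-one.invalid", "hostname": "(not set)", "sessions": 12},
    {"referrer": "spam-two.invalid", "hostname": "some-other-site.invalid", "sessions": 7},
]

real, ghosts = split_hits(sample)
print("real:", [h["referrer"] for h in real])
print("ghosts:", [h["referrer"] for h in ghosts])
```

The point of an allowlist like this is that it keys on something the spammers generally don't bother to fake correctly (the hostname), rather than on the referrer names, which is the list I keep having to update.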
With all these headaches from bogus data, on top of some privacy concerns regarding Google Analytics, I began wondering if there was an open source alternative that I could host myself to reduce the hassle. I did some research and found two good possibilities that cover the major features I'm interested in, are PHP/MySQL based, and are open source.
The first I looked at is Open Web Analytics (OWA), which appears to be licensed under GPL v2.0, although that wasn't very well advertised. Development of OWA is handled on GitHub, and the issue queue is fairly active, with responses from multiple users including Peter Adams, the original author. It has all the features I was looking for: entry and exit pages, referral tracking, goals, multi-site support, page view tracking, and view length, to name a few. There were also a few happy surprises, like click-streams, click tracking and heat maps, all of which worked, but with limited success.
The second package I looked at is called Piwik. At first glance it seems pretty much on par with the features listed for OWA. A few that stood out to me were site/page speed, user interaction/content tracking, email reports, and a customizable dashboard. Another big plus for me is the native Android application. Like OWA, Piwik is released under the GPL, in its case the GNU General Public License v3. A few things I liked while looking over the documentation were recent and active development (as I write this the most recent activity was an hour ago), very good privacy considerations, and a healthy dose of third-party plugins.
Both of these seem to cover all the basics that I'd expect from an analytics package, and both seem to be good rivals to what I'm currently getting from Google Analytics. I've decided to install both packages on one of my servers and run a side by side (by side) comparison of all three: Google Analytics, Open Web Analytics, and Piwik. I'll write a new article going into depth on the installation and setup process for each of these packages, and a third covering the test results for one of my client sites.