Google Analytics is a popular free tool that webmasters use to find out about visitors to their site and what these visitors do. It is part of Google’s overall research infrastructure and has access to a lot of the data that Google collects. It will tell you the age and sex of your visitors, for example, as well as a lot about their interests; Google apparently extracts this information from their browsing behavior, which is not related to your site at all.
On the other hand, Google gives you only general information, withholding the exact IPs of your visitors and the exact times of the visits, to preserve your visitors’ privacy. (If it gave you the exact times of the visits, you could compare the Google data with your server log and link the private data that Google gives you to individual visitors.)
There are many competing tools that analyze website traffic, although they do not have access to the in-depth information Google has. I tried one such tool, Piwik, and compared the results. Unlike Google Analytics, Piwik collects the visitors’ IPs and the exact visit times, although it can drop the last part of the IP address if so configured. (On my site I let Piwik collect the visitors’ IPs but do not transfer them to anyone.)
For comparison, I chose the period April 1-15, 2015.
Over this time Google reported 175 sessions by 93 users with a total of 226 page views. The visitors came from 27 countries. Most of the traffic came in the first three days of the period. You can see the daily graph here.
For the same period Piwik reported only 30 visits with 35 page views, of which 31 were unique page views. You can see the daily graph here.
The difference is more than sixfold in page views (226 vs. 35) and about threefold in visitors (93 users vs. 30 visits).
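The ratios can be checked with a few lines of Python. This is only a sketch of the arithmetic: the numbers are the ones quoted above, while the variable names are illustrative, and comparing Google "users" with Piwik "visits" assumes the two metrics are roughly comparable.

```python
# Figures quoted above for April 1-15, 2015.
google = {"sessions": 175, "users": 93, "pageviews": 226}
piwik = {"visits": 30, "pageviews": 35}


def ratio(a, b):
    """How many times larger a is than b, rounded to one decimal."""
    return round(a / b, 1)


pageview_ratio = ratio(google["pageviews"], piwik["pageviews"])
user_ratio = ratio(google["users"], piwik["visits"])
session_ratio = ratio(google["sessions"], piwik["visits"])

print("page views:", pageview_ratio)          # 6.5
print("users vs. visits:", user_ratio)        # 3.1
print("sessions vs. visits:", session_ratio)  # 5.8
```

If Google sessions rather than users are compared with Piwik visits, the gap in visits is close to sixfold as well.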
The daily graphs have a similar shape: they agree that on certain days there were no visitors to my site at all, but when there were, Google consistently reports many more. (The two do not always rise and fall together, though: Google showed the most sessions, 66, on April 2, followed by 20 sessions on April 1, while Piwik showed 3 visits on April 2, fewer than the 6 visits it reported on April 1.)
Intrigued by such a severe mismatch, I compared these data with my server logs.
I took the days for which the difference was the most pronounced, April 1-3.
For these days Google reported 97 sessions, 64 users from 22 countries, and 90 pageviews, of which 20 sessions (22 pageviews) were on April 1, 66 sessions (68 pageviews) on April 2 and 11 sessions (12 pageviews) on April 3.
For the same days Piwik reported 12 visits of which 6 visits (8 pages) were on April 1, 3 visits (3 pages) on April 2 and 3 visits (3 pages) on April 3.
I reviewed my server logs from March 31, 11:07 PM GMT-5 (April 1, 7:07 AM in my local time, GMT+3) to April 4, 2:54 AM GMT-5 (April 4, 10:54 AM in my local time). This period definitely covers April 2-3 even if there is some time zone mismatch in the Google reports (there should not be, as I assume that Google reports in my local time as recorded in my Google profile).
Over this period, I found 20 genuine page views in my server log. This excludes all visits by robots, which download only the HTML page but none of the CSS and JS files.
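The time zone conversion and the coverage claim can be verified with Python's standard datetime module. This is a minimal sketch: the fixed offsets GMT-5 and GMT+3 are taken from the text, and the three candidate reporting time zones tested at the end are an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

SERVER_TZ = timezone(timedelta(hours=-5))  # time zone of the server log
LOCAL_TZ = timezone(timedelta(hours=3))    # author's local time

log_start = datetime(2015, 3, 31, 23, 7, tzinfo=SERVER_TZ)
log_end = datetime(2015, 4, 4, 2, 54, tzinfo=SERVER_TZ)

# Convert the log boundaries to local time.
local_start = log_start.astimezone(LOCAL_TZ)
local_end = log_end.astimezone(LOCAL_TZ)
print(local_start.isoformat())  # 2015-04-01T07:07:00+03:00
print(local_end.isoformat())    # 2015-04-04T10:54:00+03:00


def covers_april_2_and_3(tz):
    """True if the log window contains all of April 2-3 as seen in tz."""
    days_start = datetime(2015, 4, 2, tzinfo=tz)
    days_end = datetime(2015, 4, 4, tzinfo=tz)
    return log_start <= days_start and days_end <= log_end


# Check the claim for several possible reporting time zones.
coverage = {
    name: covers_april_2_and_3(tz)
    for name, tz in [("GMT-5", SERVER_TZ), ("UTC", timezone.utc), ("GMT+3", LOCAL_TZ)]
}
print(coverage)  # all True
```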
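The robot filter described above could be automated roughly as follows. This is a hedged sketch, not the procedure actually used for this post: it assumes the common Apache/nginx "combined" log format, treats each IP as one visitor, and counts a page as genuine only if the same IP also fetched at least one .css or .js file. The function names and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed "combined" log format: ip - - [time] "METHOD path PROTO" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def strip_query(path):
    return path.split("?", 1)[0]


def is_page(path):
    path = strip_query(path)
    return path.endswith("/") or path.endswith(".html")


def is_asset(path):
    return strip_query(path).endswith((".css", ".js"))


def genuine_page_views(lines):
    """Return (ip, path) for pages requested by IPs that also loaded CSS/JS."""
    pages = defaultdict(list)
    asset_ips = set()
    for line in lines:
        m = LOG_LINE.match(line)
        # 304 is accepted too, since browsers may revalidate cached files.
        if not m or m["method"] != "GET" or m["status"] not in ("200", "304"):
            continue
        if is_asset(m["path"]):
            asset_ips.add(m["ip"])
        elif is_page(m["path"]):
            pages[m["ip"]].append(m["path"])
    return [(ip, p) for ip, paths in pages.items() if ip in asset_ips for p in paths]


# Hypothetical sample: one browser visit and one robot visit.
sample = [
    '203.0.113.5 - - [01/Apr/2015:07:10:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Apr/2015:07:10:01 -0500] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Apr/2015:07:15:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "SomeBot/1.0"',
]
print(genuine_page_views(sample))  # [('203.0.113.5', '/index.html')]
```

A heuristic like this can still miscount, for instance when several visitors share one IP or a browser has every asset cached, so a manual review of borderline entries remains useful.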
Conclusion for Google Analytics: you decide
Since the log period includes April 2-3, Google reported at least 80 page views for it (68 on April 2 and 12 on April 3), and probably more for the part of April 1 that it covers. This is four times the number of page views actually logged, 20.
Update April 29, 2015: hulotte on the Piwik forum has pointed out to me that the mismatch between Google Analytics and server logs may be caused by pages cached by a CDN or by internet providers. However, this appears not to be the case here because (1) I am not using a CDN, (2) with traffic of a few visits a day, it is unlikely that 50 visitors over two days got their pages from a provider cache, and (3) even if this had happened, at least some of this traffic would have been reported by Piwik too.
Comparison with Piwik
Here I was able to fully understand all the differences between the Piwik report and the server logs.
Over the log period, Piwik reported 10 pages. Of these, one was misreported (i.e., there was no actual page view that corresponds to this listing; what happened was that the user’s IP changed while the page was being viewed, and Piwik reported this page twice), and 11 page views were missed by Piwik. This makes the full number of visited pages 20.
Of the 11 missed pages, 7 were missed because users apparently closed the pages before the Piwik script could report the visit.
In fact, the Piwik script makes a separate HTTP call to my server to report any user action, and these calls are listed in my server logs. For these 7 missed pages, the server logs did not include Piwik reporting calls. Apparently, the reason was that the users closed the pages before the Piwik script could report.
Piwik can report only those page views that last long enough for its script to report them, which normally takes no more than a few seconds. Not counting pages that are closed immediately is a quite reasonable approach, as visitors apparently considered these pages irrelevant to them. If we take this point of view and exclude these 7 pages, then the correct number of viewed pages is 13 vs. 10 reported by Piwik.
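A check of this kind can be scripted by pairing each page request in the log with a tracker request from the same IP shortly afterwards. The sketch below rests on several assumptions that are not from the original analysis: a "combined"-style log, a tracker endpoint at the hypothetical path /piwik/piwik.php, pages recognized by an .html suffix, and an arbitrary 30-second window.

```python
import re
from datetime import datetime, timedelta

LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET (?P<path>\S+)'
)
TRACKER = "/piwik/piwik.php"    # hypothetical install path of the tracker
WINDOW = timedelta(seconds=30)  # assumed maximum delay of the tracking call


def parse(line):
    """Return (ip, timestamp, path without query string) or None."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], when, m["path"].split("?", 1)[0]


def untracked_pages(lines):
    """Pages with no tracker call from the same IP within WINDOW."""
    hits = [h for h in map(parse, lines) if h]
    tracker_hits = [(ip, t) for ip, t, path in hits if path == TRACKER]
    missed = []
    for ip, t, path in hits:
        if not path.endswith(".html"):
            continue
        reported = any(
            tip == ip and t <= tt <= t + WINDOW for tip, tt in tracker_hits
        )
        if not reported:
            missed.append((ip, path))
    return missed


# Hypothetical sample: the first visitor is tracked, the second is not.
sample = [
    '203.0.113.5 - - [02/Apr/2015:10:00:00 -0500] "GET /a.html HTTP/1.1" 200 100',
    '203.0.113.5 - - [02/Apr/2015:10:00:02 -0500] "GET /piwik/piwik.php?idsite=1 HTTP/1.1" 200 43',
    '198.51.100.7 - - [02/Apr/2015:10:05:00 -0500] "GET /b.html HTTP/1.1" 200 100',
]
print(untracked_pages(sample))  # [('198.51.100.7', '/b.html')]
```

Robots that never run JavaScript would also show up as untracked here, so the robot filtering has to be applied first.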
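The reconciliation between the Piwik report and the server log is plain arithmetic, written out here as a small Python sketch. The numbers come from the text; the variable names are illustrative.

```python
piwik_reported = 10  # page views listed by Piwik over the log period
misreported = 1      # duplicate caused by the visitor's IP change
missed = 11          # page views in the server log that Piwik did not list
closed_early = 7     # missed because the page closed before the script reported

# Page views confirmed by the server log.
actual_views = piwik_reported - misreported + missed

# Page views that stayed open long enough to be counted.
countable_views = actual_views - closed_early

# Missed page views that cannot be explained by an early close.
unexplained_misses = missed - closed_early

print(actual_views)        # 20
print(countable_views)     # 13
print(unexplained_misses)  # 4
```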
Thus, overall Piwik results appear to be quite reasonable.
(Note that Google Analytics takes the same approach, as it counts only those pages that stay open long enough for its script to report them to Google. For this reason the 7 pages that were closed quickly were most likely not reported to Google either, and the above mismatch between my Google Analytics report and my server logs is probably more than fourfold.)
Update June 4, 2015: I have identified the visitor actions that caused both the misreported page and the pages missed by Piwik (other than those closed by visitors before the Piwik script could report them). I have reported both of them to Piwik as bugs:
Duplicate reporting due to visitor IP change #7772
A sequence of user actions that is missed by Visitor Log #7773
You may consult these links to see what action Piwik has taken on these reports.
When considering which analytics service to install on your site, bear performance in mind.
Google Analytics is hosted on Google’s servers. On your website you just need to include a short Google script that downloads the longer script analytics.js from Google, and the latter script reports everything to Google. The main performance cost is one point off your Google PageSpeed score, because Google Analytics does not serve analytics.js with the browser cache header that PageSpeed, another branch of Google, requires. You can resolve this problem by hosting this script on your site as explained here or here.
On the other hand, if you get the free version of Piwik, you host it on your own website. You do not have the browser caching problem of Google Analytics but rather a direct performance issue: after each page view the Piwik script makes an HTTP call to a PHP file on your website to report it, and executing PHP files is expensive in terms of server performance. Piwik has a paid version, Piwik Pro, which is hosted on Piwik’s servers; if you go for it, you do not have this performance problem, but it costs you money.
I find performance to be more important than analytics, and for this reason I have removed both Google Analytics and Piwik from my website and am content with AWStats provided by my hosting provider, even though it does not give as clear a picture. (AWStats just analyses my server logs.)