Beware of Ghosts in Your Analytics! How to Manage Google Analytics Spam
I was recently working with a client on an award submission for a website developed by our team. One of the submission requirements was to demonstrate the results and impact of the project.
To showcase the lift the site was receiving from improvements to the content and user experience, we started diving into the Google Analytics data. And good news, the top-level data was looking great! Traffic numbers were solid, all of the primary content was receiving quality views, and there was a nice spread of visitors from different referral sources. Then we dug a bit deeper.
As we examined the referral source data, we noticed several referring sites that we did not recognize. Why were visitors arriving from sites like traffic2cash.xyz and free-social-buttons.xyz? (See more examples in the image on the right.)
Well, there were no people coming from those sites. The visits never actually happened; they were ghost visits, also known as ghost spam. And those ghosts were haunting our traffic numbers.
Seriously!? How did these ghosts get into the machine?
This bad (ghost) data exists because spammers put it there, and they have several reasons for doing so and several ways of doing it.
Why would someone want to spam the analytics system?
Advertising is the primary reason. Google Analytics and similar systems are used by millions of sites around the world and are accessed by millions of users. That’s a large audience of curious analytics viewers who may click through to investigate the odd sites that show up in their web traffic reports. Because the links often point to what look like SEO services, analytics users are an ideal target market.
Some hackers are also simply malicious. They hack for the sake of hacking. To them, disrupting a high-profile system from an international tech giant like Google is a challenge and an accomplishment.
How is the spam getting into the system?
Two common methods for getting the data into an analytics system are sending fake traffic to a site and sending data directly into Google Analytics.
- Sending traffic to a site (crawler spam): Hackers create programs that browse, or crawl, websites. Browsing a site leaves impressions of visitor traffic in an analytics system. Each crawl or visit also includes information indicating the domain the traffic is supposedly coming from: the referral source. Google Analytics reads and records that referral source information.
- Sending data directly to Google (analytics spam, or ghost spam): Google’s Measurement Protocol allows developers to send raw user interaction data directly to the Google Analytics servers. It’s a great tool for website operators who want very granular control over how they measure and view online behavior and interactions on a website. Unfortunately, spammers have figured out that by grabbing or guessing the Tracking IDs that identify each website to Google, they can use this protocol to inject garbage data directly into analytics without touching, or even knowing, the website those IDs belong to.
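To make the ghost spam mechanism concrete, here is a minimal Python sketch of how a hit for the Universal Analytics Measurement Protocol is put together. It only builds the request URL and never sends it. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `dr`, `dh`) and the `/collect` endpoint come from that protocol; the `build_hit_url` helper and the all-zero tracking ID are hypothetical placeholders. The point to notice is that only `tid` ties the hit to a property, while the sender chooses everything else, including the referrer and the hostname.

```python
import uuid
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Collection endpoint for the Universal Analytics Measurement Protocol.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_hit_url(tracking_id: str, referrer: str, hostname: Optional[str] = None) -> str:
    """Assemble (but never send) a Measurement Protocol pageview hit.

    Hypothetical helper for illustration: nothing here visits the website,
    and every field except `tid` is whatever the sender chooses.
    """
    params = {
        "v": "1",                  # protocol version
        "tid": tracking_id,        # tracking ID: the only link to the property
        "cid": str(uuid.uuid4()),  # random client ID; no real visitor behind it
        "t": "pageview",           # hit type
        "dp": "/",                 # document path
        "dr": referrer,            # document referrer, reported as the referral source
    }
    if hostname is not None:
        params["dh"] = hostname    # document hostname: optional and sender-controlled
    return f"{COLLECT_URL}?{urlencode(params)}"


# Placeholder ID plus a spam-style referrer like the ones seen in our reports.
url = build_hit_url("UA-000000-1", "http://free-social-buttons.xyz/")
fields = parse_qs(urlsplit(url).query)
print(fields["tid"], fields["dr"], "dh" in fields)
```

Because the sender controls the hostname field, a ghost hit typically arrives with no hostname at all (reported as “(not set)”) or with a hostname that is not your domain. That weakness is what the Hostname filter discussed later in this post relies on.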
The IDs can be guessed by following the ID format “UA-000000-1” (an account number followed by a property number) and programmatically trying possible digit combinations until one works.
IDs can also be grabbed by scanning a website’s source code for that same UA code.
Either way, once a spammer has a valid Google Analytics ID, they can use it indefinitely.
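The source-scanning approach takes only a few lines, which is also why site owners can run the same check on their own pages to see which IDs they expose. The sketch below is an assumption-laden illustration: the regular expression's digit-length bounds and the `find_tracking_ids` helper name are hypothetical, and the HTML is a made-up sample of a classic `analytics.js` snippet. Since the standard tracking snippet has to contain the ID for the browser to report visits, the ID cannot simply be hidden from a scanner like this.

```python
import re

# Universal Analytics IDs: "UA-", an account number, "-", a property number.
# The digit-length bounds are an assumption chosen to be permissive.
UA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_tracking_ids(html: str) -> list:
    """Return the unique UA tracking IDs in a page's source, in order of appearance."""
    seen = []
    for match in UA_ID_PATTERN.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


# Made-up page source containing a typical tracking snippet.
sample_html = """
<script>
  ga('create', 'UA-000000-1', 'auto');
  ga('send', 'pageview');
</script>
"""

print(find_tracking_ids(sample_html))  # ['UA-000000-1']
```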
Can the ghosts be busted?
The bad news is that not much can be done to stop the spam from happening. This is an ongoing battle that Google fights, and I expect Google to keep improving or changing how it addresses the issue.
The good news is that the bad data can be quickly and easily stripped from reports. Because ghost spam never touches your website, its hits usually carry a missing or fake hostname instead of your site’s domain. A straightforward way to exclude the spam is therefore to create a filter that shows traffic only for the ‘Hostname’ of the site being analyzed. Hostname can also be used as a secondary dimension when viewing reports in Google Analytics as a quick way to identify any spam that already exists.
Here are two screenshots from Lyquix’s Analytics account that show how to use Hostname to view existing spam and filter it out:
View spam in Google Analytics by using Hostname as a secondary dimension in reports
Create a filter using hostname to remove spam from Google Analytics Reports
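In Google Analytics both steps are configured in the interface rather than in code, but the logic behind them can be sketched in a few lines of Python. This is a model under stated assumptions, not Google's implementation: the site is assumed to live at the hypothetical domain `example.com` (with or without `www.`), the session list is made-up illustrative data rather than real report figures, and the helper names are hypothetical. `hostname_breakdown` mirrors viewing referral sources with Hostname as a secondary dimension, and `include_valid_hostnames` mirrors an include filter whose pattern lists only your real hostnames.

```python
import re
from collections import Counter
from typing import NamedTuple


class Session(NamedTuple):
    source: str    # referral source as shown in reports
    hostname: str  # hostname recorded with the hit


# Hypothetical include pattern for a site served at example.com and www.example.com.
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$")


def hostname_breakdown(sessions):
    """Like adding Hostname as a secondary dimension: count sessions per (source, hostname)."""
    return Counter((s.source, s.hostname) for s in sessions)


def include_valid_hostnames(sessions):
    """Like an include filter on Hostname: keep only sessions recorded for the real site."""
    return [s for s in sessions if VALID_HOSTNAME.match(s.hostname)]


# Illustrative data only.
sessions = [
    Session("google", "www.example.com"),
    Session("partner-site.org", "example.com"),
    Session("traffic2cash.xyz", "(not set)"),
    Session("free-social-buttons.xyz", "google.com"),
]

for (source, hostname), count in hostname_breakdown(sessions).items():
    print(f"{source:<25} {hostname:<18} {count}")

print([s.source for s in include_valid_hostnames(sessions)])  # ['google', 'partner-site.org']
```

A whitelist of valid hostnames is used here instead of a blacklist of spam domains because new spam domains appear constantly, while your own hostnames rarely change. Make sure the pattern covers every hostname that legitimately runs your tracking code. Also note that crawler spam actually visits your site and is recorded with your real hostname, so this filter mainly catches ghost spam.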
Hopefully this ghost story prepares you to calmly and quickly bust any ghosts that you find in your next analytics report. Happy Hunting!
Additional Reading/Resources related to Analytics Ghost Spam
- The What, Why and How of Google Analytics - ohow.com
- Myths about the Spam in Google Analytics - ohow.com
- Ultimate Guide to Removing Google Analytics Spam and Other Irrelevant Traffic - ohow.com
- Geek guide to removing referrer spam in Google Analytics - optimizesmart.com
- Stop Ghost Spam in Google Analytics with One Filter - moz.com
- Quick Fix for Referral Spam in Google Analytics - distilled.net
- Google Analytics on More Than 10 Million Websites - marketingland.com
- The Google Analytics Measurement Protocol