Friday 28 July 2017

How to Stop Crawler and Ghost Spam in Google Analytics




In my previous post, general rules for creating a search engine friendly website, I listed things that will help make your website search engine friendly. Today we will learn how to stop crawler and ghost spam in Google Analytics.

One of the most useful Google Analytics features helps webmasters track the referral URLs from which people land on their website.

Unfortunately, spammers exploit this feature to promote their own websites, or to damage the image of unrelated companies, by inserting domain names as referrer URLs in your Analytics data.

There are two main types of spam that can utilize this functionality - Ghost spam and Crawler spam.

What is Ghost Spam and How to Filter it Out

Spammers that use this method don't actually visit your site. Instead, they use the Analytics Measurement Protocol, which allows anyone to send data directly to Google Analytics, and push fake information through it. Usually, they send it to randomly generated Analytics tracking IDs (UA-XXXXX-1).

This way they leave fake data in your account without any trace.
Since ghost spammers don't know which domain belongs to the tracking ID they are sending data to, they either use a hostname that has nothing to do with your site or don't specify a hostname at all.
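
To make this concrete, here is a minimal sketch of what a Measurement Protocol pageview hit looks like. It only builds the request body and sends nothing. The parameter names (v, tid, cid, t, dh, dp, dr) are the documented Universal Analytics ones, while the function name, the client ID and the example hostnames are made up for illustration.

```python
from urllib.parse import urlencode

# Universal Analytics collection endpoint, shown for reference only.
# This sketch never contacts it.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_payload(tracking_id, hostname, referrer):
    """Return the URL-encoded body of a Measurement Protocol pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID of the target property
        "cid": "555",        # client ID (arbitrary placeholder)
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname: whatever the sender claims
        "dp": "/",           # document path
        "dr": referrer,      # document referrer: also supplied by the sender
    }
    return urlencode(params)


# Hypothetical values: the sender picks the hostname and referrer freely.
payload = build_pageview_payload(
    "UA-XXXXX-1", "not-your-site.example", "http://spam.example/"
)
print(payload)
```

Because the dh value is chosen by the sender, a spammer who does not know your domain cannot fill it in correctly, and that is exactly what the hostname filter described below relies on.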

You can easily get a list of the valid hostnames that should be sending information to your account. Usually, that would be only yourdomain.com plus all subdomains you have like blog.yourdomain.com or even www.yourdomain.com.

To get this information, open the Network report in Google Analytics (Audience -> Technology -> Network) and select Hostname as the Primary Dimension. Then create a list of all your hostnames as a regular expression, like this:
yourdomain\.com|blog\.yourdoma
Add all hostnames that actually belong to you to that list and save it in an empty text file. We will need it in a moment.
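
If you have more than a couple of hostnames, a small script can build the pattern for you. This is only a sketch: the function name is hypothetical, and the two hostnames are the placeholder ones used in this post.

```python
def hostname_filter_pattern(hostnames):
    """Join hostnames into one regex alternation, escaping every dot."""
    cleaned = [h.strip().lower() for h in hostnames if h.strip()]
    escaped = [h.replace(".", r"\.") for h in cleaned]
    return "|".join(escaped)


# Placeholder hostnames; replace them with the ones from your own report.
pattern = hostname_filter_pattern(["yourdomain.com", "blog.yourdomain.com"])
print(pattern)  # yourdomain\.com|blog\.yourdomain\.com
```

The dots are escaped because an unescaped dot in a regular expression matches any character, not just a literal dot.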

Next, click on the Admin link at the top of the page, select your account and the respective property and view for the site in question, then click on Filters -> Add Filter. This will allow you to create a new filter.

Set a name for it, like "Ghost spam filter", then choose the Custom filter type. Now select Hostname from the Filter Field dropdown and click on the Include radio button. Finally, paste the list of valid hostnames into the Filter Pattern field. It's a good idea to click on the Verify this filter link before activating it. Once you do that, you will see the information that will be removed from your Analytics data. When you are happy with the result, click on the Save button and you're done!
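
Before saving the filter you can also sanity-check the pattern offline. The sketch below assumes that an include filter keeps a hit when the pattern matches anywhere in the hostname and that matching ignores case. Both are assumptions about how the filter behaves, and the list of observed hostnames is invented for the example.

```python
import re


def preview_include_filter(pattern, hostnames):
    """Split hostnames into (kept, dropped) under an include-style filter."""
    regex = re.compile(pattern, re.IGNORECASE)
    kept = [h for h in hostnames if regex.search(h)]
    dropped = [h for h in hostnames if not regex.search(h)]
    return kept, dropped


# Invented sample of hostnames, as they might appear in the Network report.
observed = ["yourdomain.com", "www.yourdomain.com", "spam.example", "(not set)"]
kept, dropped = preview_include_filter(r"yourdomain\.com", observed)
print("kept:", kept)
print("dropped:", dropped)
```

Under these assumptions, anything that ends up in the dropped list is what the filter would remove from your reports, so check that no legitimate hostname of yours is in it.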

How to Stop Crawler Spam

Stopping crawler spam is easier because the crawlers actually access your site. You can get their hostnames the same way you got your valid hostnames.

This time, collect the suspicious entries: those with a strange or unexpected Source and a Hostname different from yours. Then you can block them with the following .htaccess rule:
  ## STOP REFERRER SPAM
  RewriteCond %{HTTP_REFERER} sp
  RewriteCond %{HTTP_REFERER} bu
  RewriteRule .* - [F]
Note that .htaccess files are sensitive to syntax, and you have to escape the dot character with a backslash. For example, if you want to block spam-bot-site.net, you need to add spam-bot-site\.net in the rules.
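
If your list of spam domains keeps growing, you can generate the rules instead of typing them by hand. This is a rough sketch rather than the one true way to do it: the function name is hypothetical, another-spam.example is an invented domain, and the RewriteEngine On line and the [NC] / [OR] flags are my additions (standard mod_rewrite syntax for case-insensitive matching and for chaining conditions with OR).

```python
def referrer_block_rules(domains):
    """Build .htaccess lines that answer 403 Forbidden to listed referrers."""
    if not domains:
        # No conditions would mean the rule blocks every request.
        return ""
    lines = ["## STOP REFERRER SPAM", "RewriteEngine On"]
    escaped = [d.strip().lower().replace(".", r"\.") for d in domains]
    for index, domain in enumerate(escaped):
        # Every condition except the last is chained with OR.
        flags = "[NC]" if index == len(escaped) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {domain} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


# spam-bot-site.net comes from the example above; the second domain is made up.
rules = referrer_block_rules(["spam-bot-site.net", "another-spam.example"])
print(rules)
```

Paste the printed lines into your .htaccess file and test the site right away, since a single typo there can take the whole site down.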
Hope this helps!
Feel free to comment below.



