I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but I cannot see them in the custom table where I log all visitors to the site, so I think they only manipulate the GA code and never reach the site itself.
If you follow their link, they redirect you to an affiliate link.
I don't know whether they have an impact on my SEO/SERP, but I would like to get rid of them. Can I do that via the .htaccess file?
One peculiar aspect is that I get visitors from different forum-like pages, e.g. forum.topic221122.darodar.com, forum.topic125512.darodar.com, etc., so I would like to block the entire darodar.com domain.
Besides darodar.com, econom.co and iloveitaly.co are also polluting my stats. Can I block them all from .htaccess?
Most of the spam in Google Analytics never accesses your site, so you can't block it with any server-side solution.
Ghost spam hits GA directly and usually shows up for only a few days before disappearing. That's why some people think they blocked it from the .htaccess file, but it is just a coincidence.
This type of spam is easy to spot, since it uses either a fake hostname or one that is not set. (See image below)
The other type, crawlers like semalt, actually access your site and can be blocked from the .htaccess file; however, there are just a few of them.
So in summary, to stop spam in Google Analytics:
Crawlers: server-side solutions or filters in GA
Ghosts: ONLY filters in GA
The only efficient way to prevent ghost spam is to create an include filter with all your valid hostnames.
First you need to build a REGEX with all the valid hostnames, something like this (you can find them in the Network report):
yoursite\.com|shoppingcart\.com|translateservice\.net
These are some examples; you might have more or fewer hostnames. Once you have the REGEX, create the filter as follows:
Go to the admin tab in Google Analytics
Select Filters under the View column > New Filter
Filter Type: Custom > Include > Filter Field: Hostname
Filter Pattern: paste the hostname expression you built
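The hostname expression from the steps above can be sketched and sanity-checked before pasting it into GA. This is a minimal sketch: `escapeRegex`, `buildHostnamePattern` and `isValidHostname` are hypothetical helper names, the hostnames are the placeholders from the example, and the check assumes the filter matches the pattern case-insensitively anywhere in the hostname field.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal hostname.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the escaped hostnames with "|" to form the include-filter expression.
function buildHostnamePattern(hostnames) {
  return hostnames.map(escapeRegex).join("|");
}

// Assumption: partial, case-insensitive matching, so no anchors are added.
function isValidHostname(hostname, pattern) {
  return new RegExp(pattern, "i").test(hostname);
}

const validHosts = ["yoursite.com", "shoppingcart.com", "translateservice.net"];
const pattern = buildHostnamePattern(validHosts);

console.log(pattern); // yoursite\.com|shoppingcart\.com|translateservice\.net
console.log(isValidHostname("www.yoursite.com", pattern)); // true
console.log(isValidHostname("(not set)", pattern)); // false
```

Running every hostname from your own report through `isValidHostname` is a cheap way to confirm that no legitimate hostname is missing from the expression before the filter starts discarding data.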
For crawlers you will have to create a different filter, building an expression with all the spammers:
spammer1|spammer2|spammer3|spammer4|spammer5
Filter Type: Custom > Exclude > Filter Field: Campaign Source
Filter Pattern: paste the referral expression
Every time you work with filters, it is important that you keep an unfiltered view.
If you need detailed steps for these solutions, you can check this complete guide about spam in Google Analytics.
Guide to stop and remove All the spam in Google Analytics
Hope it helps.
Hostname report Example
This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.
Yes, you can block them with .htaccess, and you actually should.
Your .htaccess file could look like this:
<IfModule mod_setenvif.c>
# Set spammers referral as spambot
SetEnvIfNoCase Referer darodar\.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline\.com spambot=yes
## add as many as you find
Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>
When traffic comes from these sites, they are blocked with this .htaccess, so the HTML is never loaded and therefore GA script is not fired up (from these sites).
They are trying to get traffic from you: once you see the incoming traffic in Google Analytics, you visit that URL to find out what the source is. It is harmless to your site, except that your statistics fill up with junk data.
Google Analytics should prevent this, the same way Gmail prevents spam email.
According to this entry, they never visit your site; they send fake HTTP requests to GA using your UA code. So it seems pointless to block them using .htaccess or any other server-side method, because they never actually reach your site; they only send fake "visit" data to Google.
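To make the mechanism concrete, here is a minimal sketch of what the payload of a Universal Analytics pageview hit looks like when built by hand. The parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`, `dr`) are Measurement Protocol fields; `buildPageviewPayload` and all values are hypothetical placeholders, and the sketch only builds the string without sending anything.

```javascript
// Build (but do not send) the query string of a Universal Analytics pageview hit.
// Every value, including hostname and referrer, is whatever the sender supplies.
function buildPageviewPayload({ trackingId, clientId, hostname, path, referrer }) {
  const params = new URLSearchParams({
    v: "1",          // protocol version
    tid: trackingId, // the property's UA code
    cid: clientId,   // anonymous client ID
    t: "pageview",   // hit type
    dh: hostname,    // document hostname
    dp: path,        // document path
    dr: referrer,    // document referrer
  });
  return params.toString();
}

const payload = buildPageviewPayload({
  trackingId: "UA-0000000-0", // placeholder
  clientId: "555",
  hostname: "example.com",
  path: "/",
  referrer: "http://referrer.example/",
});
console.log(payload);
```

Nothing in such a hit passes through your web server, which is why .htaccess rules cannot see it, and why a hostname the sender did not bother to set correctly is a usable signal for filtering.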
We have found that using .htaccess is a good way to stop this spam. I have implemented the solution below on my client's site, and it has been working really well so far.
The best way is to stop them with a "contains" clause, e.g. for the spammer priceg.com, check for priceg in the referrer URL.
Many of these sites create subdomains and hit you again, and when they tweak the URL, hard-coded conditions fail.
RewriteEngine On
RewriteCond %{HTTP_REFERER} (priceg) [NC,OR]
RewriteCond %{HTTP_REFERER} (darodar) [NC]
RewriteRule .* - [F]
It is explained in detail here
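The difference between a "contains" check and a hard-coded host check can be sketched as follows. This is only an illustration of the matching logic, not of Apache itself; `isSpamReferer` and `isExactHost` are hypothetical names, and the keywords are the two from the rules above.

```javascript
// Keywords taken from the rules above; extend as needed.
const spamKeywords = ["priceg", "darodar"];

// Mirrors a case-insensitive "contains" condition: any keyword anywhere in the Referer.
function isSpamReferer(referer, keywords) {
  const value = (referer || "").toLowerCase();
  return keywords.some((keyword) => value.includes(keyword.toLowerCase()));
}

// A hard-coded exact-host check, for comparison.
function isExactHost(referer, host) {
  try {
    return new URL(referer).hostname.toLowerCase() === host;
  } catch {
    return false; // missing or malformed Referer
  }
}

const referer = "http://forum.topic221122.darodar.com/";
console.log(isSpamReferer(referer, spamKeywords)); // true
console.log(isExactHost(referer, "darodar.com")); // false
```

The trade-off is that a contains check can also match legitimate referrers whose URL happens to include the keyword, so short or common keywords are best avoided.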
Apparently, the spammer does this by communicating directly with Google Analytics using your website's account ID. They effectively tell Google Analytics that they visited your page when in fact they never did. They identify themselves to Analytics by means of a URL which THEY WANT YOU TO VISIT. So you see their traffic in Google Analytics and go check them out. They will have an Amazon affiliate account hooked up, for example, and attempt to earn a commission from your Amazon purchases.
So .htaccess did nothing for me when I was fighting this one; you need to create a filter which filters out things like (.*)\.darodar\.com
The really bad effect I have found from this is that it invalidates my website statistics.
You can restrict access using .htaccess or by filtering ALL robot visits from being tracked by Google Analytics. If that doesn't work, set up Google Analytics filtering. More details on how to do that can be found here: http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/
They are Russia-based but route their spiders through China and the Philippines. Since they have multiple subdomains, it may be best to block their whole IP address at this point.
Blocking bots at your web server level makes no sense: spammers send fake requests to the Google Analytics web server. All they need to know is the website's domain name and the Google Analytics ID linked to it.
So you have to mask your Google Analytics ID in the website code. For example, you can do this in the Google Analytics JS code:
ga('create', 'UA-X' + 'XXXXX' + 'XX-X', 'auto');
After this change, a spammer's bot would have to execute the JS code to extract your Google Analytics ID (and not many bots are able to do that).
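The effect of the split string can be checked with a small sketch. The pattern below is an assumption about how a naive scraper might search page source for IDs, and the ID is a placeholder.

```javascript
// What a scraper sees: the page source as plain text (placeholder ID).
const pageSource = "ga('create', 'UA-0' + '00000' + '00-0', 'auto');";

// Assumed naive scraper pattern for Universal Analytics IDs.
const idPattern = /UA-\d{4,10}-\d{1,4}/;

// The ID never appears contiguously in the source text, so the pattern misses it.
console.log(idPattern.test(pageSource)); // false

// What the browser evaluates at runtime is the complete ID.
const runtimeId = 'UA-0' + '00000' + '00-0';
console.log(runtimeId); // UA-00000000-0
console.log(idPattern.test(runtimeId)); // true
```

This only defeats bots that scan static source text; anything that actually runs the page's JavaScript sees the assembled ID.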
https://nobodyonsecurity.com/security/fighting-google-analytics-referrer-spam
.htaccess is not the best way. On my site I use GA itself: under Tracking Info, I add the spam domains to the Referral Exclusion List.
Regards!
Lunametrics posted a nice article to solve this issue using Google Tag Manager:
http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/
I think the most effective way to avoid ghost spam is to add a custom dimension that lets you know the site was actually visited, because, as we know, ghost spammers never visit the site.
ga('set', 'dimension1', "Hey I'm really here!!");
ga('send', 'pageview');
Simply add these lines to your pages and then add an "include" filter that only keeps hits where the dimension has the expected value ("Hey I'm really here!!" in this case).
I used these mod_rewrite methods for semalt:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule .* - [F]
or, in .htaccess, with the Apache module mod_setenvif
SetEnvIfNoCase Referer semalt.com spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.11\.15" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.7\.144" spambot=yes
Order allow,deny
Allow from all
Deny from env=spambot
I even created an Apache, Nginx & Varnish blacklist plus Google Analytics segment to prevent referrer spam traffic, you can find it here:
https://github.com/Stevie-Ray/referrer-spam-blocker/
Filter future and historical ga spam of all types with the link provided. Hostname filtering is particularly easy.
https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
2019 update
I may have a solution to this problem as I find none of the other solutions to be effective.
Let me address the problems of the existing solutions first
Add a filter for each referrer spam domain.
How many domains will you add?
Most of these referrer spam domains exist for some time and then disappear
Maintain a blacklist of referrer spam domains.
This gets even more complicated as they are basically endless in numbers.
You would have to keep updating the blacklist.
Also, the bigger the blacklist, the more time you need to scan it
Anything else, such as maintaining .htaccess rules by hand, requires manual intervention, which will not scale as your site becomes more popular
Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit-and-miss
How do these bots work?
First, it is crucial to understand how these bots work
At the very least, they use regex patterns such as /UA-\d{6}/ to harvest tracking IDs from pages, which they crawl recursively starting from a seed website
I believe I have a solution that offers the following advantages
No need to maintain whitelists and blacklist
Will work against 99% of them easily and can always be modified to take it to 100%
Requires almost NO manual intervention
The idea is to NOT have a tracking ID at all in the script
Here is an example
script.
//- Google Analytics ID
var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
var newScript = document.createElement("script");
newScript.type = "text/javascript";
newScript.setAttribute("async", "true");
newScript.setAttribute("src", "https://www.googletagmanager.com/gtag/js?id=" + a.map(i => String.fromCharCode(i)).join(""));
document.documentElement.firstChild.appendChild(newScript);
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', a.map(i => String.fromCharCode(i)).join(""), { 'send_page_view': false });
// Feature detects Navigation Timing API support.
if (window.performance) {
// Gets the number of milliseconds since page load
// (and rounds the result since the value must be an integer).
var timeSincePageLoad = Math.round(performance.now());
console.log(timeSincePageLoad)
// Sends the timing event to Google Analytics.
gtag('event', 'timing_complete', {
'name': 'load',
'value': timeSincePageLoad,
'event_category': '#{title}'
});
}
We take a very simple approach: break a tracking ID of the form 'UA-1111111-1' into a char code array
We then construct the tracking ID dynamically from the char code array wherever we need a reference to it
The approach can be made arbitrarily more complex: storing the numbers in base 8 or hexadecimal, adding a fixed offset, adding a random offset on each run, or RSA-encrypting the tracking ID with a private key on the server and decrypting it with a public key. But the basic approach is REALLY fast, since arrays in JS are fast, and it can easily beat 99% of the bots
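As a minimal sketch of the char code idea with the fixed-offset variation mentioned above: `encodeId` and `decodeId` are hypothetical helper names, the tracking ID is a placeholder, and the offset of 7 is arbitrary. The encoding would be done once, offline, and only the resulting array plus the decoder would ship in the page.

```javascript
// Turn a tracking ID into an array of char codes, shifted by a fixed offset.
function encodeId(trackingId, offset = 0) {
  return Array.from(trackingId, (ch) => ch.charCodeAt(0) + offset);
}

// Rebuild the tracking ID from the shifted char codes.
function decodeId(codes, offset = 0) {
  return codes.map((code) => String.fromCharCode(code - offset)).join("");
}

// Placeholder ID and an arbitrary offset, for illustration only.
const codes = encodeId("UA-0000000-0", 7);
console.log(codes);              // an array of numbers, no "UA-" text in sight
console.log(decodeId(codes, 7)); // UA-0000000-0
```

This is obfuscation rather than protection: a bot that runs the page's scripts, or one written specifically against this scheme, can still recover the ID.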
Related
and its URL is 'secured' with SSL (with httpS://mywebsite.nl).
However, I found out that, for a long time, I have been using http://mywebsite.nl ('non-secured') as the 'Default URL' of my property and view in Google Analytics.
I have two questions:
Did I miss data because I used http instead of https in the property and view's Default URL?
Can I CHANGE the http to httpS (in Google Analytics property/view) without problem, or do I lose historical data because of that? (This probably also depends on answer of Q1...) Or should I ADD a new property and/or view with https Default URL?
Thanks!
You didn't.
You don't lose the historical data; feel free to change it.
That "default URL" is for your convenience; you can do anything with it. It's just what GA uses, instead of the hostname dimension, to form full URLs from page paths.
Also, GA is gracious enough to warn you whenever you are about to make significant changes to your core data.
I have a "short URL" MVC website that takes in an identifier and redirects the user to the end resource based on that identifier.
Prior to the redirect I create a web request to Google Analytics (GA) to track which identifiers are commonly used. I would also like to track some more information using campaign/source/medium options in GA but I'm having a tough time getting these to show in the reports - the link I'm using is below (utmac switched out for obvious reasons):
https://www.google-analytics.com/__utm.gif?utmdt=ShortUrl+Redirect&utmp=%2f3f20118&utmac=UA-9999999-99&utmcc=__utma%3d.1675621744.1591667140.64981.1591667140.64981.1591667140.64981.1%3b%2b&utm_source=TestSource&utm_medium=TestMedium&utm_campaign=TestCampaign
I see the hit but it shows with a medium of "(none)" and source of "(direct)" when I'm expecting to see "TestSource" / "TestMedium"... Is it that I'm constructing the URL wrong or a miss in GA setup?
I've also tried putting the utm_source/campaign/medium as part of the utmp address as query string values but no luck: https://www.google-analytics.com/__utm.gif?utmdt=ShortUrl+Redirect&utmr=0&utmp=/3f20118?utm_source=TestSource&utm_medium=TestMedium&utm_campaign=TestName&utmac=UA-9999999-99&utmcc=__utma%3d.29260146.1591743271.10202.1591743271.10202.1591743271.10202.1%3b%2b
You have to add the utm parameters to the query string of the website URL.
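A minimal sketch of what that could look like, assuming "website URL" means the page URL being reported; `buildTrackedUrl` and `asParamValue` are hypothetical helpers and example.com is a placeholder. The second helper covers a detail visible in the second attempt in the question: a path with its own query string, passed as the value of another parameter, has to be percent-encoded, otherwise its `?` and `&` are read as part of the outer query string. Whether GA then attributes the campaign as expected is not something this sketch can confirm.

```javascript
// Append campaign parameters to a page URL.
function buildTrackedUrl(baseUrl, campaign) {
  const url = new URL(baseUrl);
  url.searchParams.set("utm_source", campaign.source);
  url.searchParams.set("utm_medium", campaign.medium);
  url.searchParams.set("utm_campaign", campaign.name);
  return url.toString();
}

// Percent-encode a path (including its query string) so it can be used
// as the value of a single parameter in another URL.
function asParamValue(pathWithQuery) {
  return encodeURIComponent(pathWithQuery);
}

const tracked = buildTrackedUrl("https://example.com/3f20118", {
  source: "TestSource",
  medium: "TestMedium",
  name: "TestCampaign",
});
console.log(tracked);
// https://example.com/3f20118?utm_source=TestSource&utm_medium=TestMedium&utm_campaign=TestCampaign

console.log(asParamValue("/3f20118?utm_source=TestSource&utm_medium=TestMedium"));
// %2F3f20118%3Futm_source%3DTestSource%26utm_medium%3DTestMedium
```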
Our website is a vertical search engine and we refer a lot of traffic offsite to partners sites.
We recently switched our website over to serve all traffic via HTTPS. We realised this might confuse some of our partners if they were looking at referrer stats and saw a drop in traffic attributed to us. Therefore at the same time, we added the content-security-policy:referrer origin header and we can see that the referrer is correctly passed along by the browser.
Generally this is working fine but we have had complaints from users of Adobe SiteCatalyst (previously Omniture) who are no longer able to attribute traffic as being referred from us. We don't have access to SiteCatalyst to test this out. How does SiteCatalyst track referral traffic and is there a way to view all traffic split by different sources/referrers?
I don't know if this accounts for everything, since I don't have full context on either your end or your users' end, but here is some info and some thoughts that might help.
By default, Adobe Analytics tracks referrer from document.referrer. This can be overridden by setting s.referrer.
In general, depending on how your site directs visitors to the other site and on browser security/privacy settings, document.referrer may or may not have a value. For example, Internet Explorer's default security/privacy settings suppress document.referrer on dynamically generated popup windows (e.g. window.open() calls).
So, and again, this is just speculation because I don't know the full context, you may need to work something out with your users, e.g. explicitly passing the referring URL as a query param to the target page, and having your users populate s.referrer with it if it exists. Something along the lines of:
if ( !document.referrer ) {
s.referrer=s.Util.getQueryParam( 'refURL' );
}
Note: s.Util.getQueryParam is a utility function for Adobe Analytics AppMeasurement library that will return the value of the specified query param, or an empty string if it doesn't exist. If your users are still using legacy H code, they should use the s.getQueryParam plugin instead. Or use whatever homebrewed method of getting a query param from the URL, since javascript doesn't have a built-in function for it.
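One homebrewed option is sketched below, assuming the browser supports the URLSearchParams API (older browsers such as Internet Explorer do not). `getQueryParam` is a hypothetical standalone helper, not part of the Adobe library, written to mimic the "empty string if missing" behaviour described above; `refURL` is the parameter name from the snippet.

```javascript
// Return the decoded value of a query parameter, or "" if it is absent.
// `search` is a query string such as window.location.search in a browser.
function getQueryParam(name, search) {
  const value = new URLSearchParams(search).get(name);
  return value === null ? "" : value;
}

const search = "?refURL=https%3A%2F%2Fexample.com%2Fresults&page=2";
console.log(getQueryParam("refURL", search));  // https://example.com/results
console.log(getQueryParam("missing", search)); // (empty string)
```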
I'm trying to set up cross domain tracking between two totally different Domains (not sub-domains). Looking through different pages of Google's documentation seem to give me different suggestions for what to put in the _setDomainName method.
I can't figure out when I'm supposed to use which of these three:
_gaq.push(['_setDomainName', 'mysite.com']);
_gaq.push(['_setDomainName', '.mysite.com']);
_gaq.push(['_setDomainName', 'none']);
Can anyone out there give me some guidance or an explanation?
Ben, the best explanation is on the Google Documentation page - http://code.google.com/apis/analytics/docs/tracking/gaTrackingSite.html#domainToNone. Get to know this page, there are a lot of ways to configure your GA setup and there is no definitive way of saying 'this is how you need to setup cross domain tracking' without knowing a lot more about your desired configuration. The scenarios on that page should certainly help.
There are 3 distinct reasons for using the different variations of _setDomainName.
'none' - you only need to use this feature when you want to track a top-level domain independently from any of its sub-domains, since this parameter will make the cookies of a domain inaccessible by its sub-domains.
'mysite.com' - Use this when tracking between a domain and a sub-directory on another domain. For example, your 'mysite.com' profile should also record hits from 'yourblog.othersite.com'.
'.mysite.com' - Use this when you want to track across a domain and its subdomains. This will treat top- and sub-domains as one entity and track them in the same profile. For example, the 'mysite.com' profile should record 'blogs.mysite.com' and 'shop.mysite.com'.
I recommend setting up some test profiles and experimenting with your configuration, that way you don't 'dirty' your real data.
Hope this helps!
The docs pages are a little behind, because some recent changes altered the best way to do it.
The default setting for _setDomainName is 'auto'. This sets the cookie to your full domain, unless you're on the www domain, in which case it sets it to mysite.com without the leading dot. These defaults can cause problems, so I avoid sticking with them; I always change the setting.
There are 2 options for setting a domain name for www.mysite.com.
_setDomainName('.mysite.com') -> This is necessary when you want to track all the subdomains as well.
_setDomainName('www.mysite.com') -> You should use this one if you don't want to track your subdomains.
In 99% of cases I go with the first option: setting it to the top domain with the leading dot.
You'll see a lot of people advocating against the leading dot, like this old but good post from roirevolution. The concern around the leading dot is that it can cause cookie resets. But that only happens if someone already has the cookie. If this is a new implementation, you don't have this problem.
_setDomainName('none') is equivalent to _setDomainName('auto') + _setAllowHash(false). But since _setAllowHash(false) was deprecated I guess _setDomainName('none') should be deprecated as well.
If it is cross-domain tracking,
_gaq.push(['_setDomainName', 'mysite.com']);
or
_gaq.push(['_setDomainName', '.mysite.com']);
does not make any difference, as cookie information is not shared across these two different domains anyway.
I've set up one-way tracking between domain 1 and domain 2. Initially, as suggested by Google Analytics, I added _gaq.push(['_setDomainName', 'none']); on both domains. This was for the new Google Analytics A/B testing, and verification for A/B testing resulted in an error. So I removed _gaq.push(['_setDomainName', 'none']); from domain 1, left it as it is on domain 2, and it worked perfectly fine.
I've documented it here.
I thought mysite.com would track across that site and its 1st-level subdomains (like mysite.com and cats.mysite.com), and .mysite.com would track across that site and its 1st AND 2nd-level subdomains (like mysite.com, cats.mysite.com and store.cats.mysite.com).
I base that on what Google and some other articles say.
none will disallow any subdomain tracking (so I assume it sets the cookie's domain to www.mysite.com).
I'm guessing the default option auto sets it via document.domain to www.mysite.com (but maybe to mysite.com if not on the www domain, based on Eduardo's answer, to allow smarter 1st-level subdomain tracking).
The auto/none behaviours are guesses on my part; I'm not sure what it would use for the domain if you came in on blah.mysite.com with the none/auto options set.
I need to ensure that my webpage is always within an iframe owned by a 3rd party. This third party refers to our landing page using src="../index.php".
Now my question is, if I make use of referrer to ensure that the page was requested by either myself or from the third party and if not force a reload of the 3rd party site, are there any big gotchas I should be aware of?
For example, are there certain common browsers that don't follow the referrer rules?
Thank you.
Also, it's REFERER because it somehow got misspelled in the spec. That was my very first REFERER gotcha.
You can't use referrer to "ensure" that the webpage is always being called from somewhere else because of referrer spoofing.
Referrers are not required. If a browser doesn't supply it then you'll get yourself into an endless redirect loop. Referrer is effectively "voluntary" just like cookies, java, and javascript.
Although. You could keep a log of IP & time last redirected. Prune the logs for anything over 5 minutes old and never redirect more than once per 5 minutes. You should catch 99.9% of users out there but avoid an infinite redirect loop for the rest. The log cannot rely on anything in the browser (that's the original problem) so no cookie and no session. A simple 2-column database table should suffice.
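The log described above can be sketched as follows. This is a minimal sketch: a `Map` stands in for the two-column database table (a real deployment would need shared, persistent storage), `shouldRedirect` is a hypothetical name, and the IP is a documentation placeholder.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // never redirect the same IP more than once per 5 minutes

// In-memory stand-in for the two-column table: IP -> time of last redirect (ms).
const lastRedirect = new Map();

// Decide whether a request from `ip` with a missing or unexpected referrer
// should be redirected, and record the decision.
function shouldRedirect(ip, now = Date.now()) {
  // Prune entries older than the window.
  for (const [loggedIp, time] of lastRedirect) {
    if (now - time > WINDOW_MS) lastRedirect.delete(loggedIp);
  }
  // Already redirected recently: serve the page instead of looping.
  if (lastRedirect.has(ip)) return false;
  lastRedirect.set(ip, now);
  return true;
}

console.log(shouldRedirect("203.0.113.5", 0));             // true
console.log(shouldRedirect("203.0.113.5", 1000));          // false
console.log(shouldRedirect("203.0.113.5", WINDOW_MS + 1)); // true
```

Keying on IP alone means users behind a shared address are throttled together, which is part of the small share of visitors this approach accepts missing.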
Because referrers can be manipulated, the only way you could do this is to authorize the request directly.
You could restrict requests to a set of IP addresses, if you want to be lax, or require that the including client/system has an authentication cookie for requests shown in the iframe.
Good Luck
Even well-known formats may change...
Google apparently has changed its referrer URL. April 14, 2009, An upcoming change to Google.com search referrals; Google Analytics unaffected:
Starting this week, you may start seeing a new referring URL format for visitors coming from Google search result pages. Up to now, the usual referrer for clicks on search results for the term "flowers", for example, would be something like this:
http://www.google.com/search?hl=en&q=flowers&btnG=Google+Search
Now you will start seeing some referrer strings that look like this:
http://www.google.com/url?
sa=t&source=web&ct=res&cd=7
&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm
&ei=0SjdSa-1N5O8M_qW8dQN&rct=j
&q=flowers
&usg=AFQjCNHJXSUh7Vw7oubPaO3tZOzz-F-u_w
&sig2=X8uCFh6IoPtnwmvGMULQfw
(See also Google is changing its referrer URLs from /search into /url. Any known issues?)
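For code that extracts the search term from the referrer, both formats quoted above carry it in the `q` parameter, so a parser that reads the query string by name rather than by position should cope with either. A minimal sketch, where `searchTermFromReferrer` is a hypothetical helper and the hostname check is a loose assumption about what counts as a Google search domain:

```javascript
// Return the search term from a Google referrer in either the /search or the /url
// format, or null if the referrer is not a recognised Google search referrer.
function searchTermFromReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch {
    return null; // missing or malformed referrer
  }
  // Loose assumption: any hostname containing a "google." label.
  if (!/(^|\.)google\./i.test(url.hostname)) return null;
  if (url.pathname !== "/search" && url.pathname !== "/url") return null;
  return url.searchParams.get("q");
}

const oldFormat = "http://www.google.com/search?hl=en&q=flowers&btnG=Google+Search";
const newFormat =
  "http://www.google.com/url?sa=t&source=web&ct=res&cd=7" +
  "&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&rct=j&q=flowers";

console.log(searchTermFromReferrer(oldFormat)); // flowers
console.log(searchTermFromReferrer(newFormat)); // flowers
```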
Be aware that Internet Explorer (all versions) specifically OMITS the HTTP REFERRER whenever a user navigates to a link as a result of JavaScript. (bug report)
e.g.
function doSomething(url){
//save some data to the session
//...
location.href = url;//IE will NOT pass the HTTP REFERRER on this link
}