I'm doing custom-rolled view tracking on my website, and I just realized that I totally forgot about search bots hitting the pages. How do I filter out that traffic from my view tracking?
Look at the user agents. It might seem logical to blacklist, that is, filter out all user-agent strings that contain "Googlebot" or other known search-engine bots, but there are so many of them that it may well be easier to whitelist instead: only log visitors whose user agent matches a known browser.
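For illustration, a rough sketch of that whitelist idea in JavaScript (the browser list and the bot regex are just examples, not exhaustive):

// Only count hits whose User-Agent looks like a known browser family.
var KNOWN_BROWSERS = /(Firefox|Chrome|Safari|Edge|Opera|MSIE|Trident)/;

function isProbablyHuman(userAgent) {
  if (!userAgent) return false;                                  // many bots send no UA at all
  if (/bot|crawl|spider|slurp/i.test(userAgent)) return false;   // obvious crawlers
  return KNOWN_BROWSERS.test(userAgent);
}

Call it with the request's User-Agent header before writing the view to your log.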
Another approach would be to use some JavaScript to do the actual logging (like Google Analytics does). Bots won't load the JS and so won't count toward your statistics. You can also do a lot more detailed logging this way because you can see exactly (down to the pixel - if you want) which links were clicked.
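A minimal sketch of that approach, assuming a self-hosted /track endpoint (the endpoint name is made up):

(function () {
  // Request a tiny tracking URL from the page; bots that don't run JS never hit it.
  var img = new Image();
  img.src = '/track?page=' + encodeURIComponent(location.pathname) +
            '&ref=' + encodeURIComponent(document.referrer) +
            '&t=' + Date.now();   // cache-buster
})();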
You can check the user agent: there are nice lists of known bot user-agent strings available online.
Or you could cross-check with the hits on robots.txt, since all well-behaved spiders should read that first and human users usually don't.
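A rough sketch of that cross-check, assuming a Node.js/Express server (Express and the recordPageview stub are assumptions, not part of the original setup):

const express = require('express');
const app = express();
const robotClients = new Set();   // in-memory only; use something persistent in practice

app.get('/robots.txt', (req, res) => {
  robotClients.add(req.ip);       // anything fetching robots.txt is probably a crawler
  res.type('text/plain').send('User-agent: *\nAllow: /');
});

app.use((req, res, next) => {
  if (!robotClients.has(req.ip)) {
    recordPageview(req.path);     // stand-in for your own view-tracking call
  }
  next();                          // your normal routes go after this middleware
});

function recordPageview(path) {
  console.log('view', path);      // placeholder implementation
}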
Essentially, I'm concerned that a single user can be counted twice. Is there a best practice for this? I've tried googling, but I'm not sure if I'm just not asking the right question with the right words. The platform is Sitecore.
Using the same property to track AMP and non-AMP pages will result in multiple users. See here for Google's recommendation.
Though it looks like you can use the Google AMP Client ID API to work around this.
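If you go that route, the opt-in on the non-AMP pages is a small change to the analytics.js create call (UA-XXXXX-Y is a placeholder, and the AMP pages need the matching amp-analytics config):

ga('create', 'UA-XXXXX-Y', 'auto', {useAmpClientId: true});  // share the client id with AMP pages
ga('send', 'pageview');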
So I've been working on a website for a while. The GA account has been up for a couple of months, but I waited for the website to be finished before putting up the actual JS tag.
In the meantime, the website is HTTP password restricted (basic authentication), so it isn't even accessible unless you know the user/password combination.
To my surprise, I realized today that GA has logged several hundred views to the root of my website. Paths are mostly things like:
/
/?from=http://social-widget.xyz/
/?from=http://www.traffic2cash.xyz/
Bounce% and exit% both at 100% for all of them.
I realize this looks like referral spam, and there are ways to prevent it. Came across this upon googling:
http://botcrawl.com/block-social-widget-xyz-referral-spam-in-google-analytics/
My question is: how can GA log anything anyway when no tag is up and the website isn't even accessible?
Thank you very much in advance
Because it's spam. They hit Google Analytics directly with random GA codes and don't even go through your website.
GA can't tell if these are real hits (from website visits) or fake hits (from spam bots who hit GA directly, calling the same code as they would if on the website). Though arguably they should do more about this.
Massively annoying - particularly when first starting out as this can be a heavy proportion of your "traffic".
It's easy to set up a filter rule to catch a lot of this by filtering on hostname. As they are randomly hitting GA and don't even know which website they are hitting GA for, they don't usually set this correctly. Real traffic should only come from yourwebsitedomain.com, so add an include filter for that hostname.
STRONG piece of advice: abandon the default UA-########-1 tracking code of your new website -- simply do not use it!
Create a second and third property on the Admin screen, then use the tracking code for the third property. You will immediately see a lot less spam. No filters or segments necessary!
If you want the whole sad story about spam visits in GA, I have been maintaining the Definitive Guide article for over a year now:
http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/
What people are doing is basically taking the UA-XXXXXX code that you normally get with Analytics and generating calls against it. This is skewing my analytics stats. On top of that, in Google Webmaster Tools, it's also causing this:
It looks like these pages, with my code (or at least with the generated code) on them, are making Google Webmaster Tools think I have lots of 404s. This can't possibly be good for my rankings.
Anyone know if there is anything you can do to stop this?
Try making an async call from your server side using cURL. That way you will never expose your GA code.
I have not implemented it myself, but in theory it should work.
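A sketch of what that could look like with Node.js and the Universal Analytics Measurement Protocol (the tracking id and the way you derive a client id are placeholders):

const https = require('https');
const querystring = require('querystring');

function trackPageview(clientId, pagePath) {
  const payload = querystring.stringify({
    v: 1,                  // Measurement Protocol version
    tid: 'UA-XXXXXX-Y',    // your tracking id, kept server-side only
    cid: clientId,         // anonymous client id, e.g. from a first-party cookie
    t: 'pageview',
    dp: pagePath
  });
  const req = https.request({
    hostname: 'www.google-analytics.com',
    path: '/collect',
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
  });
  req.on('error', function () {});   // fire-and-forget: tracking must never break the response
  req.end(payload);
}

The obvious trade-off is that you lose the data GA normally collects client-side (screen size, client timings, and so on).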
Since you can filter by custom dimensions, you can set a "token" in a custom dimension on every page and filter out any traffic in your view settings that does not include the token.
Obviously this will not help against people who use the code from your website (unless you also implement shahmanthan9s suggestion - which is a lot of work but will give you cleaner data), but it will work against drive-by shooters who randomly select UAIDs to send data to (which is the situation you refer to in your comment).
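A sketch of the page-side part, assuming the token has been registered as custom dimension index 1 (the index and token value are arbitrary placeholders):

ga('create', 'UA-XXXXXX-Y', 'auto');
ga('set', 'dimension1', 'my-shared-token');   // the value your include filter checks for
ga('send', 'pageview');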
General setting
I have a website which uses regular and encrypted URLs. Now I want to track the pageviews of all pages the same way.
I have regular URLs like this:
/library.dll?page=page12&arg1=0&arg2=some&session_id=7892734
and special pages like this:
/library.dll?page=specialpage&arg1=0&arg2=some&session_id=7892734&id=page13
as well as encrypted URLs like this, which also contain the session id:
/library.dll?page=encrypted&args=gYZEI7lnRAQLzVXdtdbcral8.cOoc6NDtMUGY2yep9wO3JM
So the interesting information is always the page, which in these examples is page12, page13, and page14 (where page14 is part of the encrypted string).
Clarification
I can change the HTML and JS code only. I have no access to the Google Analytics interfaces at all. This will be administrated by multiple customers.
The GA code will be integrated within a template using a customer-specific code and their unique tracking id.
Problem description
I need to track the page argument, because this is basically the interesting part of the URL. If I track the full URL with its other parameters, I cannot accumulate the pageviews for a certain page, because Google Analytics shows them as separate pages.
In addition, I don't see any way to track pageviews for the encrypted URLs, because I cannot set a generic name for them - unless there is a way to do this using ga('send', 'pageview');.
Solution idea
I read about overriding the pageview attributes like this:
ga('send', 'pageview', '/my-overridden-page?id=1');
in the page tracking article on Google Developers.
Utilizing (event) triggers is in my opinion a pretty bad idea.
The question itself
Is there any smarter way to track this information? Is extracting the page-information and overwriting the pageview attributes the best way to do this?
I just started using GA and have kind of no idea how to do this any other way.
You could use filters (custom advanced filters) to rewrite the request URL inside the Google Analytics admin interface (Admin -> Views -> Filters). This has the advantage that you do not need to change your site/application code.
However, using filters will require multi-step filters with heavy use of regular expressions, and you would have to test this in a "staging" view first (because a wrong filter will permanently mess up your data).
Passing a custom URL to the pageview tracking is pretty straightforward and can be tested immediately via the real-time view. In my opinion, this is indeed the smartest way to do it.
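For example, a sketch of what that could look like (the getPageName helper and the decryptedPageName variable are illustrative; the encrypted case has to be filled in by whatever your template can emit server-side):

function getPageName() {
  var params = new URLSearchParams(window.location.search);
  if (params.get('page') === 'encrypted') {
    // For encrypted URLs the real page name is inside the encrypted blob,
    // so the template has to expose it some other way, e.g. as a JS variable.
    return '/' + (window.decryptedPageName || 'encrypted-unknown');
  }
  // Special pages carry the real page name in the "id" parameter.
  return '/' + (params.get('id') || params.get('page') || 'unknown');
}

ga('send', 'pageview', getPageName());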
Looking at the documentation on the web, it seems to be common practice to track outbound links as a virtual pageview with a URL like /outgoing/{original_url}. But a lot of that documentation is from before Google added events to Analytics. Which is the preferred method nowadays - pageviews or events?
The 'correct' way is to track outbound links, downloads, etc. as events. Creating virtual pageviews is a hack from back before events were released.
Virtual pageview tracking artificially inflates the aggregated pageview count and thereby pollutes the data, so best practice is to avoid it if possible.
However, there are cases where virtual pageview tracking is the only solution, and that's when you need to track the outbound link (or download, etc.) as a goal, so that you can optimize against this goal in AdWords.
Examples include AdWords optimization with regard to PDF downloads.
If this is not the case, use event-tracking.
--
A standard snippet (which is simply included in the specific <a>'s onclick attribute) is:
_gaq.push(['_trackEvent', 'Outbound link', 'Click', 'http://www.external-link.com', 0]);
Google has another solution to this:
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en&answer=55527
That solution tracks the event, waits 100ms, and then redirects to the external link - this, imo, is not the best solution.
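For reference, a sketch of that delayed-redirect pattern in the classic async syntax (used from the link's onclick, e.g. onclick="return trackOutbound(this);"):

function trackOutbound(link) {
  _gaq.push(['_trackEvent', 'Outbound link', 'Click', link.href, 0]);
  setTimeout(function () { document.location = link.href; }, 100);  // give the hit ~100ms to leave
  return false;  // cancel the immediate navigation
}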
--
Another thing to remember is that the onclick event is not fired when the user right-clicks -> opens in a new tab, or does the equivalent middle-click.