Prevent fake analytics statistics with custom crawler - google-analytics

Is there a way to prevent faked Google Analytics statistics by using PhantomJS and/or a ruby crawler like Anemone?
Our monitoring tool (which is based on both of them) crawls the sites from our clients and updates the link status of each link in a specific domain.
The problem is that this simulates huge traffic.
Is there a way to say something like "I'm a robot, don't track me" with a cookie, header or something?
(Adding the crawler's IPs to Google Analytics as a filter may not be the best solution.)
Thanks in advance

Joe, try setting up an advanced exclude filter: use the field Browser, and in "Filter Pattern" enter the name of your user agent for PhantomJS (or any other user agent; look up the desired name in your Technology -> Browser and OS report).

I found a quick solution for this specific problem. The easiest way to exclude a crawler that executes JavaScript (like PhantomJS) from all Google Analytics statistics is to block the Google Analytics domains in /etc/hosts on the machine running the crawler.
127.0.0.1 www.google-analytics.com
127.0.0.1 google-analytics.com
It's the easiest way to prevent fake data. This way, you don't have to add a filter to all your clients.
( thanks for other answers )
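As a rough way to confirm the hosts entries above are in place, the sketch below parses hosts-file text and checks that both Google Analytics hostnames map to a loopback address. The function names are made up for illustration, and it only inspects the file contents; it does not test actual name resolution.

```python
# Addresses treated as "blocked" for the purpose of this check.
LOOPBACK = {"127.0.0.1", "::1"}

# The two hostnames blocked in the answer above.
GA_DOMAINS = {"www.google-analytics.com", "google-analytics.com"}


def blocked_hosts(hosts_text):
    """Return the set of hostnames that the hosts-file text maps to loopback."""
    blocked = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if address in LOOPBACK:
            blocked.update(names)
    return blocked


def analytics_is_blocked(hosts_text):
    """True if every Google Analytics hostname is mapped to loopback."""
    return GA_DOMAINS <= blocked_hosts(hosts_text)
```

On the crawler machine this could be called as analytics_is_blocked(open("/etc/hosts").read()) before a crawl starts.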

IP filtering might not be sufficient, but filtering by user agent string (which can be set arbitrarily with PhantomJS) might be. That would be the "Browser" field in the filters.
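Before saving a filter pattern, it can help to try it locally against a few sample values. The sketch below is only an approximation: the pattern, the crawler name MyLinkChecker, and the sample strings are assumptions, and Google Analytics may apply the pattern to a parsed browser name rather than the raw user agent string.

```python
import re

# Hypothetical pattern, as it might be typed into the "Filter Pattern" box.
# MyLinkChecker is a made-up name for a custom crawler user agent.
FILTER_PATTERN = r"PhantomJS|MyLinkChecker"


def is_excluded(value, pattern=FILTER_PATTERN):
    """Return True if the exclude pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


# Sample values to try the pattern against (illustrative only).
crawler_ua = (
    "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 "
    "(KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1"
)
browser_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
```

Here is_excluded(crawler_ua) is true and is_excluded(browser_ua) is false, which is the behaviour you would want from the exclude filter.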

Related

Firebase dynamic link with query parameters

I've been looking at replacing all the links in the firebase password reset and welcome emails with something more custom, so it doesn't look terrible for users (so moving from https://some-app-123f.firebaseapp.com to link.some-app.com).
I thought that the best way to do this might be to use the firebase dynamic links, so I set up link.some-app.com in there. All good so far.
I generated a new dynamic link directly in the web interface. This is basically going to be used for everyone, or that is my hope. Let's call that link link.some-app.com/email-link. I have then set this up to point to https://some-app-123f.firebaseapp.com/__/auth/action. Going to the first takes me to the second, all good so far. The links just open the web, not apps, and no interstitial page.
I can replace the "Action URL" in the email template with link.some-app.com/email-link. When I email a password reset, I get a link that looks like this: https://link.some-app.com/email-link?mode=resetPassword&oobCode=[hash]&apiKey=[key]&lang=en
However, when I click on this link in debug mode (adding d=1 to the end), I get a bunch of errors:
The format of parameter (mode) is not whitelisted for this domain.
So I thought that I could solve this by using the whitelisting feature on the link domain in the firebase console, so I've tried a bunch of different options, but these are the two most permissive (to cover both domain bases, though I am pretty sure I need to be whitelisting the target domain i.e. firebase)
^https://some-app-123f.firebaseapp.com.*$
^https://link.some-app.com/email-link.*$
Am I completely missing something? Is this something that just isn't possible because it is redirecting back to firebase?
tl;dr: I'm trying to create an effectively serverless redirect link to the password reset functionality in firebase using a prettier url than firebase gives you out of the box
Your URL patterns are incorrect. You haven't escaped the dots (.), and the host isn't followed by a slash. Your pattern should be
^https://some-app-123f\.firebaseapp\.com/.*$
You don't need to add the second URL to whitelist.
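To see why the escaping matters, here is a small sketch using Python's re module. Firebase's pattern matching may differ in detail from Python's, and evil.example is a made-up host, but the behaviour of an unescaped dot is the same in any common regex dialect.

```python
import re

# Pattern from the question: dots unescaped, no slash after the host.
unescaped = re.compile(r"^https://some-app-123f.firebaseapp.com.*$")

# Corrected pattern: dots escaped and the host terminated by a slash.
escaped = re.compile(r"^https://some-app-123f\.firebaseapp\.com/.*$")

# The intended redirect target.
real = "https://some-app-123f.firebaseapp.com/__/auth/action?mode=resetPassword"

# A hypothetical lookalike URL on a different host.
lookalike = "https://some-app-123fxfirebaseapp.com.evil.example/"

# An unescaped "." matches any character, and the missing slash lets
# the host continue past "com", so the lookalike slips through.
print(bool(unescaped.match(lookalike)))  # True
print(bool(escaped.match(lookalike)))    # False
print(bool(escaped.match(real)))         # True
```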
Whitelisting exists to prevent improper use of your Dynamic Links to redirect to sites that are beyond your control, so you need to whitelist the URLs that your Dynamic Links can redirect to. For more information on whitelisting URLs, see the linked documentation.

Ignore Specific Computer Google Analytics

I need a way to remove my own traffic and interaction from my Google Analytics.
I know of all the IP and cookie based solutions but IPs can change and cookies can be erased.
One thing I did think of is that I'm always logged into my Google account and I'm always using Chrome. Is there any way to use this to my advantage? It would be really nice if I could just ignore based on my Google Account.
Browser Extensions
Use a browser extension to prevent you from being tracked on analytics. I use the Block Yourself From Analytics extension because it allows you to configure the sites you want to prevent traffic on.

How to track RSS feed usage / views?

What's the best way to track how many times items in your RSS have been accessed?
Assuming your RSS is served from a webserver, the server logs would be the obvious place to gather statistics from. There are numerous packages for parsing and interpreting webserver logs.
AWStats is a popular (free) package, and Wikipedia keeps a fairly comprehensive list.
If you serve your feeds through something like FeedBurner, then you can also get stats from there, including clicks.
You could use Google Analytics, but you would need a service to make the correct requests to the Google Analytics API or redirect to it. There are two APIs you can use:
the __utm.gif "API"
the Measurement Protocol API
To use the latter (you need Universal Analytics), which is way better in my opinion, you would need to make a request or redirect to something like:
http://www.google-analytics.com/collect?z=<randomnumber>&t=pageview&dh=<domainname>&cid=<unique-client-uuid>&tid=<propertyid>&v=1&dp=<path>
Where:
<randomnumber> is a random number to avoid caches (especially if you do redirects)
<domainname> is the domain name you see in your tracking code
<unique-client-uuid> is an identifier for the client (visitor), ideally a UUID
<propertyid> is the property id you see in your tracking code (eg: UA-123456)
<path> is the path to the page you want to register the pageview for. Note that it must be URL-encoded. Eg, for /path/to/page you would need to send %2Fpath%2Fto%2Fpage
I've implemented a simple redirector service that does exactly that here (explained at length here)
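As a minimal sketch of how such a request URL could be assembled (this is not the redirector service mentioned above; the function name and the example values are made up), Python's urlencode takes care of the encoding of the path:

```python
import random
import uuid
from urllib.parse import urlencode

COLLECT_ENDPOINT = "http://www.google-analytics.com/collect"


def build_pageview_url(property_id, domain, path, client_id=None):
    """Build a Measurement Protocol pageview URL for the given page path."""
    params = {
        "v": "1",                                  # protocol version
        "t": "pageview",                           # hit type
        "tid": property_id,                        # property id, eg UA-123456
        "cid": client_id or str(uuid.uuid4()),     # client id (random if not given)
        "dh": domain,                              # domain name
        "dp": path,                                # page path, encoded by urlencode
        "z": str(random.randint(0, 2**31 - 1)),    # cache buster
    }
    return COLLECT_ENDPOINT + "?" + urlencode(params)


# Example with placeholder values.
url = build_pageview_url("UA-123456", "example.com", "/path/to/page")
```

urlencode percent-encodes the slashes in the path by default, so the resulting URL contains dp=%2Fpath%2Fto%2Fpage. A redirector would send this URL as a redirect target or request it server-side for each feed item access.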
If you're stuck with the Classic Analytics then you would need to use nojsstats or the older implementation

Does anyone use Google Analytics? How does Google avoid counting the owner of the website as a visitor?

I don't want to be counted as a visitor every time I test my page on the hosting. Does Google know I'm the owner of the site by checking if I'm logged in to my Gmail account?
I don't think Google does anything like this automatically. But they do provide instructions for excluding based on IP address (or range) and apparently also now by cookie. If you use a CMS or admin interface, you could put the code they provide in an HTML file that you then include into the admin interface pages by IFRAME (to ensure that the cookie stays set for anyone who uses that interface).
One option is to install the Ghostery add-on in your browser. Ghostery can block trackers and scripts used on web pages, such as Google Analytics, Google AdWords and other adware.
You can also block or unblock trackers for a specific site, or a specific tracker for a particular site. This add-on is available for Firefox and Chrome. If you have it installed, your visits won't be counted because the Google Analytics script won't be executed.
You can learn more about ghostery at: http://www.ghostery.com/about
There are also often application-specific ways of keeping Google from counting administrators. For example, I've used a WordPress analytics plugin that automatically omits the tracking code if the user is logged in as an administrator. If your application has the concept of an admin, you could write something similar that controls when the code is added.
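In an application of your own, the same idea might look roughly like the sketch below. The function name, the user_is_admin flag, and the placeholder snippet are assumptions; a real application would use its own template system and the actual tracking code.

```python
# Placeholder for the real tracking code provided by Google Analytics.
TRACKING_SNIPPET = "<!-- analytics tracking code goes here -->"


def render_page(body_html, user_is_admin, tracking_snippet=TRACKING_SNIPPET):
    """Append the tracking snippet to the page unless the viewer is an admin."""
    if user_is_admin:
        return body_html  # administrators are never tracked
    return body_html + "\n" + tracking_snippet
```

Because the snippet is never sent to administrators, this works regardless of their IP address or browser.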
If you visit your site frequently from connections with a dynamic IP address, eg. home broadband, then excluding IP addresses is not particularly practical. To go beyond IP exclusion, you can create an isolated page on your site that only you know about that includes a call to Analytics to label your cookie.
The Google Analytics _setVar() function lets you label yourself with an arbitrary string, eg. 'internal'. You only need to do this once per browser as long as you don't clear your cookies.
Having labelled yourself as 'internal', you can create an Advanced Segment within Google Analytics to exclude visitors with that label.
Google Analytics relies on you embedding a call to their JavaScript (see this link); do not confuse it with how Google does page ranking.
So the answer to your question is that your pages should be smart enough to recognize when the request comes from you and skip the call to the JavaScript.

Can I filter out my traffic in google analytics?

I have a site running Google analytics and I end up being a large fraction of the traffic to it (like 1 of the 2 hits per day). Is there any way I can set it so that my browsing doesn't skew the numbers so much? I'd be happy if it just didn't record anything for accesses that are logged in as my Google account.
Use the Filter Manager in your analytics settings
http://www.google.com/support/analytics/bin/answer.py?answer=55481&cbid=-1j8it19c4uzvt&src=cb&lev=answer
You can use a filter to exclude:
Traffic from a domain
IP address
Sub directory
or you can use a custom filter. You can edit your site to set a campaign code when you log in and use the custom filter to exclude that campaign code.
You can also try out the IP filter if you always use the same machine.
One option would be to use an ad blocking or javascript disabling extension in your browser to prevent google analytics from being loaded.
A neat solution is to simply stop the tracking JavaScript from being sent to the browser based on a cookie set on your machine. This can be done by adding a few lines of code to your page. Take a look at this article for a full explanation.
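The linked article is not reproduced here, but the server-side check could be sketched as follows. The cookie name no_analytics and its value are assumptions chosen for illustration; you would set that cookie once in your own browser.

```python
from http.cookies import SimpleCookie

# Hypothetical opt-out cookie, set once in the site owner's browser.
OPT_OUT_COOKIE = "no_analytics"


def should_track(cookie_header):
    """Return False when the request carries the opt-out cookie set to "1"."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")  # parse the raw Cookie request header
    morsel = cookie.get(OPT_OUT_COOKIE)
    return morsel is None or morsel.value != "1"
```

The page would then emit the tracking JavaScript only when should_track() returns true for the incoming request's Cookie header.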
If you log in there as a site user, then based on that you could simply not put the Google Analytics JavaScript in the output HTML. This is the typical case when you are an administrator and do not want your own activity to skew the results.
If you are able to touch the code that runs your site I think this is the simplest way to go.
If it is not the case, please provide some more details.

Resources