I currently manage quite a few Google Analytics accounts for different websites and am trying to work out how to remove certain Analytics spam from these accounts. I have previously added filters, such as excluding visitors from Russia since the businesses are local and UK based, but I am now getting a lot of traffic from:
Language - not set
&
Page - sharebutton.to
If I were to exclude the above, would that remove any actual visitors as well as spam, or would it remove only spam?
If someone could help with this that would be brilliant.
Many Thanks
Paul
Filters based on countries or the name of the spam are not efficient because both can be easily changed by the spammers.
Also, it isn't possible to filter the (not set) entries in Analytics; this label is added after the visit is recorded, when Analytics doesn't find a value for that dimension.
Instead, you should use:
One hostname filter. This will help prevent the majority of the spam, whether it shows up as a referral, page, language, etc., independently of the name used by the spammer.
A source filter for the sneaky crawlers, which are far less frequent.
Here you will find detailed instructions on how to create the hostname filter and other measures you can take to prevent fake traffic.
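To make the hostname filter idea a bit more concrete, here is a minimal sketch in Python. It builds a regular expression from a list of hostnames you consider valid and checks hits against it. The hostnames and helper names are hypothetical; in Analytics itself the expression would go into an include filter on the hostname field rather than into a script.

```python
import re

# Hypothetical list of hostnames that legitimately serve your tracking code.
VALID_HOSTNAMES = ["example.co.uk", "www.example.co.uk", "shop.example.co.uk"]

def build_hostname_pattern(hostnames):
    """Return a regex that matches only the listed hostnames, exactly."""
    escaped = "|".join(re.escape(h) for h in hostnames)
    return re.compile(rf"^(?:{escaped})$", re.IGNORECASE)

def is_valid_hit(hostname, pattern):
    """True when the hit's hostname is one of ours, False otherwise."""
    return bool(hostname) and pattern.match(hostname) is not None

pattern = build_hostname_pattern(VALID_HOSTNAMES)
```

Because the check is on the hostname rather than on the spammer's name or country, it keeps working when the spammer changes either of those.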
I've just set up a tool on a client site that users can use to request a quote from our client. To do this the user lands on a form page, fills in their details, submits and then lands on a thank-you page. Pretty basic.
I set this process up as a goal in Google Analytics, using the destination type goal: "begins with /thank-you" and shared that goal as a conversion in Google AdWords.
I decided to run a few Google AdWords ads to promote the tool. I also wanted to double-check the conversion data that AdWords gives you, so I set the destination URL in AdWords to www.example.com/form-page?adsrc=adwords1 (2, 3, 4 etc. for each ad) and configured the DB with a column that tracked which URL the user was on when filling in the form (this would be the column I counted to get the number of conversions that came from AdWords, so I could compare).
Further to this, I made sure that the initial URL parameters that the user landed on were stored in the session so that if the user browsed to other pages and came back to fill in the form later, it would still attribute the conversion to AdWords.
I tested this thoroughly on a staging and production environment and everything was working correctly.
I ran the campaign for a week and when I checked, the conversion results in the database and the ones coming from AdWords were wildly different. The DB tells me I've had 5 conversions while AdWords gives me 21.
Is there anything in the way Google uses its gclid that may be causing this issue? Or is there a problem with the way I've set up the measurement structure?
This can be caused by a few things, but I think this is a GA/AdWords issue more than a problem with your DB/session set-up.
Gclid shouldn't influence your goal, since it is used only for AdWords/Analytics interactions. Goals should not be affected in your set-up.
https://support.google.com/analytics/answer/2938246?hl=en
Probable cause: If your goal set-up only contains "begins with /thank-you", isn't it possible that you are counting all the sessions which reach the thank-you page, not just those from AdWords?
Solution: if you need to count conversions in AdWords (for performance improvements), use the AdWords conversion code on the same page. This counts only those users who click an ad and reach your thank-you page within x (default 30) days. Be sure to count only unique conversions (users by cookie).
Differences between GA/AdWords conversion count:
https://support.google.com/analytics/answer/2679221?hl=en
Google attributes conversions to the last marketing channel, where direct visits do not count as a marketing channel (if you look at their attribution flow visualization you see that the penultimate step is to check for existing campaign information for the user). So GA might overcount AdWords visits (or other campaigns) and, conversely, show fewer conversions for direct visits.
In contrast, your database probably records the last traffic channel without an elaborate attribution model, so it will show less campaign traffic.
Also, IIRC the AdWords interface records the conversion at the time of the ad click, not the time of the actual goal conversion, so the timeframes for the conversions differ.
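To illustrate why the two counts can diverge, here is a small sketch of the two attribution rules described above. It is a simplification for illustration only, not Google's actual implementation, and the channel names are made up.

```python
def last_click(touches):
    """Attribute to the most recent channel, direct visits included.
    This is roughly what a simple database column would record."""
    return touches[-1] if touches else None

def last_non_direct_click(touches):
    """Attribute to the most recent channel that is not 'direct'.
    Falls back to 'direct' only when no campaign touch exists."""
    for channel in reversed(touches):
        if channel != "direct":
            return channel
    return "direct" if touches else None

# A user clicks an ad, leaves, and later returns directly to fill in the form.
touches = ["adwords", "direct"]
```

For that sequence `last_click(touches)` gives `"direct"` while `last_non_direct_click(touches)` gives `"adwords"`, so the same conversion is counted for AdWords under one rule and not under the other.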
I'm doing some complex reports for Google Analytics and would like to ask whether the following is possible. The client wants just organic data for a bunch of metrics, such as pageviews, visitBounceRate, etc. The query I ended up with is the following:
https://www.googleapis.com/analytics/v3/data/ga?dimensions=ga:source,ga:medium,ga:keyword,ga:day,ga:month,ga:year&end-date=2013-11-20&fields=columnHeaders/name,rows,totalResults,totalsForAllResults&filters=ga:medium==organic&ids=ga:79067749&metrics=ga:pageviews,ga:pageviewsPerVisit,ga:visitors,ga:avgTimeOnSite,ga:newVisits,ga:visitBounceRate&start-date=2013-10-20
However the response is as follows:
'{"totalResults":0,"columnHeaders":[{"name":"ga:source"},{"name":"ga:medium"},{"name":"ga:keyword"},{"name":"ga:day"},{"name":"ga:month"},{"name":"ga:year"},{"name":"ga:pageviews"},{"name":"ga:pageviewsPerVisit"},{"name":"ga:visitors"},{"name":"ga:avgTimeOnSite"},{"name":"ga:newVisits"},{"name":"ga:visitBounceRate"}],"totalsForAllResults":{"ga:pageviews":"0","ga:pageviewsPerVisit":"0.0","ga:visitors":"0","ga:avgTimeOnSite":"0.0","ga:newVisits":"0","ga:visitBounceRate":"0.0"}}'
Can the dimensions ga:source,ga:medium,ga:keyword be mixed with the above metrics? It seems they can't, since if I omit them the API returns an array of values, one per day within the specified range.
Where can I find more information about this and about which categories can be mixed? https://developers.google.com/analytics/devguides/reporting/core/dimsmets just shows all the available metrics but does not explain how they can be combined or which requests would be valid. I'm new to the Analytics API, and any kind of help or guidance would be great.
Thanks a lot
Google Analytics Query Explorer is your friend for playing around with analytics dimensions/metrics/filters ;-)
Try http://ga-dev-tools.appspot.com/explorer/?dimensions=ga:source,ga:medium,ga:keyword,ga:day,ga:month,ga:year&metrics=ga:pageviews,ga:pageviewsPerVisit,ga:visitors,ga:avgTimeOnSite,ga:newVisits,ga:visitBounceRate&filters=ga:medium%253D%253Dorganic&start-date=2013-10-20&end-date=2013-11-20&max-results=100
Some thoughts:
Those dimensions & metrics should work -- maybe there was no organic data recorded during that time range?
Try removing the ga:medium==organic filter and see what your data looks like.
Does the profile you're using (ga:79067749) have any filters on it? If so, maybe try a different profile that has unfiltered data. (Analytics best practices -- make sure you have a profile with no filters applied that captures all data.)
As Mike said, there is no problem with the combination of metrics and dimensions you are using.
If you are entering the URL query directly in the browser, the problem might be the lack of URL encoding in your query string. For example, you need to convert == to %3D%3D (the Query Explorer link above shows %253D%253D because there the value has been percent-encoded a second time, with % becoming %25).
So instead of ga:medium==organic, you need ga:medium%3D%3Dorganic
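If it helps, here is a short sketch of what percent-encoding does to the filter expression, using Python's urllib.parse. Encoding once turns == into %3D%3D; encoding the result a second time yields the %253D%253D form seen in the Query Explorer link. The profile id below is a placeholder, and only a few of the query parameters from the question are included.

```python
from urllib.parse import quote, urlencode

filter_expr = "ga:medium==organic"

# Percent-encode once: '=' becomes '%3D' (':' is left alone for readability).
encoded_once = quote(filter_expr, safe=":")

# Encoding the already-encoded value again turns '%' into '%25'.
encoded_twice = quote(encoded_once, safe=":")

# Letting urlencode build the query string avoids encoding by hand.
params = {
    "ids": "ga:12345",  # placeholder profile id
    "metrics": "ga:pageviews,ga:visitors",
    "filters": filter_expr,
    "start-date": "2013-10-20",
    "end-date": "2013-11-20",
}
url = "https://www.googleapis.com/analytics/v3/data/ga?" + urlencode(params, safe=":,")
```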
If you build your query in the Google Analytics Query Explorer as Mike suggests, you can grab the direct link to your report by clicking the link symbol in the upper left.
We have a site that tracks conversions through Google Analytics for redirects to an affiliate. However, not all redirected visitors convert to a sale after they leave our site. Our affiliate reports back to us weekly on who converted (and we can identify an individual user session from that report). Is there a way to get that conversion data back into Analytics? We've got a great coding team, but I just need to point them in the right direction.
Good question Jeff. If you don't mind the accuracy of the timing being off, your team could certainly just step through your site and intentionally trip the conversions.
Other than that, you may look into using a custom solution to bulk import that data using this type of API: Google Analytics for Mobile Websites
This Google Analytics server-side solution supports Perl, ASP.NET, JSP, and PHP. If you're looking for a repeatable process for batch importing GA data, this may be a viable solution for you.
Hope this gets you going in the right direction.
I would not recommend manually 'tripping' the conversions.
There is no easy way to get the data back into Analytics, and the right approach would depend on your reporting requirements (timelines, etc.).
One way to approach this is to set a visitor-scoped custom variable that identifies the visitor anonymously (in a non-personally-identifiable manner; beware the privacy policy).
http://cutroni.com/blog/2011/05/05/merging-google-analytics-with-your-data-warehouse/
So when a visitor comes to the site, a custom variable would get set. This variable acts as a key to associate behavior on the site with the affiliate's data. Once you receive the report from your affiliate saying which of those non-personally-identifiable IDs converted, you can have code fire conversion events when it recognizes, on a later visit, a visitor whose custom variable (read with _getVisitorCustomVar()) matches one of the converted IDs.
http://code.google.com/apis/analytics/docs/gaJS/gaJSApiBasicConfiguration.html
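A rough server-side sketch of the matching step, assuming the affiliate report can be reduced to rows carrying the anonymous key. The field names `visitor_key` and `converted`, and all function names, are hypothetical; the actual firing of the conversion event would still happen in your page code on the visitor's next visit.

```python
import uuid

def new_visitor_key():
    """Random key with no personal data, to be stored in the
    visitor-scoped custom variable on the first visit."""
    return uuid.uuid4().hex

def load_converted_keys(report_rows):
    """Collect the keys that the affiliate reported as converted.
    Each row is assumed to be a dict with 'visitor_key' and 'converted'."""
    return {row["visitor_key"] for row in report_rows if row.get("converted")}

def should_fire_conversion(visitor_key, converted_keys, already_fired):
    """Fire once per converted visitor, on the first visit after the report."""
    return visitor_key in converted_keys and visitor_key not in already_fired
```

Keeping an `already_fired` set (or table) is what stops the same sale from being recorded again on every later visit.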
Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.
Google AdWords with "autotagging" appends a "gclid" (presumably "google click id") to the link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by Analytics to tie that visit to the ad/campaign.
What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:
Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
We don't have to rely on javascript for conversions.
Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to campaigns or even accounts?
I have spoken to a couple of people at Google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.
By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.
Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.
In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.
Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.
Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.
The gclid parameter is encoded in Protocol Buffers, and then in a variant of Base64.
See this guide to decoding the gclid and interpreting it, including an (Apache-licensed) PHP function you can use.
There are basically 3 parameters encoded inside it, one of which is a timestamp. The other 2 as yet are not known.
As far as understanding what these other parameters mean—it may be helpful to compare it to the ei parameter, which is encoded in an extremely similar way (basically Protocol Buffers with the keys stripped out). The ei parameter also has a timestamp, with what seem to be microseconds, and 2 other integers.
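For anyone who wants to poke at this themselves, here is a minimal sketch of the two decoding layers described above: URL-safe Base64 first, then a walk over the Protocol Buffers wire format. This is not the PHP function from the linked guide, it only handles varint and length-delimited fields, and it makes no claim about what the decoded fields mean. Real gclid values may not parse cleanly with it.

```python
import base64

def b64url_decode(value):
    """Decode URL-safe Base64, restoring any '=' padding that was stripped."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)

def read_varint(data, pos):
    """Read one protobuf varint starting at pos; return (value, next_pos).
    Raises IndexError if the data ends in the middle of a varint."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_fields(data):
    """Return a list of (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:                    # length-delimited bytes
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

Calling `parse_fields(b64url_decode(gclid))` on a value from your own logs is a reasonable first experiment; a `ValueError` or `IndexError` just means that particular value does not fit this simplified reading.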
FYI, I just posted a quick analysis of some gclid data from my sites in this post. There definitely is some structure to the gclid, but it is difficult to decipher.
I think you can get all the goodies linked to the gclid via Google's AdWords API. Specifically, you can query the click performance report.
https://developers.google.com/adwords/api/docs/appendix/reports#click
I've been working on this problem at our company as well. We'd like to be able to get a better sense of what our AdWords are doing but we're frustrated with limitations in Analytics.
Our current solution is to look in the Apache access logs for GET requests using the regex:
.*[?&]gclid=([^&\s"]*)
If that exists, then we look at the referer string to get the keyword:
.*[?&]q=([^&\s"]*).*
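The log-parsing approach above can be sketched in Python as follows. It assumes the Apache combined log format (request line, referer and user agent each in double quotes), and the patterns are a slightly stricter variant that stops the capture at whitespace or a quote as well as at &. The function name and the returned keys are made up for the example.

```python
import re
from urllib.parse import unquote_plus

QUOTED_RE = re.compile(r'"([^"]*)"')            # quoted fields of a combined log line
GCLID_RE = re.compile(r'[?&]gclid=([^&\s"]*)')
KEYWORD_RE = re.compile(r'[?&]q=([^&\s"]*)')

def parse_log_line(line):
    """Return {'gclid', 'keyword'} for an AdWords click, or None otherwise."""
    quoted = QUOTED_RE.findall(line)
    if len(quoted) < 2:
        return None                              # not in combined log format
    request, referer = quoted[0], quoted[1]
    gclid = GCLID_RE.search(request)
    if gclid is None:
        return None                              # no gclid, so not an auto-tagged click
    keyword = KEYWORD_RE.search(referer)
    return {
        "gclid": gclid.group(1),
        # '+' and %xx escapes in the search query are decoded for readability.
        "keyword": unquote_plus(keyword.group(1)) if keyword else None,
    }
```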
An alternative option is to change your Apache web log configuration to start logging the __utmz cookie that Google sets, which should have a piece for the keyword in utmctr. Search for the __utmz cookie and you should be able to find plenty of information.
How accurate is the referer string? Not 100%. Firewalls and security appliances will strip it out. But parsing it out yourself does give you more flexibility than Google Analytics. It would be a great feature to send the gclid to AdWords and get data back, but that feature does not look like it's available.
EDIT: Since I wrote this we've also created our own tags that are appended to each destination URL as a request parameter. Each tag is just an MD5 hash of the ad text, ad group, and campaign name. We grab it using a regex from the access log and look it up in a SQL database.
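A minimal sketch of that tagging scheme, with a plain dictionary standing in for the SQL table. How the three values are joined before hashing is not stated above, so the separator here is an assumption, as are the function names.

```python
import hashlib

SEPARATOR = "\x1f"  # assumed separator; any character that cannot appear in the names works

def make_tag(ad_text, ad_group, campaign):
    """MD5 of the ad text, ad group and campaign name, as a 32-char hex string."""
    joined = SEPARATOR.join([ad_text, ad_group, campaign])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

lookup = {}  # stand-in for the SQL lookup table: tag -> (ad_text, ad_group, campaign)

def register_ad(ad_text, ad_group, campaign):
    """Store the ad under its tag and return the tag to append to the destination URL."""
    tag = make_tag(ad_text, ad_group, campaign)
    lookup[tag] = (ad_text, ad_group, campaign)
    return tag

def resolve_tag(tag):
    """Map a tag pulled from the access log back to the ad, or None if unknown."""
    return lookup.get(tag)
```

MD5 is used here only as a compact identifier, as in the description above, not for security.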
This is a non-programmatic way to decode the GCLID parameter. Chances are you are simply trying to figure out the campaign, ad group, keyword, placement, or ad that drove the click and conversion. To do this, you can upload the GCLID into AdWords as a separate conversion type and then segment by conversion type to drill down to the criteria that triggered the conversion. The steps are:
In AdWords UI, go to Tools->Conversions->Add conversion with source "Import from clicks"
Visit the AdWords help topic about importing conversions https://support.google.com/adwords/answer/7014069 and create a bulk load file with your GCLID values, assigning the conversions to your new "Import from clicks" conversion type
Upload the conversions into AdWords in Tools->Conversions->Conversion actions (Uploads) on left navigation
Go to campaigns tab, Segment->Conversions->Conversion name
Find your new conversion name in the segment list; this is where the conversion came from. Continue this same process on the ad groups and keywords tabs until you know the criteria the GCLID originated from
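If you have many GCLID values, the bulk load file from the second step can be generated with a few lines of Python. Treat the column headers and the time format below as assumptions: check them against the template on the help page linked above before uploading. The function name is made up for the example.

```python
import csv
import io

# Assumed column headers; verify against the template from the AdWords help page.
COLUMNS = ["Google Click ID", "Conversion Name", "Conversion Time"]

def build_upload_csv(clicks, conversion_name):
    """clicks is an iterable of (gclid, conversion_time_string) pairs.
    Returns the CSV text with one row per GCLID."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for gclid, when in clicks:
        writer.writerow([gclid, conversion_name, when])
    return buf.getvalue()
```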
Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.
Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.
Possibility 2: They "mean" something. In that case, you have to control the environment.
Get a good database of them. Find gclids for your site, and others. Record all times that all clicks occur, and any other potentially useful data
Get cracking! As you have already started to do, regress your collected data against what you know, and see if you can find patterns using decryption techniques
Start scraping random gclids, and see where they take you.
I wouldn't hold out much hope of this being successful, but I do wish you luck!
Looks like my rep is weak, so I'll just post another answer rather than a comment.
This is not an answer, clearly. Just voicing some thoughts.
When you enable auto tagging in Adwords, the gclid params are not added to the destination URLs. Rather they are appended to the destination URLs at run time by the Google click tracking servers. So, one of two things is happening:
The click servers are storing the gclid along with Adwords entity identifiers so that Analytics can later look them up.
The gclid has the entity identifiers encoded in some way so that Analytics can decode them.
From a performance perspective it seems unlikely that Google would implement anything like option 1. Forcing Analytics to "join" the gclid to Adwords IDs seems exceptionally inefficient at scale.
A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.
Here's a thought: Is there a chance the gclid is simply a cryptographic hash, a la bit.ly or some other URL shortener?
In which case the contents of the hashed text would be written to a database, and replaced with a unique id.
After all, the gclid is shortening a bunch of otherwise long text.
Take this example:
www.example.com?utm_source=google&utm_medium=cpc
Is converted to this:
www.example.com?gclid=XDF
just like a URL shortener.
One would need a substitution cipher in order to reverse engineer the cryptographic hash... not an easy task: https://crypto.stackexchange.com/questions/300/reverse-engineering-a-hash
Maybe some deep digging into logs, looking for patterns, etc...
I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.
Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.
For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow":
http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB
You can see that: a) I'm using the .nz domain, b) my keyword is "stack+overflow", c) I'm running Firefox.
Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid search, whereas if there is no GCLID, the user must have come from natural search (if URL tagging is enabled, of course).
This would theoretically allow you to then search for the keyword in your campaign, and figure out which ad group they came from. Knowing the creative would probably be impossible though, unless you split test your landing URLs or tag them somehow.
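Here is a small sketch of pulling those pieces out with Python's urllib.parse instead of by eye. The function name and the keys of the returned dictionary are made up for the example; the paid/natural distinction relies, as noted above, on auto-tagging being enabled.

```python
from urllib.parse import urlparse, parse_qs

def describe_visit(referrer, landing_url):
    """Extract search host, keyword and client from the referrer,
    and the gclid (if any) from the landing page URL."""
    ref = urlparse(referrer)
    ref_params = parse_qs(ref.query)
    landing_params = parse_qs(urlparse(landing_url).query)
    gclid = landing_params.get("gclid", [None])[0]
    return {
        "search_host": ref.hostname,                      # e.g. the country domain
        "keyword": ref_params.get("q", [None])[0],        # '+' is decoded to a space
        "client": ref_params.get("client", [None])[0],
        "gclid": gclid,
        "paid": gclid is not None,                        # no gclid: assume natural search
    }

referrer = ("http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8"
            "&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB")
visit = describe_visit(referrer, "http://www.example.com/?gclid=XDF")
```

Storing the resulting dictionary alongside each lead gives you the keyword and the paid/natural flag to join against your campaign data later.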