Switching to cross domain tracking from previous Google Analytics implementation - google-analytics

We have had a simple implementation of GA in place for some time; the only additional methods we use are _setVar and _setSessionTimeout. Is there a way we can switch to a cross-domain tracking configuration of GA (where certain links are wired via the _link method) without losing existing tracking data on client systems?
I've run a lot of tests, and the more issues I solve, the more come up. In a nutshell:
Pre-implementation, the client has these cookies: __utm(a, b, c, z, v). The first step was to change the code and add the _setAllowLinker and _setAllowHash methods, but this was throwing a TypeError. I found this could be avoided by deleting the __utmv cookie before calling the pageTracker methods, and then calling _setVar again afterward.
The new code in place seems to be working OK without throwing an error:
document.cookie = '__utmv=; expires=Tue, 22 Jun 2010 11:57:00 GMT;'+
' path=/; domain=XXXXXXX';
var pageTracker=_gat._getTracker("UA-XXXXXXXX");
pageTracker._setAllowLinker(true);
pageTracker._setAllowHash(false);
pageTracker._setSessionTimeout(XXXXX);
pageTracker._setVar(XXXXX);
pageTracker._trackPageview();
The cookies are now updated to not use a hash value, so their values can now be used cross domain, but the problem is that the values in the __utm cookies have been refreshed with new values which means we're losing user history (and new visits will explode).
For example, __utma:
Before - XX-HASHVALUE-XX.1379282990.1277294951.1277294951.1277294951.1
After - 1.26318765.1277294984.1277294984.1277294984.1
If it's not possible to switch to a cross-domain GA configuration without losing user history, is there a way to fake it on the link that clicks through to the next domain? That is, by constructing the link URL from the cookies and replacing each hash-value prefix with a 1?
Thanks!

Unfortunately it seems there is no proper way to do this using the ga.js API. I've gone with this solution:
var pageTracker = _gat._getTracker("UA-123456-7");
if (getCookie('__utma') && getCookie('__utma').substr(0, 2) == '1.') {
    // hash value safely removed, flick GA API switch
    pageTracker._setAllowHash(false);
}
pageTracker._trackPageview();
if (getCookie('__utmc') != '1') {
    // remove hash values from all GA cookies
    eraseCookieHash();
}
In the eraseCookieHash function, each cookie is updated manually to replace the hash value with a 1, using the guide at http://code.google.com/apis/analytics/docs/concepts/gaConceptsCookies.html to determine the expires value.
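The answer does not show eraseCookieHash itself. Its core step might look like the sketch below, which assumes the hash is the first dot-separated field of each cookie value (as in the __utma example in the question) and that __utmc holds only the hash. The helper name replaceHashPrefix is hypothetical, and writing the cookie back with the right expires value is left out:

```javascript
// Hypothetical helper: swap the leading domain-hash field of a GA cookie value for "1".
// Assumption: the hash is the first dot-separated field of the value.
function replaceHashPrefix(cookieValue) {
  var firstDot = cookieValue.indexOf('.');
  if (firstDot === -1) {
    // Value is the hash alone (assumed to be the __utmc case), so it becomes "1".
    return '1';
  }
  return '1' + cookieValue.slice(firstDot);
}

// Made-up hash value in the __utma layout from the question
var rewritten = replaceHashPrefix('173272373.1379282990.1277294951.1277294951.1277294951.1');
```

Only the first field changes, so the visitor ID and timestamps that carry the user history are preserved.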


Google reCAPTCHA response success: false, no error codes

UPDATE: Google has recently updated their error message with an additional error code possibility: "timeout-or-duplicate".
This new error code seems to cover 99% of our previously mentioned mysterious cases.
We are still left wondering why we get so many validation requests that are either timeouts or duplicates. Determining this with certainty is likely to be impossible, but I am hoping that someone else has experienced something like it.
Disclaimer: I cross posted this to Google Groups, so apologies for spamming the ether for the ones of you who frequent both sites.
I am currently working on a page as part of an ASP.NET MVC application with a form that uses reCAPTCHA validation. The page currently has many daily users.
In my server side validation** of a reCAPTCHA response, for a while now, I have seen the case of the reCAPTCHA response having its success property set to false, but with an accompanying empty error code array.
Most of the requests pass validation, but some keep exhibiting this pattern.
So after doing some research online, I explored the two possible scenarios I could think of:
The validation has timed out and is no longer valid.
The user has already been validated using the response value, so they are rejected the second time.
After collecting data for a while, I have found that all cases of "Success: false, error codes: []" have either had the validation be rather old (ranging from 5 minutes to 10 days(!)), or it has been a case of a re-used response value, or sometimes a combination of the two.
Even after implementing client side prevention of double-clicking my submit-form button, a lot of double submits still seem to get through to the server side Google reCAPTCHA validation logic.
My data tells me that 1.6% (28) of all requests (1760) have failed with at least one of the above scenarios being true ("timeout" or "double submission").
Meanwhile, not a single request of the 1760 has failed where the error code array was not empty.
I just have a hard time imagining a practical use case where a ChallengeTimeStamp gets issued, and then after 10 days validation is attempted, server side.
My question is:
What could be the reason for a non-negligible percentage of all Google reCAPTCHA server side validation attempts to be either very old or a case of double submission?
**By "server side validation" I mean logic that looks like this:
public bool IsVerifiedUser(string captchaResponse, string endUserIp)
{
    string apiUrl = ConfigurationManager.AppSettings["Google_Captcha_API"];
    string secret = ConfigurationManager.AppSettings["Google_Captcha_SecretKey"];
    using (var client = new HttpClient())
    {
        var parameters = new Dictionary<string, string>
        {
            { "secret", secret },
            { "response", captchaResponse },
            { "remoteip", endUserIp },
        };
        var content = new FormUrlEncodedContent(parameters);
        var response = client.PostAsync(apiUrl, content).Result;
        var responseContent = response.Content.ReadAsStringAsync().Result;
        GoogleCaptchaResponse googleCaptchaResponse = JsonConvert.DeserializeObject<GoogleCaptchaResponse>(responseContent);
        if (googleCaptchaResponse.Success)
        {
            _dal.LogGoogleRecaptchaResponse(endUserIp, captchaResponse);
            return true;
        }
        else
        {
            //Actual code omitted
            //Try to determine the cause of failure
            //Look at googleCaptchaResponse.ErrorCodes array (this has been empty in all of the 28 cases of "success: false")
            //Measure time between googleCaptchaResponse.ChallengeTimeStamp (which is UTC) and DateTime.UtcNow
            //Check reCAPTCHAresponse against local database of previously used reCAPTCHAresponses to detect cases of double submission
            return false;
        }
    }
}
Thank you in advance to anyone who has a clue and can perhaps shed some light on the subject.
You will get the timeout-or-duplicate error if the same captcha response is validated twice.
Append each verification result to a log file and check whether you are validating a captcha twice.
Here is an example:
$verifyResponse = file_get_contents('https://www.google.com/recaptcha/api/siteverify?secret='.urlencode($secret).'&response='.urlencode($_POST['g-recaptcha-response']));
file_put_contents("logfile", $verifyResponse . PHP_EOL, FILE_APPEND);
Now read the contents of the logfile created above and check whether a captcha was verified twice.
This is an interesting question, but it's going to be impossible to answer with any sort of certainty. I can give an educated guess about what's occurring.
As far as the old submissions go, that could simply be users leaving the page open in the browser and coming back later to finally submit. You can handle this scenario in a few different ways:
1. Set a meta refresh for the page, such that it will update itself after a defined period of time, and hopefully either get a new reCAPTCHA validation code or at least prompt the user to verify the CAPTCHA again. However, this is less than ideal as it increases requests to your server and will blow out any work the user has done on the form. It's also very brute-force: it will simply refresh after a certain amount of time, regardless of whether the user is currently actively using the page or not.
2. Use a JavaScript timer to notify the user about the page timing out and then refresh. This is like #1, but with much more finesse. You can pop a warning dialog telling the user that they've left the page sitting too long and it will soon need to be refreshed, giving them time to finish up if they're actively using it. You can also check for user activity via events like onmousemove. If the user's not moving the mouse, it's very likely they aren't on the page.
3. Handle it server-side, by catching this scenario. I actually prefer this method the most as it's the most fluid, and honestly the easiest to achieve. When you get back success: false with no error codes, simply send the user back to the page, as if they had made a validation error in the form. Provide a message telling them that their CAPTCHA validation expired and they need to verify again. Then, all they have to do is verify and resubmit.
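The server-side option can be sketched as a small decision function over the parsed siteverify JSON. This is only an illustration: the function name and the three return labels are hypothetical. The success and error-codes fields and the timeout-or-duplicate code are the ones discussed in the question:

```javascript
// Sketch: decide how to react to a parsed siteverify response.
// The return labels are made up for this example.
function captchaAction(result) {
  if (result.success) {
    return 'accept';
  }
  var codes = result['error-codes'] || [];
  // An empty code list (the older behaviour described in the question) and
  // "timeout-or-duplicate" both mean the token is stale or was already used.
  var expiredOrReused = codes.length === 0 || codes.indexOf('timeout-or-duplicate') !== -1;
  return expiredOrReused ? 'ask-user-to-reverify' : 'reject';
}
```

In the 'ask-user-to-reverify' case the form would be shown again with its values intact and a message that the CAPTCHA expired, exactly as for an ordinary validation error.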
The double-submit issue is a perennial one that plagues all web developers. User behavior studies have shown that the vast majority occur because users have been trained to double-click icons, and as a result, think they need to double-click submit buttons as well. Some of it is impatience if something doesn't happen immediately on click. Regardless, the best thing you can do is implement JavaScript that disables the button on click, preventing a second click.
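Independent of any particular form markup, the disable-on-click idea amounts to a guard that lets the submit handler run only once. A minimal sketch, with the hypothetical name submitOnce:

```javascript
// Sketch: wrap a submit handler so that only the first call goes through.
function submitOnce(handler) {
  var submitted = false;
  return function () {
    if (submitted) {
      return false; // second click: ignore it
    }
    submitted = true;
    handler.apply(this, arguments);
    return true;
  };
}
```

In a browser the wrapped handler would also set the button's disabled property so the user sees that the click registered. Since the question reports that duplicates still got through despite client-side prevention, this complements the server-side check and does not replace it.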

Paypal Processing - Need to grab TransactionId, CorrelationId and TimeStamp

Current Project:
ASP.NET 4.5.2
MVC 5
PayPal API
I am using this example to build myself a PayPal transaction (and yes, my code is virtually identical), as I do not know of any other method that will return the three values in the title.
My main problem is that, the example I am utilizing is much more concise and compact than the one I used for a much older Web Forms application, and as such, I am unsure as to where or even how to grab the three values I need.
My initial thought was to do so right after the ACK, and indeed I was able to obtain the CorrelationId as well as the TimeStamp, but because this was prior to the user being carted off to PayPal’s site (sandbox in this case -- see the return new PayPalRedirect contained within the if), the TransactionId was blank. And in this example, PayPal explicitly redirects the user to a Success page without returning to the Action that sent the user to PayPal in the first place, and I am not seeing any GET values in the URL at all aside from the Token and the PayerId, much less ones that could provide me with the TransactionId.
Suggestions?
I have also looked at the following examples:
For ASP.NET Core, was unsure how to adapt to my current project particularly due to appsettings.json, but it looked quite well done. I really liked how the values were rolled up in lists.
For MVC 4, but I couldn’t find where ACK was being used to determine success or successwithwarning so I couldn’t hook into that.
I have also found the PayPal content to be like trying to drink from a fire hose at full blast -- not only was the content hopelessly outdated (Web Forms code, FTW!) but there were also so many different examples that it would have taken me days to determine which one was most appropriate to use.
Any assistance would be greatly appreciated.
Edit: my initial attempt at modifying the linked code has this portion:
values = Submit(values);
var ack = values["ACK"].ToLower();
if(ack == "success" || ack == "successwithwarning") {
    using(_db = new ApplicationDbContext()) {
        var updateOrder = await _db.Orders.FirstOrDefaultAsync(x => x.OrderId == order.OrderId);
        if(updateOrder != null) {
            updateOrder.OrderProcessed = false;
            updateOrder.PayPalCorrelationId = values["CORRELATIONID"];
            updateOrder.PayPalTransactionId = values["TRANSACTIONID"];
            updateOrder.PayPalTimeStamp = values["TIMESTAMP"];
            updateOrder.IPAddress = HttpContext.Current.Request.UserHostAddress;
            _db.Entry(updateOrder).State = EntityState.Modified;
            await _db.SaveChangesAsync();
        }
    }
    return new PayPalRedirect {
        Token = values["TOKEN"],
        Url = $"https://{PayPalSettings.CgiDomain}/cgi-bin/webscr?cmd=_express-checkout&token={values["TOKEN"]}"
    };
}
Everything within and including the using() is my added content. As I mentioned, the CorrelationId and the TimeStamp come through just fine, but I have yet to successfully obtain the TransactionId.
Edit 2:
More problems -- the transactions that are “successful” through the sandbox site (the ReturnUrl is getting called) aren’t reflecting properly on my Facilitator and Buyer accounts, even when I do payments straight from the buyer’s PayPal account (not using the Credit Card). I know I am supposed to see transactions in the Buyer’s account, either through the overall Dev account (Accounts -> Profile -> balance or Accounts -> Notifications) or through the Buyer’s account in the sandbox front end. And yet -- multiple transactions returning me to the ReturnUrl path, and yet no transactions in either.
Edit 3:
Okay, this is really, really weird. I have gone over all settings with a fine-toothed comb, and intentionally introduced errors to see where things should crap out. It turns out that the entire process goes swimmingly - except nothing shows up in my notifications and no amounts get moved between my different accounts (Facilitator and Buyer). It’s like all my transactions are going into /dev/null, yet the process is successful.
Edit 4: A hint!
In the sandbox, where Buyer accepts the transaction, there is a small note, “You will be able to review the transaction before completing it” or something like that -- suggesting that an additional page is not coming up and that the user is being unceremoniously dumped back to the success page. Why the success page? No clue. But it’s happening.
It sounds like you are only doing the first part of the process.
Express Checkout consists of 3 API calls:
SetExpressCheckout
GetExpressCheckoutDetails
DoExpressCheckoutPayment
SetExpressCheckout (SEC) generates a token, and then you redirect to PayPal, where the user signs in and reviews the transaction before agreeing to pay.
They are then sent to the ReturnURL included in your SEC request, and this is where you'll call GetExpressCheckoutDetails (GECD) in order to obtain all the buyer details that are now available since they signed in.
Using that data you can complete the final DoExpressCheckoutPayment (DECP) request, which is what finalizes the procedure. No money is actually processed until this final call is completed successfully, so a transaction ID only exists in the response to that call.
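To illustrate where the three values from the question would come from, here is a sketch that parses the name-value response of the final call. It is written in JavaScript only to keep it short and is not the linked example's code. The field name PAYMENTINFO_0_TRANSACTIONID is an assumption to verify against the API version in use, and the function names are hypothetical:

```javascript
// Sketch: parse an NVP (name=value&name=value) response body into an object.
function parseNvp(body) {
  var out = {};
  new URLSearchParams(body).forEach(function (value, key) {
    out[key] = value;
  });
  return out;
}

// Returns the three values from a successful final-call response, or null otherwise.
// Assumption: the transaction ID is reported as PAYMENTINFO_0_TRANSACTIONID.
function extractPaymentIds(body) {
  var nvp = parseNvp(body);
  var ack = (nvp.ACK || '').toLowerCase();
  if (ack !== 'success' && ack !== 'successwithwarning') {
    return null;
  }
  return {
    transactionId: nvp.PAYMENTINFO_0_TRANSACTIONID,
    correlationId: nvp.CORRELATIONID,
    timestamp: nvp.TIMESTAMP
  };
}
```

The same ACK check as in the question's code applies here, just one step later in the flow: on the ReturnURL handler, after the final call, and not right after SetExpressCheckout.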

How to tie together front and back end events in google analytics?

I am tracking user events on the front end with google analytics, but I would also like to send back end events and be able to match up events for the same user in google analytics.
It looks like I should be able to pass the uid parameter: https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#uid but it looks like I also have to pass the tid parameter https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#tid .
The docs say that "All collected data is associated by this ID" (the tid).
What should I pass for the tid? Why can't I just pass the uid, if that is supposed to be a mechanism for tying events together?
I would like the backend to pass the uid to the front end (actually a one-way hash of the email), and then refer to the user in google analytics with this uid.
Is this feasible? I'm a bit confused about how to implement this.
Many thanks!
The "tid" - Tracking ID - is the Web Property, i.e. the "slot" in your Analytics account that the data goes to. If you do not send a tracking ID the calls will disappear in limbo. You find the tid in your property settings under "Tracking Code". It is a string that starts with "UA-" and so is also sometimes referred to as the UA-ID.
The User ID will not help you to identify users, at least not by default, since it is not exposed in the Analytics interface (it should really be called the "cross device identification id", since that is what it's for). You need to create a custom dimension and pass the value of the User ID there if you want to identify users. Per TOS you must take care that no third party, including Google, can resolve your User ID (or any other datapoint) into something that identifies a person, although of course you can use it yourself to connect data to other data in your backend system.
Actually there is a proper way. I've implemented this for myself.
There's a Client ID parameter, that should be passed with your requests.
Here you have two options:
Create this client id manually (by generating UUID) on server-side and pass it to front-end. Then use this value when you create your tracker and also use it for server-side requests.
//creating of a tracker with manually generated client id
ga('create', 'UA-XXXXX-Y', {
'storage': 'none',
'clientId': '76c24efd-ec42-492a-92df-c62cfd4540a3'
});
Of course, you'll need to implement some logic of storing client id in cookie, for example.
You can use client id that is being generated automatically by ga and then send it to the server-side by your method of choice. I've implemented it through cookies:
// Creates a default tracker.
ga('create', 'UA-XXXXX-Y', 'auto');
// Gets the client ID of the default tracker and logs it.
ga(function(tracker) {
var clientId = tracker.get('clientId');
//setting the cookie with jQuery help
$.cookie("client-id", clientId , { path : "/" });
});
Then on the back-end just access this cookie and use that client id for your requests.
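A rough sketch of that back-end step, assuming a Node-style server that has the raw Cookie header available. The function names and the event category and action values are placeholders. v, tid, cid, t, ec and ea are Measurement Protocol parameter names:

```javascript
// Sketch: read the "client-id" cookie set by the front end from a raw Cookie header.
function readCookie(cookieHeader, name) {
  var parts = (cookieHeader || '').split(';');
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].trim();
    var eq = pair.indexOf('=');
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return null;
}

// Sketch: build the form-encoded body of a Measurement Protocol event hit.
function buildEventPayload(trackingId, clientId, category, action) {
  return new URLSearchParams({
    v: '1',          // protocol version
    tid: trackingId, // web property ID, e.g. UA-XXXXX-Y
    cid: clientId,   // same client ID the front-end tracker uses
    t: 'event',      // hit type
    ec: category,
    ea: action
  }).toString();
}
```

The resulting string would be sent as the body of a POST to https://www.google-analytics.com/collect. Because the cid matches the one used by analytics.js, front-end and back-end hits end up attributed to the same client.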
Also, some information can be found here: What is the client ID when sending tracking data to google analytics via the measurement protocol?

How to create tracking pixel with Google Analytics for 3rd party site?

We need to track conversions that happen on a 3rd party site. The only thing we can place on that site is an image pixel and maybe some JS logic for when to fire it.
I know it is possible to fire a conversion using the Measurement Protocol: https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#visitor
Ideally, I'd just give the 3rd party an IMG url and that would be it. The problem is the CID (unique client id).
I can try passing the CID from our site to the 3rd party via URL parameter. However, there are many cases where it's not available (e.g., the IMG pixel will be in an email, the goal URL is on printed literature) or the 3rd party is not willing to go through the hassle. Is it best practice to pass the CID in this way?
I can try generating a CID, but I can't find a dead simple way of doing that e.g., var CID = generateCID(). The 3rd party site has its own GA on the page. Can I just take their Google Analytics CID and use it in the image pixel URL?
What's the best way to do this? Thank you!
If the 3rd-party site has analytics.js already running then using that client ID is probably best. You can get it by doing the following:
var cid;
ga(function(tracker) {
cid = tracker.get('clientId');
});
If analytics.js is not running, or if you can't access the ga variable for some reason, you can just generate the client ID randomly. This is approximately what Google does: a random 31-bit integer with the current Unix timestamp (in seconds) appended:
var cid = Math.floor(Math.random() * 0x7FFFFFFF) + "." +
Math.floor(Date.now() / 1000);
Only to complement @Philip Walton's excellent answer: Google Analytics expects a random UUID (version 4) as the Client ID, according to the official documentation.
Client ID
Required for all hit types.
This anonymously identifies a particular user, device, or browser
instance. For the web, this is generally stored as a first-party
cookie with a two-year expiration. For mobile apps, this is randomly
generated for each particular instance of an application install. The
value of this field should be a random UUID (version 4) as described
in http://www.ietf.org/rfc/rfc4122.txt
@broofa provided a simple way to generate an RFC4122-compliant UUID in JavaScript here. Quoting it here for the sake of completeness:
'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
var r = Math.random()*16|0, v = c == 'x' ? r : (r&0x3|0x8);
return v.toString(16);
});
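Putting the pieces together, the IMG URL asked about in the question could be assembled roughly like this. It is a sketch, not an official recipe: the function name and the event category and action are placeholders, while the endpoint and the v, tid, cid, t, ec, ea and z parameters come from the Measurement Protocol:

```javascript
// Sketch: build an image-pixel URL that sends one event hit.
// cacheBuster is passed in so the result is predictable; in practice it
// would be a random number so that browsers do not serve the image from cache.
function buildPixelUrl(trackingId, clientId, category, action, cacheBuster) {
  var params = new URLSearchParams({
    v: '1',
    tid: trackingId,
    cid: clientId,
    t: 'event',
    ec: category,
    ea: action,
    z: String(cacheBuster) // cache buster, kept as the last parameter
  });
  return 'https://www.google-analytics.com/collect?' + params.toString();
}
```

The clientId argument would be either the value read from the 3rd party's tracker or a freshly generated UUID as shown above. Where no script can run at all (e.g. in an email), a fixed or per-recipient ID would have to be baked into the URL beforehand.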

How to remove _ga query string from URL

I have a multidomain website for which there is GA tracking. Recently we moved to Universal Analytics and noticed that whenever the domain is changed (from US to Korean/Japanese), a _ga=[random number] is appended to the URL
i.e. from
abc.com
When I click on the Japanese site, the URL becomes
japanese.abc.com/?_ga=1.3892897.20937502.9237834
Why does this happen?
How can I remove the _ga part of the URL?
Appreciate your help.
This is needed for cross-domain tracking (i.e. tracking people who cross domain boundaries as one visitor and not as one visitor per domain). If you want cross-domain tracking you cannot remove this. The _ga part carries the client ID, which identifies the visitor's browser; since it cannot be shared via cookies (which are domain-specific), it has to be passed via the URL when the domain changes.
Since somebody set your site up for cross-domain tracking, I guess you actually want this (it does not happen by default). The parameter is a necessary side effect of cross-domain tracking with Universal Analytics. If you do not want this, look in the tracking code for any of the linker functions mentioned in the documentation and remove them.
Updated to answer the questions from the comment.
Is there no way to remove the _ga string and still have the cross
domain facility?
No, currently not. Browser vendors work on better ways of cross
domain communication so there might be something in the future, but
at the moment the parameter is the best way.
Also, what if some user randomly changes the _ga value and presses
enter? How will GA record that?
If the user happens to create a client id that has been used before
(highly unlikely) his visit would be attributed to another user.
Realistically Google Analytics will just record him as a new user.
Updated
For those who like to play I did a proof of concept for cross domain tracking without the _ga parameter. Something along those lines could be developed further, as-is it is not suitable for production use.
Update: David Vallejo has a JavaScript solution where the _ga parameter is removed via the history API (so while it is still added, it is for all intents and purposes invisible to the end user). This is a more elaborate version of Michael Hampton's answer below.
I'm using HTML5 history.replaceState() to hide the GA query string in the browser's address bar.
This requires me to construct a new URL with the _ga= value removed (you can do this in your favorite language) and then simply pass it to replaceState().
This only alters the URL in the address bar (and in the browser's history). Google Analytics still gets the information passed in via the query string, so your tracking still works.
I do this in a Go html/template:
{{if .URL.RawQuery}}
<script>
window.history.replaceState({}, document.title, '{{.ReplacedURL}}');
</script>
{{end}}
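If you would rather compute the replacement URL in the browser than in a server-side template, the "construct a new URL" step might look like the following sketch. The function name stripGaParam is hypothetical, and this is an alternative to, not a copy of, the Go approach above:

```javascript
// Sketch: return path + query + fragment of a URL with only the _ga parameter removed.
function stripGaParam(href) {
  var url = new URL(href);
  url.searchParams.delete('_ga');
  return url.pathname + url.search + url.hash;
}

// In a browser, after Google Analytics has read the parameter:
// window.history.replaceState({}, document.title, stripGaParam(window.location.href));
```

Other query parameters and the fragment are kept, so bookmarked or shared links still point at the same content.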
I was asked to remove this tag after it started showing up when we split our website between two domain names. With Apache Rewrite Rules:
RewriteCond %{QUERY_STRING} _ga
RewriteRule ^(.*)$ $1? [R=301,NC,L]
This will remove the tag, but will not be able to pass the _ga params to Google Analytics.
If the user doesn't mind a short refresh, then adding this code to every page
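<?php
// Split the request URI into path and query string; the query string may be absent.
list($url, $qs) = array_pad(explode('?', $_SERVER['REQUEST_URI'], 2), 2, '');
if (preg_match('/(^|&)_ga=/', $qs)) header("refresh:1;url={$url}");
?>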
will refresh after a second, removing the query string, but allowing the Google Analytics action to take place. This means that by the time your user has bookmarked or copied your URL, the pesky _ga stuff has long gone.
The above code will throw away ANY query string. This version will just strip out the '_ga' argument.
$urlA = parse_url($_SERVER['REQUEST_URI']);
$qs = isset($urlA['query']) ? $urlA['query'] : '';
if (preg_match('/(^|&)_ga=/', $qs)) {
    $url = $urlA['path'];
    // Decode the query string into an array so values are not double-encoded below
    parse_str($qs, $args);
    unset($args['_ga']); // get rid of this one
    $nqs = http_build_query($args);
    // Only append "?" when other parameters remain
    header("refresh:1;url={$url}" . ($nqs !== '' ? "?{$nqs}" : ''));
}
You can't stop Google from adding the tag, but you can tell Analytics to ignore it in your reports. Thanks to Russ Henneberry for this: http://blog.crazyegg.com/2013/03/29/remove-url-parameters-from-google-analytics-reports/
It was written before Universal Analytics was released, so the language is outdated - now you create a new "view" (rather than a "profile"). Creating a new view ensures that you still have the raw data in your default view (just in case you ever need it), so it's really the best solution (keeping in mind that you can't ever apply new settings retroactively in Google Analytics). Good luck!
You can't remove the _ga parameter from the URL on the website...BUT you can use an Advanced filter in Google Analytics to remove the query parameter from the reports!
Like this:
1) Field A: Request URI
Pattern: ^(.+)\?_ga
2) Field B: not needed
3) Output To -> Constructor
Field: Request URI
Pattern: $A1
This filter will strip off all query parameters when _ga is the first parameter shown. You can get a lot fancier with the regex, but this approach should work for most websites.
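To see what the pattern does and does not catch, here is a quick check using JavaScript's regex engine as a stand-in for the filter (an assumption: the filter's engine may differ in details, but this pattern uses only basic syntax). The function name is hypothetical:

```javascript
// Sketch: emulate the filter. Field A pattern ^(.+)\?_ga, output $A1.
var gaFilterPattern = /^(.+)\?_ga/;

function filteredRequestUri(requestUri) {
  var match = gaFilterPattern.exec(requestUri);
  // No match means the filter leaves the URI unchanged.
  return match ? match[1] : requestUri;
}

var firstParam = filteredRequestUri('/pricing?_ga=1.2.3.4&page=2'); // "/pricing"
var laterParam = filteredRequestUri('/pricing?page=2&_ga=1.2.3.4'); // unchanged
```

The second case shows the limitation mentioned above: when _ga is not the first parameter, the URI passes through untouched.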
See this page: https://support.google.com/tagmanager/answer/6107124?hl=en and search for "use hash as delimiter".
Setting this value to true passes the linker value through the URL fragment (after a #) instead of through a query parameter, which should fix it.
One way to handle this is to use the history.replaceState JavaScript function to remove the query string from the URL after the page has finished loading and Google Analytics has done its thing. However, if you remove it too soon, it'll affect GA functionality (one visitor will show as multiple visitors). I've found that the following JavaScript (with a 3-second delay) works:
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, window.location.pathname);}main();"></script>
I used "window.location.pathname" for convenience so that you can use the same script on many pages. However, you can also hard-code the path, like this (for the top page of the site):
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, '/');}main();"></script>
Or for a sub-page:
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, '/something/something.html');}main();"></script>
I did the "data:text/javascript" thing instead of a true in-line script so I could apply "defer" to it, although this probably isn't necessary if you're using a sufficiently long delay value.
You can filter out all (or only include) "?_ga=" parameters in Google Analytics for reporting purposes. I would also highly recommend adding a canonical to the base URL -- or adding the parameters to Google Webmaster Tools -- to avoid duplicate content.
