Is it possible to report uid from Google Analytics? - google-analytics

I am collecting data using Google's Measurement Protocol, and sending the uid parameter. This works fine. But I would like to be able to fetch the uid back from Google. I can fetch the value of a custom dimension, but I would prefer not to set cd1 = uid for every request if possible.
I notice in this question that there is a comment by #DaImTo saying that it is not possible... and I suspect he's right, but I'd like to hear from someone who knows for sure.

I don't believe it is possible to get the uid without setting it through a custom dimension.
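A minimal sketch of that workaround, assuming the Universal Analytics Measurement Protocol (v1) and a custom dimension at index 1 that has already been configured in the property. The tracking id, client id, user id and page path are placeholders, and `build_hit` is a hypothetical helper name, not part of any Google library:

```python
from urllib.parse import urlencode

def build_hit(tracking_id, client_id, user_id, uid_dimension_index=1):
    """Build a URL-encoded Measurement Protocol (v1) pageview payload.

    The user id is sent twice: once as `uid` and once as a custom
    dimension (`cd<index>`), since only the dimension can be queried back.
    """
    params = {
        "v": "1",              # protocol version
        "tid": tracking_id,    # property id (placeholder)
        "cid": client_id,      # anonymous client id
        "uid": user_id,        # user id, not reportable on its own
        "t": "pageview",       # hit type
        "dp": "/example",      # document path (placeholder)
        f"cd{uid_dimension_index}": user_id,  # mirrored copy for reporting
    }
    return urlencode(params)

payload = build_hit("UA-XXXXX-Y", "555", "user-42")
print(payload)
```

Wrapping the mirroring in one helper at least means cd1 is set in a single place rather than by hand on every request.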

Related

What should the client_id be when sending events to Google Analytics 4 using the Measurement Protocol?

I am using Google Analytics 4 (GA4) on the client to track a whole bunch of different events. However, there are 2 scenarios that I can't cover client side:
A user completing check out on a payment page hosted by a third-party (Stripe in this case).
A refund that is made by the support team.
These events are handled by the server using webhooks. To me it seems like the most straightforward solution would be to let the server send the event to GA4 (as opposed to the client sending it). I believe the Measurement Protocol should be used for this.
For each event submitted through the Measurement Protocol a client_id is required. When the client is submitting an event, this is an automatically generated ID which is used to track a particular device.
My question thus is, what should the client_id be when submitting an event server-side?
Should the same client_id perhaps be used for all events, so that the server is recognized as one device? I have read some people proposing a randomly generated client_id for each event, but this would result in a new user being recognized for every server-side event...
EDIT:
One of the answers proposes to use the client_id, which is part of the request as a cookie. However, for both examples given above, this cookie is not present as the request is made by a third-party webhook and not by the user.
I could of course store the client_id in the DB, but the refund in the second example is given by the support team. And thus conceptually it feels odd to associate that event with the user's client_id as the client_id is just a way to recognize the user's device? I.e. it is not the user's device which triggered the refund event here.
Another refund event example would be when user A makes a purchase with user B and user B refunds this purchase a week later. In this situation, should the client_id be the one of user A or of user B? Again, it feels odd to use a stored client_id here. Because, what if user A is logged in on two devices? Which client_id should be used here then?
Great question. Yes, your aim to use Measurement Protocol is a proper solution here.
Do not hardcode the client id. It's gonna be a hellish mess in reports. The nature of user-based reporting (which GA is) demands client ids to uniquely identify users. To your best ability.
GA stores the client id in a cookie. You should have convenient and immediate access to it on every client hit to the BE. The cookie name is _ga. GA4 appends the measurement id to the cookie name. Here are Google's docs on it: https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage But you can easily find it if you inspect "collect" hits and look at their payloads. There's another cookie named _gid that contains a different value. That would be a unique client id. Set it too if you can, but don't use it for the normal client id. It has a different purpose. Here is how the cookie looks here, on Stack:
And here it is in Network. You will need it for proper debugging. Mostly to make sure your FE client ids are the same as BE client ids:
Keep an eye on the cases when the cookie is not set. When a cookie is not set, that most frequently means the user is using an ad-blocker. Your analysts will still want to know that the transaction happened even if there's a lack of context about the user. You still can track them properly.
3.1 The laziest solution would be giving them an "AnonymousUser" client id and then appending a random number to it, so that it would both indicate that a user is anonymous and still make it possible for GA to separate them.
3.2 A better solution would be to make a fingerprint client id for such users, say, by hashing a concatenated string of their useragent+ip+locale+screen resolution. It is up to your analysts to actually work out the definition of a unique user if the Google Analytics library is unable to do it.
3.3 Finally, one of the best solutions for you would be generating a client id on your own, keeping GA's format and maybe adding an indicator that it has been generated on your end, just for easier debugging in the future, then setting it as a cookie and using it instead of _ga. Just use a different cookie name so that ad-blockers wouldn't know to block it.
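The cookie-first approach with a self-generated fallback could be sketched roughly as follows. It assumes the common `GA1.<n>.<random>.<timestamp>` layout of the `_ga` cookie value, where the client id is the last two fields. The fallback cookie name `_my_cid` and all function names are hypothetical:

```python
import random
import time

def client_id_from_ga_cookie(cookie_value):
    """Return the client id from a `_ga` cookie value, or None.

    Assumes the `GA1.<n>.<random>.<timestamp>` layout, where the client
    id is the last two dot-separated fields.
    """
    if not cookie_value:
        return None
    parts = cookie_value.split(".")
    if len(parts) < 4:
        return None
    return ".".join(parts[-2:])

def generate_client_id(now=None):
    """Generate a GA-style `<random>.<unix seconds>` client id server-side."""
    timestamp = int(now if now is not None else time.time())
    return f"{random.randint(1, 2**31 - 1)}.{timestamp}"

def resolve_client_id(cookies, fallback_cookie="_my_cid"):
    """Prefer `_ga`, then our own cookie, else mint a new id.

    Returns (client_id, generated), where `generated` tells the caller
    that the id is new and should be written to the fallback cookie.
    """
    cid = client_id_from_ga_cookie(cookies.get("_ga"))
    if cid:
        return cid, False
    if cookies.get(fallback_cookie):
        return cookies[fallback_cookie], False
    return generate_client_id(), True

print(resolve_client_id({"_ga": "GA1.2.1234567890.1600000000"}))
```

For the webhook cases in the question, the resolved id would have to be stored alongside the order when the user starts checkout, since the webhook request itself carries no cookies.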
If you want to indicate that a hit was sent through the server, that's a good idea. Use custom dimension for that. Just sync it with your analysts first. Maybe they wouldn't want that, or maybe they would want it in a different dimension.
Now, this is very trivial. There are ways to go much deeper and to improve the quality of data from here. Like gluing the order id, the transaction id, the user id to that, using them to generate client id, do some custom client tracking for the future. But I must say that it's better than what more than 90% of, say, shopify clients have.
Also, GA4 is not good enough for deeper production usage. Many things there are still very rudimentary and lacking. I would suggest concentrating on Universal Analytics and having GA4 as a backup for when Google makes GA4 actually good enough to replace UA. That is, unless you're downloading your data elsewhere and not using GA's interface for analysis.
It seems that this page (relevant portion in the screenshot below) advises sending the data along with either the client_id or the user_id. However, it fails to address the fact that client_id is a mandatory field, as stated here.
I believe it is probably safe to assume that randomly generating this field should work. At least it seems to on my end. However, be warned that I am unsure whether this has any impact on attribution.
* In the above image, Device ID refers to client_id
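For reference, a server-side GA4 event could be assembled along these lines. This is only a sketch that builds the request without sending it. The measurement id, API secret, client id and refund parameters are placeholder values, and `build_ga4_request` is a hypothetical helper:

```python
import json
from urllib.parse import urlencode

GA4_ENDPOINT = "https://www.google-analytics.com/mp/collect"

def build_ga4_request(measurement_id, api_secret, client_id, event_name, params):
    """Return (url, json_body) for one GA4 Measurement Protocol event."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    url = f"{GA4_ENDPOINT}?{query}"
    body = json.dumps({
        "client_id": client_id,  # mandatory, as noted above
        "events": [{"name": event_name, "params": params}],
    })
    return url, body

url, body = build_ga4_request(
    "G-XXXXXXX",              # placeholder measurement id
    "my-api-secret",          # placeholder API secret
    "1234567890.1600000000",  # stored or generated client id
    "refund",
    {"transaction_id": "T-1001", "value": 19.99, "currency": "USD"},
)
print(url)
print(body)
```

The body would then be POSTed to the URL as JSON by whatever HTTP client the server already uses.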

Determining search query of guest user

In my app, there's a requirement to track the search queries which bring individuals to the app. The point here is to track specifically individuals/search_queries, i.e. I want to be able to say:
User 10.20.30.40 (IP is derived from client, ok) came to my website
from Google's search results page of 'hello world' query
Is that even possible?
I investigated the problem a bit. It turned out, that when the user on Google Search result page clicks a link and gets to the underlying page, the referer doesn't contain the search query. Say, for example, I entered a "Hello world" query...
I open google and enter a "Hello world" query;
I see a search result page with a link to Wikipedia on top; I follow it;
I enter "document.referrer" in the console to see:
http://www.google.com.ua/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CGkQFjAA&url=http%3A%2F%2Fru.wikipedia.org%2Fwiki%2FHello%2C_world!&ei=HZ0fUIXTIordtAau54GwAQ&usg=AFQjCNHSAHv8NwVNdaxMvh0OhIxs6Zb1rQ&sig2=a8tEGTBNcFWOPykloXj_Xg
There's a &q= param with a blank value, preventing me from figuring out the query the user entered.
I believe Google internally uses some of the query params (i.e. ved, ei, usg or sig2) from the URL above to keep track of the query used to get to the site, and thus gather statistics about which queries are used most to get to the website. I couldn't find any information regarding how to use them.
In the meantime, Google Analytics would only gather general statistics, not statistics for individuals. Yet I noticed that there's an ability to use Custom Variables. Is it possible to use them somehow in order to track the query the user came with? If so, wouldn't it be a violation of the Google Analytics Terms of Service?
So again: is there a way to know a search query one came to the site with?
This is not possible. Google erases the value of q parameter, so instead of
`?q=search+query`
one always observes
`?q=`
Though, Google provides another way (the so-called convenient one) to know which queries were used to come to your website: Google Analytics.
Also, it's impossible to track user_ip/search_query pair due to Google Analytics ToS.
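The blank parameter is easy to confirm on the server side. The following sketch parses a referrer with Python's standard library and treats an empty `q` the same as a missing one. The function name is hypothetical, and the referrer is a shortened form of the one quoted in the question:

```python
from urllib.parse import urlsplit, parse_qs

def search_query_from_referrer(referrer):
    """Return the `q` value from a referrer URL, or None if absent or blank."""
    # keep_blank_values=True so that `q=` shows up as an empty string
    query = parse_qs(urlsplit(referrer).query, keep_blank_values=True)
    values = query.get("q")
    if not values or not values[0].strip():
        return None
    return values[0]

referrer = "http://www.google.com.ua/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"
print(search_query_from_referrer(referrer))
```

A referrer that does still carry the query, such as `http://www.google.com/search?q=hello+world`, would yield the decoded string instead.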

Google Conversions - Tracking conversion sales with cookies

We are currently using Google Conversion tracking as well as Google Analytics, but would like to implement a way to track actual sales from known conversions.
My original idea was to check for a specific cookie when the user finalizes their payment, and if it's found, save the OrderID and ConversionCode to a database. This idea went up in flames, of course, when I found out that the web server cannot actually access the cookies for conversions.
My next idea is to save my own cookie whenever the google conversion cookie is written. The cookie would contain a unique ID, which would make my first idea usable again.
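A rough sketch of that second idea, assuming the server sets its own first-party cookie on the landing request. The cookie name `conv_ref`, the 30-day lifetime and the function name are all made up for illustration:

```python
import uuid
from http.cookies import SimpleCookie

def make_tracking_cookie(name="conv_ref", max_age_days=30):
    """Create a unique id and the matching Set-Cookie header value."""
    tracking_id = uuid.uuid4().hex
    cookie = SimpleCookie()
    cookie[name] = tracking_id
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = max_age_days * 24 * 3600
    # OutputString() gives the header value without the "Set-Cookie:" prefix
    return tracking_id, cookie[name].OutputString()

tracking_id, header_value = make_tracking_cookie()
print(header_value)
```

At payment time the server would read `conv_ref` back from the request and store it next to the OrderID.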
Does anyone see any significant issues with this idea? Is there a more effective way of doing this that I don't know about? I'm still very new to Google tools (and cookies in general), so I could be missing something.
You can set the order value via the google_conversion_value var in the tracking code. This obviously doesn't help with OrderID tracking, but should help you with your attribution model.
Further details here: http://adwords.google.com/support/aw/bin/answer.py?hl=en&answer=86277

How do I sort feeds returned from Google Reader?

When I query Google Reader for the list of subscriptions for a user, it seems to be returned in a fixed order, no matter what the order is as shown at google.com/reader.
(see http://www.google.com/reader/api/0/subscription/list for the list I'm talking about)
Each subscription returns a 'sortid', which Google Reader uses when the user rearranges subscriptions, by sending back a concatenation of all sortids in the new order after rearranging is finished. However, that sortid never changes.
So my question is this: How do I actually get the order the subscriptions are supposed to be in? I've been using http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI as a reference, but it is lacking in this department, and I haven't found anything anywhere else either.
I've found the answer:
http://www.google.com/reader/api/0/preference/stream/list

How to decode google gclids

Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.
Google AdWords with "autotagging" appends a "gclid" (presumably "google click id") to the link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by Analytics to tie that visit to the ad/campaign.
What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:
Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
We don't have to rely on javascript for conversions.
Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to campaigns or even accounts?
I have spoken to a couple of people at google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.
By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.
Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.
In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.
Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.
Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.
The gclid parameter is encoded in Protocol Buffers, and then in a variant of Base64.
See this guide to decoding the gclid and interpreting it, including an (Apache-licensed) PHP function you can use.
There are basically 3 parameters encoded inside it, one of which is a timestamp. The other 2 as yet are not known.
As far as understanding what these other parameters mean—it may be helpful to compare it to the ei parameter, which is encoded in an extremely similar way (basically Protocol Buffers with the keys stripped out). The ei parameter also has a timestamp, with what seem to be microseconds, and 2 other integers.
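To experiment with this, a generic reader for the Protocol Buffers wire format is enough. The sketch below base64url-decodes a token and lists its top-level varint and length-delimited fields. It knows nothing about the actual gclid layout, and whether real gclids parse cleanly this way is taken from the answer above rather than verified here. The sample token is a tiny hand-built message (field 1 = 150), not a real gclid:

```python
import base64

def read_varint(data, pos):
    """Read one protobuf varint starting at `pos`; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return value, pos
        shift += 7

def decode_fields(token):
    """Base64url-decode `token` and list its top-level protobuf fields.

    Returns (field_number, value) pairs: ints for varint fields, raw
    bytes for length-delimited fields. Other wire types stop the scan.
    """
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    data = base64.urlsafe_b64decode(padded)
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:              # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            break
        fields.append((field_number, value))
    return fields

print(decode_fields("CJYB"))
```

A truncated or non-protobuf input raises `IndexError` in `read_varint`, which is a reasonable signal that the token does not follow this encoding.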
FYI, I just posted a quick analysis of some gclid data from my sites on this post. There definitely is some structure to the gclid, but it is difficult to decipher.
I think you can get all the goodies linked to the gclid via google's adword api. Specifically, you can query the click performance report.
https://developers.google.com/adwords/api/docs/appendix/reports#click
I've been working on this problem at our company as well. We'd like to be able to get a better sense of what our AdWords are doing but we're frustrated with limitations in Analytics.
Our current solution is to look in the Apache access logs for GET requests using the regex:
.*[?&]gclid=([^$&]*)
If that exists, then we look at the referer string to get the keyword:
.*[?&]q=([^$&]*).*
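In Python, the same two-step lookup might be sketched as below. It assumes the Apache combined log format, where the request line and the referer are the first two quoted fields. The patterns are slightly tightened versions of the ones above (they stop at whitespace as well as `&`), and the sample log line is invented:

```python
import re
from urllib.parse import unquote_plus

QUOTED_RE = re.compile(r'"([^"]*)"')
GCLID_RE = re.compile(r"[?&]gclid=([^&\s]*)")
QUERY_RE = re.compile(r"[?&]q=([^&\s]*)")

def extract_click(log_line):
    """Return {'gclid', 'keyword'} for a paid click, or None otherwise."""
    quoted = QUOTED_RE.findall(log_line)
    if not quoted:
        return None
    request = quoted[0]                            # e.g. 'GET /path?... HTTP/1.1'
    referer = quoted[1] if len(quoted) > 1 else ""
    gclid = GCLID_RE.search(request)
    if not gclid:
        return None
    keyword = QUERY_RE.search(referer)
    return {
        "gclid": gclid.group(1),
        "keyword": unquote_plus(keyword.group(1)) if keyword else None,
    }

line = (
    '203.0.113.7 - - [10/Oct/2012:13:55:36 +0000] '
    '"GET /landing?gclid=ABC123&x=1 HTTP/1.1" 200 512 '
    '"http://www.google.com/search?q=red+shoes&ie=utf-8" "Mozilla/5.0"'
)
print(extract_click(line))
```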
An alternative option is to change your Apache web log to start logging the __utmz cookie that google sets, which should have a piece for the keyword in utmctr. Google __utmz cookie and you should be able to find plenty of information.
How accurate is the referer string? Not 100%. Firewalls and security appliances will strip it out. But parsing it out yourself does give you more flexibility than Google Analytics. It would be a great feature to send the gclid to AdWords and get data back, but that feature does not look like it's available.
EDIT: Since I wrote this we've also created our own tags that are appended to each destination url as a request parameter. Each tag is just an md5 hash of the text, ad group, and campaign name. We grab it using regex from the access log and look it up in a SQL database.
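The tagging scheme from the edit could look something like this. The separator character, the sample values and the in-memory dictionary (standing in for the SQL table) are assumptions made for the sketch:

```python
import hashlib

def make_tag(ad_text, ad_group, campaign):
    """Return an md5 hex digest identifying one ad/ad group/campaign triple."""
    # A separator that cannot appear in the fields avoids ambiguous joins
    joined = "\x1f".join((ad_text, ad_group, campaign))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Stand-in for the SQL lookup table: tag -> original values
lookup = {}
tag = make_tag("Cheap red shoes", "Shoes", "Spring sale")
lookup[tag] = ("Cheap red shoes", "Shoes", "Spring sale")

print(tag, lookup[tag])
```

The tag is then appended to the destination URL as a request parameter and resolved through the table when it shows up in the access log.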
This is a non-programmatic way to decode the GCLID parameter. Chances are you are simply trying to figure out the campaign, ad group, keyword, placement, ad that drove the click and conversion. To do this, you can upload the GCLID into AdWords as a separate conversion type and then segment by conversion type to drill down to the criteria that triggered the conversion. These steps:
In AdWords UI, go to Tools->Conversions->Add conversion with source "Import from clicks"
Visit the AdWords help topic about importing conversions https://support.google.com/adwords/answer/7014069 and create a bulk load file with your GCLID values, assigning the conversions to your new "Import from clicks" conversion type
Upload the conversions into AdWords in Tools->Conversions->Conversion actions (Uploads) on left navigation
Go to campaigns tab, Segment->Conversions->Conversion name
Find your new conversion name in the segment list, this is where the conversion came from. Continue this same process on the ad groups and keywords tab until you know the GCLID originating criteria
Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.
Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.
Possibility 2: They "mean" something. In that case, you have to control the environment.
Get a good database of them. Find gclids for your site, and others. Record all times that all clicks occur, and any other potentially useful data
Get cracking! As you have started already, start regressing your collected data against what you already know, and see if you can find patterns using decryption techniques
Start scraping random gclid's, and see where they take you.
I wouldn't hold high hope for this to be successful though, but I do wish you luck!
Looks like my rep is weak, so I'll just post another answer rather than a comment.
This is not an answer, clearly. Just voicing some thoughts.
When you enable auto tagging in Adwords, the gclid params are not added to the destination URLs. Rather they are appended to the destination URLs at run time by the Google click tracking servers. So, one of two things is happening:
The click servers are storing the gclid along with Adwords entity identifiers so that Analytics can later look them up.
The gclid has the entity identifiers encoded in some way so that Analytics can decode them.
From a performance perspective it seems unlikely that Google would implement anything like option 1. Forcing Analytics to "join" the gclid to Adwords IDs seems exceptionally inefficient at scale.
A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.
Here's a thought: Is there a chance the gclid is simply a cryptographic hash, a la bit.ly or some other URL shortener?
In which case the contents of the hashed text would be written to a database, and replaced with a unique id.
After all, the gclid is shortening a bunch of otherwise long text.
Take this example:
www.example.com?utm_source=google&utm_medium=cpc
Is converted to this:
www.example.com?gclid=XDF
just like a URL shortener.
One would need a substitution cipher in order to reverse engineer the cryptographic hash... not an easy task: https://crypto.stackexchange.com/questions/300/reverse-engineering-a-hash
Maybe some deep digging into logs, looking for patterns, etc...
I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.
Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.
For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow":
http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB
You can see that: a) I'm using the .NZ domain, b) my keyword is "stack+overflow", c) I'm running Firefox.
Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid, whereas if it doesn't have a GCLID, then the user must have come from natural search (if URL tagging is enabled of course).
This would theoretically allow you to then search for the keyword in your campaign, and figure out which ad group they came from. Knowing the creative would probably be impossible though, unless you split test your landing URLs or tag them somehow.
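Pulling those pieces out of the referrer and the landing URL might be sketched as follows. The function and key names are hypothetical, the landing URL is invented, and the "organic" label only makes sense when the referrer is known to be a search engine:

```python
from urllib.parse import urlsplit, parse_qs

def describe_visit(referrer, landing_url):
    """Summarise what the referrer and landing URL reveal about a visit."""
    ref = urlsplit(referrer)
    ref_params = parse_qs(ref.query)
    landing_params = parse_qs(urlsplit(landing_url).query)
    gclid = landing_params.get("gclid", [None])[0]
    return {
        "search_host": ref.hostname,                         # e.g. country domain
        "keyword": ref_params.get("q", [None])[0],           # decoded search terms
        "browser_client": ref_params.get("client", [None])[0],
        "gclid": gclid,
        "channel": "paid" if gclid else "organic",
    }

visit = describe_visit(
    "http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8"
    "&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB",
    "http://www.example.com/?gclid=XDF",
)
print(visit)
```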

Resources