Google Analytics: How to overcome payload size restriction? - google-analytics

I use Google Analytics Enhanced E-commerce for some shops. On catalog page I have many products and I need to track their impressions. I do not track each product one-by-one, cause it will cause many requests, instead I add all of them through .ec:addImpression and then track entire pack by sending single pageview.
And everything was going well until I faced a problem, that on page with too many products requests to collect stopped working with no error. I've installed analytics debugger for Chrome and found out, that I've exceeded a payload limit, which is set to 8 KB (according to official documentation):
payload_data – The BODY of the post request. The body must include
exactly 1 URI encoded payload and must be no longer than 8192 bytes.
And this is fine, but here's my question: is there any way to overcome this restriction? Maybe some option or method, that will allow not to bother about payload size and it will be automatically split into proper chunks? Or at least a method to get a payload in run-time to check its size. I run through documentation and found nothing.
Note: currently I manually track a "safe" number (which was discovered by experience) of products added by addImpression and then send them by non-interaction pageview hit. Of course, this solves my problem, by I want to know if there's a built-in solution.

Another possibility is to send only true impressions, that is, only for those products/items that the user is actually seeing above the fold. Not all products that you are sending impressions for are actually seen by the user until they scroll down the page below the fold. So this would require a modification to the implementation where you send your impression data as the user scrolls down the page and reveals more products. You can likely send more information with each product and still not exceed the payload, and you get a more accurate measure of your impressions.

Create a product data import that matches your product ids to product to product attributes (name, category, price etc). Wait until the data is processed, then change your tracking code so that only product ids are sent.
That should shrink down the request body enough to send all products, and the ids will be joined with the imported data when the incoming hits are processed.
Imported data is not applied retroactively, so it'S important that you do the data import first.

AFAIK there is no way to get your payload size from google analytics and it's a crying shame that analytics.js does not handle this issue automatically since the analytics.js library which constructs the payloads is best suited to handle this and thus minimise load on Google's servers...
I like Eike's solution, though if your products change a lot it might require automation. As #nyuen implies - sending only real impressions may help and is more accurate.
Another trick is to send the impression one at a time. (As shown or on page load) This will require the smallest change and reduce the payload to well bellow the limit.

Related

Google Analytics tracking for PDA emails

I have a requirement where I need to track whether a user clicked a link in a PDA email where the link included in the email is >900 characters.
I'm not sure if Google analytics support tracking in PDA.
If anyone has ever done this,please help me out.
Thanks
I seem to have misunderstood the question, so here is an update. Google will usually track any valid Urls. The two exceptions I can think of are more theoretical than a practical concerns.
Some old browsers (I think IE6 and similar vintages) have a character limit for GET requests (2048 bytes IIRC), so very long links will not work, and this not be tracked correctly. For all practical purposes these browsers should be extinct by now
A Google Analytics request is limited to 8096 bytes.The request has to transmit the document location as part of the payload, so if your URL is really massively oversizes (technically 8000 characters is ">900") this would not be tracked. Again, this is hardly a practical concern (unless there is a lot of other data, like e.g. Enhanced E-Commerce product impressions in that request).
Old (and probably irrelevant) answer:
Google Analytics does typically not track actions within emails, since email clients do not usually support javascript (there are implementations of email open tracking via "web bugs" linked to a script that does a measurement protocol request, but event that does not work particularly well).
If this is a link that points to your homepage the typical way to track this would be via utm parameters - i.e. you do not track the action within the email itself, but the result (the visit to your homepage).
UTM parameters (or "campaign parameters") are
utm_medium - the kind of traffic (if it's paid advertising, banner ads, or in your case e.mail)
utm_source - the specific vendor (e.g. "google" if the link is from a paid Google Ad, or in your case it could be the name of the department that sent out the mail)
utm_campaign - your advertising campaign; in the case of a periodic newsletter this could be e.g. the number of the newsletter
utm_term - you usually would not use that in an email, that's reserved for when a link is a result of a search (then you would insert the search term)
utm_content - if you have multiple links with the same link target and campaign info you can add additional information (e.g. if you have the same link at the top and the bottom of your mail you could indicate the position here)
You cannot do anything dynamic, though - if you want to mark links with a specific character count you would have to do this within your newsletter programm and insert the number. GA would then be able to pick this up from the campaign parameters.
E.g. for your use case you might construct a target URL like
www.example.com?utm_medium=email&utm_source=my_department&utm_campaign=pda_mail&utm_content=<number of characters>
and then get the information from the Aquisition reports in Google Analytics.
If the links do not point to your own homepage you would need to set up an intermediate page that tracks the utm_parameters before it redirects to the intended destination.

Google Analytics data limits

We are using Google Analytics on a webshop. Recently we have added enhanced ecommerce to measure more events so we can optimize the webshop. But now we are experiencing less pageviews and other data is missing.
I don't know what it is, but on a specific page we are nog measuring anymore, I removed some items from the ga:addImpression data, and now the pageview is measured again.
I can find limits for GA, but I can't find anything for the amount of data that can be send to GA. Because is this seems to be related to the amount of data that is send to GA. If I shorten the name of a product, the pageview is also measured again. GA is practically broken now for us because we are missing huge numbers of pageviews.
Where can I find these limits, or how will I ever know when I'm running into these limits?
In one hand, im not sure how are you building your hits but maybe you should keep in mind the payload limits to send information to GA. (The limit is 8Kb)
In the other hand there is a limit in fact that you should consider (Docs)
This applies to analytics.js, Android iOS SDK, and the Measurement Protocol.
200,000 hits per user per day
500 hits per session
If you go over either of these limits, additional hits will not be processed for that session / day, respectively. These limits apply to Analytics 360 as well.
My best advise is to regulate the amount of events you send really considering which information has value. No doubt EE data is really important so you should partition productImpression hits in multiple ones of the problem is the size. (As shown in the screenshot)
And finally, migrate to GTM.
EDIT: Steps to see what the dataLayer has in it (in a given moment)
A Google Analytics request can send max about 8KB of data:
POST:
payload_data – The BODY of the post request. The body must include
exactly 1 URI encoded payload and must be no longer than 8192 bytes.
URL Endpoint
The length of the entire encoded URL must be no longer than 8000
Bytes.
If your hit exceeds that limit (happens e.g. with large product lists in EEC tracking) it is not (as far as I can tell) processed.
There are also restrictions to field length for some fields (e.g. custom dimension with max 150 bytes, others are detailed in the parameter reference ).
In some cases the data type is relevant, e.g. if in your event tracking the event value is set to a string the call might fail.
I think this is the page you are looking for Quota and limits page can help
These limits apply to the Web Property / Property / Tracking ID.
10 million hits per month per property
If you go over this limit, the Google Analytics team might contact you and ask you upgrade to Analytics 360 or implement client sampling to reduce the amount of data being sent to Google Analytics.

Google Analytics Ecommerce / Difference between 'ec:addItem', 'ec:addTransaction' and 'ec:send'

I would look for some feedback on tracking user activity on an commerce website using th google analytics commerce capabilities.
I can't fully understand those 3 parts :
Adding an item (ecommerce:addItem) : obviously when some user add a thing to the cart
Adding a Transaction (ecommerce:addTransaction) : that's where I'm very confused
Sending the data (ecommerce:send) : that's obvious
Can those 3 event append at a different moment ? in what manner ?
What would be a real-world use case that would make you use execute ecommerce:addTransaction and ecommerce:send at a different moment ?
This thing makes me wonder a lot, and I'd like to have some experienced feedback on this as you tend to easily break your stats if something is not done week enough
Thanks in advance
EDIT
So the main purpose right here is to get stats for the pending orders (you add stuff to your cart), and the complete orders (you paid for the things you added).
Right now I only send it all when the order is complete, and things are working pretty good in analytics, but I just don't know anything about the ones that did not complete.
This question was a lack of knowledge.
Simple ecommerce plugin has nothing to do with the enhanced ecommerce plugin
You won't track that much with the first one, except the checkouts. A plain, one order at a time, revenue value.
If you want a deep insight on your users behaviors (when i say deep, I mean it), You have to go for the second one.
We might be able to debate over the unusefullness of the first one; and the fact that its existence in itself compared to the second is completely misleading, as when you first get in, as usual with google, you get flooded by an endless documentation
ecommerce:addItem does not add items to a cart; it adds items to a transaction (with "conventional" ecommcerce tracking there is no cart tracking, you'd have to use enhanced ecommerce tracking. Actually your title refers to enhanced ("ec:") and your question to conventional ecommerce ("ecommerce:") tracking).
So ecommerce:addTransaction starts a transaction; here goes the stuff that affects the transaction as a whole, like transaction id, tax on the total purchase or shipping costs.
Now that you have started the transaction you can add items to it that are associated via the transaction id.
Finally the ecommerce:send command tells Universal Analytics that the transaction should be processed on the server. "send" is actuall a misnomer; addItem and addTransaction do already send data to the server (they each create an request to the tracking server and thus count towards your hit quota).
The reason for this is, as far as I can tell, that the information is transmitted via url parameters (you call the Google Analytics endpoint which returns an transparent pixel). The maximum length for an url request is limited (actual limits depend on browser and browser version).
So the transaction is broken up into multiple parts not because you want to execute the commands at different moments but so it can be transmitted via Url parameters without being truncated. The send command merely tells that you are now finished adding new parts to the transaction and the data can now be processed.

Google Maps - Caching - Methods

Ok! So I have spoken to a google representative about this issue, however since I am not enterprise level, he can't push me to tech support and suggested that I use the SO for answers. Here is the question...
In Google Maps Terms it states the following:
(b) No Pre-Fetching, Caching, or Storage of Content. You must not pre-fetch, cache, or store
any Content, except that you may store: (i) limited amounts of Content for the purpose of
improving the performance of your Maps API Implementation if you do so temporarily (and in
no event for more than 30 calendar days), securely, and in a manner that does not permit
use of the Content outside of the Service; and (ii) any content identifier or key that
the Maps APIs Documentation specifically permits you to store. For example, you must not
use the Content to create an independent database of "places" or other local listings
information.
This led me to originally believe that google would not allow caching of any type of information. However, then I read the following:
When to Use Client-Side Geocoding
The basic answer is "almost always." As geocoding limits are per user session, there is no risk that your application will reach a global limit as your userbase grows. Client-side geocoding will not face a quota limit unless you perform a batch of geocoding requests within a user session. Therefore, running client-side geocoding, you generally don't have to worry about your quota.
Two basic architectures for client-side geocoding exist.
Run the geocoding and display entirely in the browser. For instance, the user enters an address on your page. Your application geocodes it. Then your page uses the geocode to create a marker on the map. Or your app does some simple analysis using the geocode. No data is sent to your server. This reduces load on your server, but doesn't give you any sense of what your users are doing.
Run the geocode in the browser and then send it to the server. For instance, the user enters an address. Your application geocodes it in the browser. The app then sends the data to your server. The server responds with some data, such as nearby points of interest. This allows you to customize a response based on your own data, and also to cache the geocode if you want. This cache allows you to optimize even more. You can even query the server with the address, see if you have a recently cached geocode for it, and if you do, use that. If you don't, then return no result to the browser, and let it geocode the result and send it back to the server to for caching.
So one side says you cannot cache, the other side tells you, you should. Another solution it states is to always use clientside when you can, but then this becomes a grey area as well, because both examples state that you must have a user input data. What if the jquery read data from a div or span and then geocoded the information? The user wouldn't have actually done the geocode,but it was still done client-side? I'm trying to create a site that has a bunch of events generated by users and this site could get pretty loaded, so I am trying to determine the best practice in being able to do this. Google suggested here, so before you go and say this is "off-topic" please note, this is where they stated me to post.
Any feedback would be greatly appreciated.
The first quote does not explicitly forbid caching data at all. It is ambiguous as to how much you can cache (what number explicitly is "limited amounts"?) but it does not forbid caching.
You are allowed to cache the data if it helps improve the performance of your site as long as you retain the data for no longer than 30 days and do not make it available in any way to any other service except the service that originally retrieved the data.
Regarding user interaction - if your user explicitly enters a page with the expectation that they will be shown geocoded information I would assume that this would fulfill "user interaction".
As an example from a project I worked on last year I had it set up to do the following:
- Show markers on the map
- If the user clicked a marker they were shown a popup with data from the cache if available, otherwise a geocode would be performed and the returned information would be cached along with the date/time of the cache.
Another page of the site showed a history of these markers at 5 minute intervals throughout the day. If cached data was present (from clicking the map marker as in the previous part) this would be shown, otherwise a geocode would be performed and the data cached as before. The user clicking to run the report was (in my opinion) enough "user interaction" to not count as pre-fetching as the user had to manually select a timeframe before the report would be displayed.
A cronjob then ran every day at midnight which would go through each record with cached data over 25 days old and remove it.
As it was I was caching much less than 10% of the marker positions being shown (20+ markers being updated every minute, but the report was being run on maybe 3-5 markers each day and only geocoding data for every 5th point).

How to decode google gclids

Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.
Google adwords with "autotagging" appends a "gclid" (presumably "google click id") to link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by analytics to tie that visit to the ad/campaign.
What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:
Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
We don't have to rely on javascript for conversions.
Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to compaigns or even accounts?
I have spoken to a couple of people at google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.
By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.
Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.
In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.
Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.
Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.
The gclid parameter is encoded in Protocol Buffers, and then in a variant of Base64.
See this guide to decoding the gclid and interpreting it, including an (Apache-licensed) PHP function you can use.
There are basically 3 parameters encoded inside it, one of which is a timestamp. The other 2 as yet are not known.
As far as understanding what these other parameters mean—it may be helpful to compare it to the ei parameter, which is encoded in an extremely similar way (basically Protocol Buffers with the keys stripped out). The ei parameter also has a timestamp, with what seem to be microseconds, and 2 other integers.
FYI, I just posted a quick analysis of some glcid data from my sites on this post. There definitely is some structure to the gclid, but it is difficult to decipher.
I think you can get all the goodies linked to the gclid via google's adword api. Specifically, you can query the click performance report.
https://developers.google.com/adwords/api/docs/appendix/reports#click
I've been working on this problem at our company as well. We'd like to be able to get a better sense of what our AdWords are doing but we're frustrated with limitations in Analytics.
Our current solution is to look in the Apache access logs for GET requests using the regex:
.*[?&]gclid=([^$&]*)
If that exists, then we look at the referer string to get the keyword:
.*[?&]q=([^$&]*).*
An alternative option is to change your Apache web log to start logging the __utmz cookie that google sets, which should have a piece for the keyword in utmctr. Google __utmz cookie and you should be able to find plenty of information.
How accurate is the referer string? Not 100%. Firewalls and security appliances will strip it out. But parsing it out yourself does give you more flexibility than Google Analytics. It would be a great feature to send the gclid to AdWords and get data back, but that feature does not look like it's available.
EDIT: Since I wrote this we've also created our own tags that are appended to each destination url as a request parameter. Each tag is just an md5 hash of the text, ad group, and campaign name. We grab it using regex from the access log and look it up in a SQL database.
This is a non-programmatic way to decode the GCLID parameter. Chances are you are simply trying to figure out the campaign, ad group, keyword, placement, ad that drove the click and conversion. To do this, you can upload the GCLID into AdWords as a separate conversion type and then segment by conversion type to drill down to the criteria that triggered the conversion. These steps:
In AdWords UI, go to Tools->Conversions->Add conversion with source "Import from clicks"
Visit the AdWords help topic about importing conversions https://support.google.com/adwords/answer/7014069 and create a bulk load file with your GCLID values, assigning the conversions to you new "Import from clicks" conversion type
Upload the conversions into AdWords in Tools->Conversions->Conversion actions (Uploads) on left navigation
Go to campaigns tab, Segment->Conversions->Conversion name
Find your new conversion name in the segment list, this is where the conversion came from. Continue this same process on the ad groups and keywords tab until you know the GCLID originating criteria
Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.
Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.
Possibility 2: They "mean" something. In that case, you have to control the environment.
Get a good database of them. Find gclids for your site, and others. Record all times that all clicks occur, and any other potentially useful data
Get cracking! As you have started already, start regressing your collected data against your known, and see if you can find patterns used decrypting techniques
Start scraping random gclid's, and see where they take you.
I wouldn't hold high hope for this to be successful though, but I do wish you luck!
Looks like my rep is weak, so I'll just post another answer rather than a comment.
This is not an answer, clearly. Just voicing some thoughts.
When you enable auto tagging in Adwords, the gclid params are not added to the destination URLs. Rather they are appended to the destination URLs at run time by the Google click tracking servers. So, one of two things is happening:
The click servers are storing the gclid along with Adwords entity identifiers so that Analytics can later look them up.
The gclid has the entity identifiers encoded in some way so that Analytics can decode them.
From a performance perspective it seems unlikely that Google would implement anything like option 1. Forcing Analytics to "join" the gclid to Adwords IDs seems exceptionally inefficient at scale.
A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.
Here's a thought: Is there a chance the gclid is simply a crytographic hash, a la bit.ly or some other URL shortener?
In which case the contents of the hashed text would be written to a database, and replaced with a unique id.
Afterall, the gclid is shortening a bunch of otherwise long text.
Takes this example:
www.example.com?utm_source=google&utm_medium=cpc
Is converted to this:
www.example.com?gclid=XDF
just like a URL shortener.
One would need a substitution cipher in order to reverse engineer the cryptographic hash... not as easy task: https://crypto.stackexchange.com/questions/300/reverse-engineering-a-hash
Maybe some deep digging into logs, looking for patterns, etc...
I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.
Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.
For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow":
http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB
You can see that: a) im using .NZ domain, b) my keyword "stack+overflow", c) im running firefox.
Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid, whereas if it doesn't have a GCLID, then the user must have come from natural search (if URL tagging is enabled of course).
This would theoretically allow you to then search for the keyword in your campaign, and figure out which adgroup them came from. Knowing the creative would probably be impossible though, unless you split test your landing URLs or tag them somehow.

Resources