How to retrieve exact user-agent string from Google Analytics - google-analytics

We have an interesting 'bug' in our JS code that only fires when a user agent has a specific combination of parameters -- specifically when IE8 sends both Trident/4.0 and MSIE 6.0;
We have checked the GA export data; it appears to export only the pre-digested browser information:
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDimensionsMetrics.html#browser.
Forum post 87919 on forums.digitalpoint.com (link removed since I'm a n00b)
refers to pulling a cross-segment report for more detail; however, that no longer appears to be on the GA front end interface.
Why do we need this instead of just fixing it? If it turns out it only impacts a few users, we can schedule the fix later in the cycle; if it's impacting 20% of our base, it becomes far sooner to fix.
So, the question - how can we pull a specific User Agent string from GA; pull all UA strings from GA or run a Regex against GA to get a count of a matching UA string?
We're also working with the SA team to enable UA logging on the apache level (very high volume website; logging is turned way down).

The best you're going to get out of the Data Export API (or the Google Analytics interface) is processed information: browser / browser version (Internet Explorer / 8.0). There is no way to get the original user-agent string.
Also note that if you have a high volume website you're going to run into data sampling especially if you're looking at a date range beyond larger than a day or two (more on sampling).

Related

Linking to product page gives "Not Available for bots to index" error on screen instead of the old style App Details

https://apps.microsoft.com/store/detail/name/ID
Linking to my app product page gives "Not Available for bots to index"
What is the proper syntax to place a href link to the product so that users online can view it as html?
I don't have a final solution, just a workaround.
We've been seeing the same issue with our own app for the past few days and we've started seeing reports of people encountering this with all kinds of apps, including even Microsoft's own AppInstaller. It seems to be a server-side rate-limiting/caching issue at Microsoft as changing anything in the URL fixes the problem temporarily -- only to get back to the same issue a few hours later. We also found that VPN-ing to some locations (mostly to the US) helped as well, and this throttling seems to be specific to an app & an availability region (when it got blocked, it blocked across all Europe, but not in US for example). It seems to come & go as we have not seen a big drop in new install counts.
As a temporary solution we ended up adding a random string to the end of the URL. We used GCLID as we found it to be the least offensive way -- it could just as well be a legitimate tracking ID we pass over. So now the URL we link to looks like this:
https://apps.microsoft.com/store/detail/[APPNAME]/[APPID]?gclid=f-3579a28842a7bcac7ba630d698829e9b
Where the "f-xxxx" value is generated using the md5() of the timestamp -- but it could be any ever-changing value, even a random number.
We've reached out to our MS contact about this but we haven't heard back yet.
I encountered the same issue half an hour ago, but it seems recovered now.
If you own an app, you can confirm Product identity in Partner Center:
You can share the direct link and Store ID to help customers find your app in the Store:
URL: https://www.microsoft.com/store/apps/*12CharsAppId*
Store ID: 12CharsAppId
Store protocol link: ms-windows-store://pdp/?productid=12CharsAppId
The above address would be recommended but the apps.microsoft.com URL should also work.
I work on apps.microsoft.com. I can confirm that there was an issue on our end that appeared on August 23. This issue has been resolved.
And the proper link, of course, is this:
https://apps.microsoft.com/store/detail/[your product id here]

Google Analytics tracking for PDA emails

I have a requirement where I need to track whether a user clicked a link in a PDA email where the link included in the email is >900 characters.
I'm not sure if Google analytics support tracking in PDA.
If anyone has ever done this,please help me out.
Thanks
I seem to have misunderstood the question, so here is an update. Google will usually track any valid Urls. The two exceptions I can think of are more theoretical than a practical concerns.
Some old browsers (I think IE6 and similar vintages) have a character limit for GET requests (2048 bytes IIRC), so very long links will not work, and this not be tracked correctly. For all practical purposes these browsers should be extinct by now
A Google Analytics request is limited to 8096 bytes.The request has to transmit the document location as part of the payload, so if your URL is really massively oversizes (technically 8000 characters is ">900") this would not be tracked. Again, this is hardly a practical concern (unless there is a lot of other data, like e.g. Enhanced E-Commerce product impressions in that request).
Old (and probably irrelevant) answer:
Google Analytics does typically not track actions within emails, since email clients do not usually support javascript (there are implementations of email open tracking via "web bugs" linked to a script that does a measurement protocol request, but event that does not work particularly well).
If this is a link that points to your homepage the typical way to track this would be via utm parameters - i.e. you do not track the action within the email itself, but the result (the visit to your homepage).
UTM parameters (or "campaign parameters") are
utm_medium - the kind of traffic (if it's paid advertising, banner ads, or in your case e.mail)
utm_source - the specific vendor (e.g. "google" if the link is from a paid Google Ad, or in your case it could be the name of the department that sent out the mail)
utm_campaign - your advertising campaign; in the case of a periodic newsletter this could be e.g. the number of the newsletter
utm_term - you usually would not use that in an email, that's reserved for when a link is a result of a search (then you would insert the search term)
utm_content - if you have multiple links with the same link target and campaign info you can add additional information (e.g. if you have the same link at the top and the bottom of your mail you could indicate the position here)
You cannot do anything dynamic, though - if you want to mark links with a specific character count you would have to do this within your newsletter programm and insert the number. GA would then be able to pick this up from the campaign parameters.
E.g. for your use case you might construct a target URL like
www.example.com?utm_medium=email&utm_source=my_department&utm_campaign=pda_mail&utm_content=<number of characters>
and then get the information from the Aquisition reports in Google Analytics.
If the links do not point to your own homepage you would need to set up an intermediate page that tracks the utm_parameters before it redirects to the intended destination.

Google Analytics Ecommerce / Difference between 'ec:addItem', 'ec:addTransaction' and 'ec:send'

I would look for some feedback on tracking user activity on an commerce website using th google analytics commerce capabilities.
I can't fully understand those 3 parts :
Adding an item (ecommerce:addItem) : obviously when some user add a thing to the cart
Adding a Transaction (ecommerce:addTransaction) : that's where I'm very confused
Sending the data (ecommerce:send) : that's obvious
Can those 3 event append at a different moment ? in what manner ?
What would be a real-world use case that would make you use execute ecommerce:addTransaction and ecommerce:send at a different moment ?
This thing makes me wonder a lot, and I'd like to have some experienced feedback on this as you tend to easily break your stats if something is not done week enough
Thanks in advance
EDIT
So the main purpose right here is to get stats for the pending orders (you add stuff to your cart), and the complete orders (you paid for the things you added).
Right now I only send it all when the order is complete, and things are working pretty good in analytics, but I just don't know anything about the ones that did not complete.
This question was a lack of knowledge.
Simple ecommerce plugin has nothing to do with the enhanced ecommerce plugin
You won't track that much with the first one, except the checkouts. A plain, one order at a time, revenue value.
If you want a deep insight on your users behaviors (when i say deep, I mean it), You have to go for the second one.
We might be able to debate over the unusefullness of the first one; and the fact that its existence in itself compared to the second is completely misleading, as when you first get in, as usual with google, you get flooded by an endless documentation
ecommerce:addItem does not add items to a cart; it adds items to a transaction (with "conventional" ecommcerce tracking there is no cart tracking, you'd have to use enhanced ecommerce tracking. Actually your title refers to enhanced ("ec:") and your question to conventional ecommerce ("ecommerce:") tracking).
So ecommerce:addTransaction starts a transaction; here goes the stuff that affects the transaction as a whole, like transaction id, tax on the total purchase or shipping costs.
Now that you have started the transaction you can add items to it that are associated via the transaction id.
Finally the ecommerce:send command tells Universal Analytics that the transaction should be processed on the server. "send" is actuall a misnomer; addItem and addTransaction do already send data to the server (they each create an request to the tracking server and thus count towards your hit quota).
The reason for this is, as far as I can tell, that the information is transmitted via url parameters (you call the Google Analytics endpoint which returns an transparent pixel). The maximum length for an url request is limited (actual limits depend on browser and browser version).
So the transaction is broken up into multiple parts not because you want to execute the commands at different moments but so it can be transmitted via Url parameters without being truncated. The send command merely tells that you are now finished adding new parts to the transaction and the data can now be processed.

Google Maps - Caching - Methods

Ok! So I have spoken to a google representative about this issue, however since I am not enterprise level, he can't push me to tech support and suggested that I use the SO for answers. Here is the question...
In Google Maps Terms it states the following:
(b) No Pre-Fetching, Caching, or Storage of Content. You must not pre-fetch, cache, or store
any Content, except that you may store: (i) limited amounts of Content for the purpose of
improving the performance of your Maps API Implementation if you do so temporarily (and in
no event for more than 30 calendar days), securely, and in a manner that does not permit
use of the Content outside of the Service; and (ii) any content identifier or key that
the Maps APIs Documentation specifically permits you to store. For example, you must not
use the Content to create an independent database of "places" or other local listings
information.
This led me to originally believe that google would not allow caching of any type of information. However, then I read the following:
When to Use Client-Side Geocoding
The basic answer is "almost always." As geocoding limits are per user session, there is no risk that your application will reach a global limit as your userbase grows. Client-side geocoding will not face a quota limit unless you perform a batch of geocoding requests within a user session. Therefore, running client-side geocoding, you generally don't have to worry about your quota.
Two basic architectures for client-side geocoding exist.
Run the geocoding and display entirely in the browser. For instance, the user enters an address on your page. Your application geocodes it. Then your page uses the geocode to create a marker on the map. Or your app does some simple analysis using the geocode. No data is sent to your server. This reduces load on your server, but doesn't give you any sense of what your users are doing.
Run the geocode in the browser and then send it to the server. For instance, the user enters an address. Your application geocodes it in the browser. The app then sends the data to your server. The server responds with some data, such as nearby points of interest. This allows you to customize a response based on your own data, and also to cache the geocode if you want. This cache allows you to optimize even more. You can even query the server with the address, see if you have a recently cached geocode for it, and if you do, use that. If you don't, then return no result to the browser, and let it geocode the result and send it back to the server to for caching.
So one side says you cannot cache, the other side tells you, you should. Another solution it states is to always use clientside when you can, but then this becomes a grey area as well, because both examples state that you must have a user input data. What if the jquery read data from a div or span and then geocoded the information? The user wouldn't have actually done the geocode,but it was still done client-side? I'm trying to create a site that has a bunch of events generated by users and this site could get pretty loaded, so I am trying to determine the best practice in being able to do this. Google suggested here, so before you go and say this is "off-topic" please note, this is where they stated me to post.
Any feedback would be greatly appreciated.
The first quote does not explicitly forbid caching data at all. It is ambiguous as to how much you can cache (what number explicitly is "limited amounts"?) but it does not forbid caching.
You are allowed to cache the data if it helps improve the performance of your site as long as you retain the data for no longer than 30 days and do not make it available in any way to any other service except the service that originally retrieved the data.
Regarding user interaction - if your user explicitly enters a page with the expectation that they will be shown geocoded information I would assume that this would fulfill "user interaction".
As an example from a project I worked on last year I had it set up to do the following:
- Show markers on the map
- If the user clicked a marker they were shown a popup with data from the cache if available, otherwise a geocode would be performed and the returned information would be cached along with the date/time of the cache.
Another page of the site showed a history of these markers at 5 minute intervals throughout the day. If cached data was present (from clicking the map marker as in the previous part) this would be shown, otherwise a geocode would be performed and the data cached as before. The user clicking to run the report was (in my opinion) enough "user interaction" to not count as pre-fetching as the user had to manually select a timeframe before the report would be displayed.
A cronjob then ran every day at midnight which would go through each record with cached data over 25 days old and remove it.
As it was I was caching much less than 10% of the marker positions being shown (20+ markers being updated every minute, but the report was being run on maybe 3-5 markers each day and only geocoding data for every 5th point).

Basic site analytics doesn't tally with Google data

After being stumped by an earlier quesiton: SO google-analytics-domain-data-without-filtering
I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id let's me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not nesecarily how Google does it!)
The usual "most visited page", "likely time to visit" etc etc.
I've now compared my data to that in Google Analytics and found that Google has lower values each metric. Ie, my own setup is counting more hits than Google.
So I've started discounting IP's from various web crawlers, Google, Yahoo & Dotbot so far.
Short Questions:
Is it worth me collating a list of
all major crawlers to discount, is
any list likely to change regularly?
Are there any other obvious filters
that Google will be applying to GA
data?
What other data would you
collect that might be of use further
down the line?
What variables does
Google use to work out entrance
search keywords to a site?
The data is only going to used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc) for their reference.
Lots of people block Google Analytics for privacy reasons.
Under-reporting by the client-side rig versus server-side eems to be the usual outcome of these comparisons.
Here's how i've tried to reconcile the disparity when i've come across these studies:
Data Sources recorded in server-side collection but not client-side:
hits from
mobile devices that don't support javascript (this is probably a
significant source of disparity
between the two collection
techniques--e.g., Jan 07 comScore
study showed that 19% of UK
Internet Users access the Internet
from a mobile device)
hits from spiders, bots (which you
mentioned already)
Data Sources/Events that server-side collection tends to record with greater fidelity (much less false negatives) compared with javascript page tags:
hits from users behind firewalls,
particularly corporate
firewalls--firewalls block page tag,
plus some are configured to
reject/delete cookies.
hits from users who have disabled
javascript in their browsers--five
percent, according to the W3C
Data
hits from users who exit the page
before it loads. Again, this is a
larger source of disparity than you
might think. The most
frequently-cited study to
support this was conducted by Stone
Temple Consulting, which showed that
the difference in unique visitor
traffic between two identical sites
configured with the same web
analytics system, but which differed
only in that the js tracking code was
placed at the bottom of the pages
in one site, and at the top of
the pages in the other--was 4.3%
FWIW, here's the scheme i use to remove/identify spiders, bots, etc.:
monitor requests for our
robots.txt file: then of course filter all other requests from same
IP address + user agent (not all
spiders will request robots.txt of
course, but with miniscule error,
any request for this resource is
probably a bot.
compare user agent and ip addresses
against published lists: iab.net and
user-agents.org publish the two
lists that seem to be the most
widely used for this purpose
pattern analysis: nothing sophisticated here;
we look at (i) page views as a
function of time (i.e., clicking a
lot of links with 200 msec on each
page is probative); (ii) the path by
which the 'user' traverses out Site,
is it systematic and complete or
nearly so (like following a
back-tracking algorithm); and (iii)
precisely-timed visits (e.g., 3 am
each day).
Biggest reasons are users have to have JavaScript enabled and load the entire page as the code is often in the footer. Awstars, other serverside solutions like yours will get everything. Plus, analytics does a real good job identifying bots and scrapers.

Resources