I want to set a custom user-agent for a webview app that embeds my website. I am able to set a custom agent like this ("My App Android").
The issue is that Google Analytics reports traffic from this agent as Desktop rather than mobile, as it would for a regular webview.
What's the best way to set a custom user-agent while still keeping data such as mobile and device OS, so that tools like Google Analytics can still read it?
You can manipulate the User Agent but you can't control how Google will interpret the resulting device/OS:
The processing is done on the server side (Google) so there is no way of directly modifying that data (even when sending data via the measurement protocol).
The processing details are not disclosed by Google so you won't know what the outcome of your experiments are until they're reported by Google Analytics (which due to the 24-48 hour data processing latency might make such experimentation tedious).
Attempting to manipulate it might "break" your analytics. Google is vague about this; they just say: "Google has libraries to identify real user agents. Hand crafting your own agent could break at any time". Two consequences I can think of: Google simply drops the traffic if it can't parse the User Agent, or it marks it as bot/spider traffic (which will also be dropped if you have enabled the bot filtering option).
Although it's not mentioned in the documentation, I also suspect Google to rely on other data points, which could be:
Screen resolution
Java Support
Flash version
I couldn't find more details on the topic, and I don't think you will find more details from Google explaining what they use to calculate browser/device because they don't want people messing with it (analogy: you won't find details about which data points are used for SEO, because they don't want people messing with it). The 4 dimensions I listed (User Agent, Screen resolution, Java Support, Flash version), are to my knowledge the only 4 that are device-specific from all GA collects (others are derived from them):
https://developers.google.com/analytics/devguides/reporting/core/dimsmets#view=detail&group=platform_or_device
As MAX's answer says, it is very difficult to manipulate the user-agent while keeping all the attributes, such as the OS and rendering engine.
At the same time, I still want to target my app users with a custom user-agent, and be able to separate traffic from this webview app.
What I did is this:
1- Setting the custom user-agent
Instead of replacing the whole user-agent with a custom one, I appended [AppID/AppVersion] to the default user-agent. I found great info in this blog post: Webviews and User-Agent strings.
Now the user-agent looks something like this:
Mozilla/5.0 (Linux; Android 9; wv)
AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91
Mobile Safari/537.36 [Custom App/1.0.1]
Check: Correct way to format user-agent string in an Android WebView App?
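Once the token is appended, the website can recognise the app from the user-agent string. The following is only a minimal sketch, assuming the token is the last thing in the string and has the [Name/Version] shape shown above; parseAppToken is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: extracts a trailing "[Name/Version]" token
// from a user-agent string. Returns null when no such token is present.
function parseAppToken(userAgent) {
  // "[" + name without "]" or "/" + "/" + version + "]" at the end of the string
  const match = /\[([^\]\/]+)\/([^\]]+)\]\s*$/.exec(userAgent);
  if (!match) {
    return null;
  }
  return { app: match[1].trim(), version: match[2].trim() };
}
```

In the page it could be called as parseAppToken(navigator.userAgent); a null result means the visitor is not using the app's webview.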
2- Setting a custom dimension in Google Analytics
Since Google Analytics will mark all browser value visits from this agent as Android Webview, I went to assign a custom dimension to be able to identify the custom user-agent sessions and create a separate view for it.
In the backend with PHP I set the value of the dimension based on the user-agent.
<script>
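<?php
// Default to an empty value so the variable is always defined below.
$customAgent_value = '';
if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'], 'Custom user agent here') !== false)
{
$customAgent_value = 'your agent';
}
?>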
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-', {
'custom_map': {'dimension1': 'custom_agent'}
});
gtag('event', 'custom_agent_event', {'custom_agent': '<?= $customAgent_value;?>'});
</script>
This is working fine for me now. I can target users from a specific webview app, and at the same time am able to separate the traffic from different webviews in Analytics.
We have created an application to send out bulk emails using AWS SES. We are able to send out the emails and track the metrics like Opens, Clicks etc using AWS SNS successfully. The only problem we have is that in the "Opens" object that SNS is sending, it is always returning the same value "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)". What we are looking at is to determine where the email is opened like Mobile/Tab/Desktop and in which browser. Even when the email is opened in Chrome, it is returning as Mozilla. Any help/suggestion in this regard is highly appreciated.
Additional Info: I figured out that the userAgent is being correctly returned in "clicks" object. But not in the "Open" object. Not sure why. We would like to track the same information when the email is opened also as not all the recipients click on a link.
There isn't actually a way to determine that a message has been opened.¹ Detecting "opens" relies on detection of the viewer fetching an image embedded in the message when the mail is "opened."
At the bottom of each message, we insert a 1 pixel by 1 pixel transparent GIF image. Each email includes a unique link to this image file; when the image is opened, we can tell exactly which message was opened and by whom.
When the viewer is Gmail, the user's browser doesn't fetch this image.
https://aws.amazon.com/blogs/messaging-and-targeting/open-and-click-tracking-have-arrived/
When a message is opened in gmail, the user's browser doesn't fetch the image directly, it fetches it from the google image proxy, and the image proxy fetches it from SES and generates the tracking event. Hence, (via ggpht.com GoogleImageProxy).
This isn't something that you have control over, as the sender.
The proxy can identify itself by saying whatever it likes in the User-Agent field -- there is no reason to believe that the entire user-agent string isn't being created by the proxy. Google searching the topic seems to confirm that this is how the proxy always appears. Mozilla/5.0 is a generic user agent string, that does not mean anything more than "I am some kind of web browser, or want the server to believe that I am."
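Since the proxy announces itself in the user-agent, open events that came through it can at least be flagged, so they are not reported as a real browser or device. This is only a sketch: isGoogleImageProxy is a hypothetical helper, and it assumes the proxy keeps including the "GoogleImageProxy" marker seen in the string above.

```javascript
// Hypothetical helper: true when an "open" event's user-agent string
// carries the GoogleImageProxy marker, i.e. the image was fetched by
// Google's proxy rather than by the recipient's own browser.
function isGoogleImageProxy(userAgent) {
  return typeof userAgent === "string" && userAgent.includes("GoogleImageProxy");
}
```

Opens flagged this way can be counted as "opened in Gmail, device unknown" instead of being attributed to Firefox on Windows.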
¹there isn't actually a way... well, technically, there is, but thanks to the widespread profusion of spam, this standard is almost never applied to Internet mail. As noted in RFC-8098, "The presence of a Disposition-Notification-To header field in a message is merely a request for an MDN. The recipients' user agents are always free to silently ignore such a request." This is almost always what happens... nothing.
Let's say i created a google sheet to capture user's email addresses.
On my website there is a small form. Once the submit button is clicked, an ajax request is fired to a google web app that writes the data to a sheet:
// Let's select and cache all the fields
var $inputs = $form.find("input, select, button, textarea");
// Serialize the data in the form
var serializedData = $form.serialize();
// Fire off the request
request = $.ajax({
url: "https://script.google.com/macros/s/longURLcode/exec",
type: "post",
data: serializedData
});
In the google script you now use doPost(e) or doGet(e) to handle any incoming http request. To allow this to work as a sign up mechanism permissions for the web app have to be set to (i think !?)
Execute the app as: Me (myemail#gmail.com)
Who has access to the app: Anyone, even anonymous
Given everything on the google script site is set up properly, this works like a charm. So what's wrong?
Problem:
Anyone can either look into the source code of my webpage or use the dev tools to extract the url to the google web app after clicking submit. This url can now in theory be used to flood the sheet with countless (undefined) entries.
Questions:
1) Is there a way to limit accepted http request to certain origins? I tried to do this by accessing the http headers within doPost() but there seems to be no way to do so.
2) Is "Who has access to the app: Anyone, even anonymous" the wrong approach? I thought this is necessary since you can only choose google users here and some url (mywebsite.com) seemed to fall within the anonymous category.
3) I don't think this is possible but maybe i missed an option: Is there a way to NOT expose the google web app url to anyone? I guess not because you can monitor any requests with dev tools.
4) Is using sheets for capturing that kind of data just a terrible idea in general (partly for above reasons) and i should find another solution asap?
Our website is a vertical search engine and we refer a lot of traffic offsite to partners sites.
We recently switched our website over to serve all traffic via HTTPS. We realised this might confuse some of our partners if they were looking at referrer stats and saw a drop in traffic attributed to us. Therefore at the same time, we added the content-security-policy:referrer origin header and we can see that the referrer is correctly passed along by the browser.
Generally this is working fine but we have had complaints from users of Adobe SiteCatalyst (previously Omniture) who are no longer able to attribute traffic as being referred from us. We don't have access to SiteCatalyst to test this out. How does SiteCatalyst track referral traffic and is there a way to view all traffic split by different sources/referrers?
I don't know if this accounts for everything, since I don't have full context on either your end or your users' end, but here is some info / thoughts that might help.
By default, Adobe Analytics tracks referrer from document.referrer. This can be overridden by setting s.referrer.
In general, depending on how your site directs visitors to the other site and on browser security/privacy settings, document.referrer may or may not have a value. For example, Internet Explorer's default security/privacy settings suppress document.referrer on dynamically generated popup windows (e.g. window.open() calls).
So, and again, this is just speculation because I don't know the full context, you may need to work something out w/ your users, e.g. explicitly passing the referring url as a query param to the target page, and have your users populate s.referrer with it if it exists. Something along the lines of:
if ( !document.referrer ) {
s.referrer=s.Util.getQueryParam( 'refURL' );
}
Note: s.Util.getQueryParam is a utility function for Adobe Analytics AppMeasurement library that will return the value of the specified query param, or an empty string if it doesn't exist. If your users are still using legacy H code, they should use the s.getQueryParam plugin instead. Or use whatever homebrewed method of getting a query param from the URL, since javascript doesn't have a built-in function for it.
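Where the standard URL API is available (current browsers and Node.js), a homebrewed getter can be quite short. This is a sketch only; getQueryParam is a hypothetical helper that mimics the "empty string if missing" behaviour described above, and it is not the Adobe function.

```javascript
// Hypothetical helper: returns the decoded value of a query parameter
// from an absolute URL, or an empty string if the parameter is absent.
function getQueryParam(name, url) {
  const value = new URL(url).searchParams.get(name);
  return value === null ? "" : value;
}
```

On the landing page it could be used as s.referrer = getQueryParam('refURL', window.location.href); inside the same document.referrer check.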
I have a special situation where the sites visitors can access the page from a certain domain but no others. So HTML and assets are no problem as long as they are stored on the server. Google Analytics on the other hand requires a download of analytics.js from Googles servers, which is impossible.
So I'm looking for a way to proxy this. The webserver itself has internet access and could relay the traffic. To report to Google about my page view, a single pixel GIF is downloaded from Google, described here: https://developers.google.com/analytics/resources/concepts/gaConceptsTrackingOverview
I think it would be kind of easy to get all the parameters in the GIF and use the measurement protocol to report to Google from the server - but the hard bit is to get all this info to the server. To download analytics.js and modify it to go to my own server seems to me as a hack that ain't future proof at all. To just get the current page from the user to the server is not a big deal, but we would like to get the user id, browser version and everything you get with Analytics.
How would you do it? Have you found a solution for this?
Update: Google has since released server-side GTM, which allows you to proxy requests and scripts through a custom domain. In most use cases I can imagine, this would be a much better solution than a DIY proxy.
As pointed out in my comment, the utm.gif is no longer used. Google Analytics has completely switched to the Measurement Protocol, and data is now sent to the Measurement Protocol endpoint at google-analytics.com/collect. This actually still returns a transparent pixel, since requesting an image with parameters is a proven way of transmitting information across domain boundaries.
Now, you could just use the Measurement Protocol to implement your own Google Analytics tracker.
To quote myself:
Each call includes at least the ID of the account you want to send data to, a client id that allows grouping interactions into sessions (so it should be unique per visitor, but it must not identify a user personally), an interaction type (pageview, event, timing etc.; some interaction types require additional parameters) and the version of the protocol you are using (at the moment there is only one version).
So the most basic example to record a pageview would look like this:
www.google-analytics.com/collect?v=1&tid=UA-XXXXY&cid=555&t=pageview&dp=%2Fmypage
You probably would want to add the users IP (will be anonymized automatically) and the user agent.
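To avoid encoding mistakes, the hit can be assembled from its parts instead of being written by hand. This is a minimal sketch: buildPageviewUrl is a hypothetical helper, and it only builds the URL (it does not send anything). The parameter names v, tid, cid, t and dp are the ones from the example above.

```javascript
// Hypothetical helper: builds a Measurement Protocol pageview URL
// from the required parameters described above.
function buildPageviewUrl(trackingId, clientId, pagePath) {
  const params = new URLSearchParams({
    v: "1",          // protocol version
    tid: trackingId, // account / property ID
    cid: clientId,   // anonymous client ID
    t: "pageview",   // interaction type
    dp: pagePath     // document path, URL-encoded by URLSearchParams
  });
  return "https://www.google-analytics.com/collect?" + params.toString();
}
```

For example, buildPageviewUrl("UA-XXXXY", "555", "/mypage") yields the same query string as the basic example above.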
However it sounds like you prefer to use the standard Analytics code to collect the data and relay the tracking call via your own server. While I haven't used the following in production I don't see any reason why it wouldn't work.
First you need the analytics.js file. Self-hosting the file is discouraged, but the given reason is that the code is updated sometimes by Google and if you host it yourself you might miss the updates. This can be remedied by setting up a cron job that downloads the file regularly to your server so you always have a current version.
Next you'd adapt the GA bootstrap function to load the code from your own server:
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.myserver.com/analytics.js','ga');
Now you have the code, but the tracking call will still be sent to the Analytics Server (i.e. in your case it won't be sent at all). So you need to re-route the call via your server.
To make this possible the Google (Universal) Analytics Code has a feature called "tasks". Tasks are functions within the tracking code in which the tracking call is being assembled.
It is possible to modify tasks by using the "set" function of the tracker object, using the taskname as parameter and passing a function that overwrites/overloads the task function.
The following is pretty much the example from the Google documentation (except I omitted the part where data is still being sent to Google - you don't need this at this point):
ga('create', 'UA-XXXXX-Y', 'auto');
ga(function(tracker) {
tracker.set('sendHitTask', function(model) {
var payLoad = model.get('hitPayload');
var gifRequest = new XMLHttpRequest();
var gifPath = "/__ua.gif";
gifRequest.open('get', gifPath + '?' + payLoad, true);
gifRequest.send();
});
});
ga('send', 'pageview');
Now this sends the data to a file called __ua.gif at your own server (if you need to send data cross-domain you can simply do a var ua = new Image; ua.src = gifPath + '?' + payLoad to create an image request).
The model parameter to the sendHitTask function contains (apart from a lot of overhead) the payload, that is the assembled query string that contains the analytics data. You can then make your __ua.gif a script that proxies the request to google-analytics.com/collect.
At this point the user agent will be your script and the IP address will be that of your server, so you need to include &uip (User IP override) and &ua (User agent override) parameters ( https://groups.google.com/forum/#!msg/google-analytics-measurement-protocol/8TAp7_I1uTk/KNjI5IGwT58J) to get geo and technical information.
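The core of such a proxy script is rewriting the query string before forwarding it. The sketch below shows only that step, assuming the script runs on Node.js and has already read the visitor's IP and User-Agent header from the incoming request; buildUpstreamUrl is a hypothetical name, and the actual forwarding request is left out.

```javascript
// Hypothetical helper: takes the raw hit payload received at /__ua.gif
// and returns the URL to request from Google, with the visitor's own
// IP and user agent added as the uip and ua overrides.
function buildUpstreamUrl(hitPayload, clientIp, clientUserAgent) {
  const params = new URLSearchParams(hitPayload);
  params.set("uip", clientIp);        // User IP override
  params.set("ua", clientUserAgent);  // User agent override
  return "https://www.google-analytics.com/collect?" + params.toString();
}
```

Using set() rather than append() means a payload that already carries uip or ua gets the server-side values instead of duplicates.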
If you are feeling more adventurous you can override the buildHitTask instead and try and add the additional parameters there (more hassle probably since you'd need to get the IP address from somewhere).
For additional parameters see the reference for analytics.js and the Measurement Protocol.
Is it possible to obtain raw logs from Google Analytics? Is there any tool that can generate the raw logs from GA?
No you can't get the raw logs, but there's nothing stopping you from getting the exact same data logged to your own web server logs. Have a look at the Urchin code and borrow that, changing the following two lines to point to your web server instead.
var _ugifpath2="http://www.google-analytics.com/__utm.gif";
if (_udl.protocol=="https:") _ugifpath2="https://ssl.google-analytics.com/__utm.gif";
You'll want to create a __utm.gif file so that they don't show up in the logs as 404s.
Obviously you'll need to parse the variables out of the hits into your web server logs. The log line in Apache looks something like this. You'll have lots of "fun" parsing out all the various stuff you want from that, but everything Google Analytics gets from the basic JavaScript tagging comes in like this.
127.0.0.1 - - [02/Oct/2008:10:17:18 +1000] "GET /__utm.gif?utmwv=1.3&utmn=172543292&utmcs=ISO-8859-1&utmsr=1280x1024&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=9.0%20%20r124&utmdt=My%20Web%20Page&utmhn=www.mydomain.com&utmhid=979599568&utmr=-&utmp=/urlgoeshere/&utmac=UA-1715941-2&utmcc=__utma%3D113887236.511203954.1220404968.1222846275.1222906638.33%3B%2B__utmz%3D113887236.1222393496.27.2.utmccn%3D(organic)%7Cutmcsr%3Dgoogle%7Cutmctr%3Dsapphire%2Btechnologies%2Bsite%253Arumble.net%7Cutmcmd%3Dorganic%3B%2B HTTP/1.0" 200 35 "http://www.mydomain.com/urlgoeshere/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19"
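Most of the parsing comes down to pulling the query string out of the request part of the log line and decoding it. The following is a rough sketch, assuming the Apache log format shown above; parseUtmHit is a hypothetical helper and ignores the other log fields (IP, timestamp, referrer, user agent).

```javascript
// Hypothetical helper: extracts the __utm.gif query parameters from one
// access-log line and returns them as a plain object of decoded values.
// Returns null when the line is not a __utm.gif request.
function parseUtmHit(logLine) {
  const match = /"GET \/__utm\.gif\?(\S+) HTTP\/[\d.]+"/.exec(logLine);
  if (!match) {
    return null;
  }
  return Object.fromEntries(new URLSearchParams(match[1]));
}
```

The utmcc value (the cookie data) is itself a delimited string and would need a second parsing pass.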
No. But why don't you just use your webserver's logs? The value of GA is not in the data they collect, but the aggregation/analysis. That's why it's not called Google Raw Data.
Please have a look at this article, which explains a hack to get Google Analytics data.
http://blogoscoped.com/archive/2008-01-17-n73.html
Also, if you can wait for some time, the official Google Analytics blog says that they are working on a data export API, but it is currently in private beta.
http://analytics.blogspot.com/2008/10/more-enterprise-class-features-added-to.html
Not exactly the same as raw vs aggregated, but it seems that "unsampled" data is only available to Premium accounts:
"Unsampled Reports are only available in Premium accounts using the latest version of Google Analytics."
http://support.google.com/analytics/bin/answer.py?hl=en&answer=2601061
You can get the Analytics data, but it'll take a bit of hacking.
In any analytics report, click the 'email' button at the top of the screen. Set up the email to go to your address (or a new address on your server) and change the format to csv or xml.
Then, you can use php (or another language) to check the email account, parse the email and import the attachment to your system.
There's an article entitled 'Incoming mail and PHP' on evolt.org: http://evolt.org/incoming_mail_and_php
No, but there are other paid services like Mixpanel and KISSmetrics that have data export APIs. Much easier than trying to build your own analytics service, but costs money.