Google Analytics Realtime Sandbox Environment - google-analytics

I am looking for a way to setup a google analytics sandbox environment that will allow me
to test out my custom js code near real time.
My app will be using custom variables for advanced segmentation, and I would like to test out multiple scenarios quickly, as opposed to setting up a dummy GA account and wait for a whole day to confirm the test.
Thanks

Great question.
For GA, server updates occur every four hours, and after every sixth such update, the entire set is recalculated, which means a 24-hour lag from code change to reliable feedback. This delay also applies to most customizations to the GA Browser (e.g., "custom filters").
So if you are going to use GA as your web metrics system, and you expect to actually rely on those data then a test rig is essential.
For me, it's useful to group test systems for client-side analytics using two rubrics: (i) complete, self-contained (closed-loop) systems; or (ii) simpler automated data pulls from the production system (by "production system" here i mean GA's system, not the Site whose pages the GA code is tracking).
For the latter, just add this line to each page of your Site that contains the GA tracking code, just below '__trackPageview()':
pageTracker._setLocalRemoteServerMode();
That line will cause a copy of each transaction line to be logged to your server's activity log--so in essence, you get the data captured by GA in real-time That's all you need to do to capture the data; to parse it, you can use, for instance, any of the excellent open source web log analyzers like AWStats, or roll your own.
This is simple and reliable--but all it can do is tell you (in real-time) "does the analytics code i just implemented on pages served by my production server actually work?"
Usually, that's not good enough--you would rather know if your code will work before it's on your production server. To do that, you need to simulate the production environment and find a way to access in real-time the data GA collects.
This kind of test rig is a little more involved, but still not difficult.
In sum, it requires these steps:
host/serve the ga.js and the
tracking pixel locally;
log the __utm.gif requests (in the
GA data flow, each request
corresponds to one logged
transaction); and
parse the headers into some
convenient human-readable form.
If you want more detail than that (ie, a step-by-step implementation), here it is:
I. Hosting/Serving the GA Script (& automating updates
To do that, you can create a small shell script like this one to wget the latest ga.js version into your local directory (replacing the extant version it finds there).
#!/bin/sh
rm /My_Sites/sitename.com/analytics/ga.js
cd /My_Sites/sitename.com/analytics/
wget http://www.google-analytics.com/ga.js
chmod 644 /My_Sites/sitename.com/analytics/ga.js
cd ${OLDPWD}
exit 0;
(Thanks to AskApache.com, which provided the original motivation and config details to do this in a production context.)
II. Create __utm.gif file
This is just a transparent 1x1 pixel gif image, which you will place in Site directory (doesn't matter where, it just needs to match the location recited in your pages)
III. Log the __utm.gif Requests
For a testing protocol in which you are the source of the client-side activity (e.g., you want to verify the cross-browser fidelity of some event-tracking code you've added to a page on your Site, so you automate 5000 clicks on the button you just wired up,serving the page from your dev server set up for this purpose) it's probably simplest to just log the Request Headers, because it's in those headers that the GA script directs the client to gather various data from the DOM, from the location bar (url), and from prior http headers, and append them to a request for a resource on the GA server (__utm.gif, which is just a 1x1 transparent pixel).
For this type of protocol, i use the Firefox addon, LiveHTTPHeaders. You install it like any other Firefox addon, a few mouse clicks is all. Next, open it, and click the "Generator" tab. From this window, you can see the actual requests in real time. At the bottom of the window is a 'save' button to store the log. I find it easier to configure LiveHTTPHeaders to log only the __utm.gif requests; to do that, just click the 'Edit' tab and create a siimple filter to exclude everything except these particular gif images (using the check boxes on the right, and the large text box to the right).
Other kinds of test protocols require you to work from your Server Activity Logs; in that case just add this line to each page of your Site, just below __trackPageview():
pageTracker._setLocalRemoteServerMode();
IV. Parse those logged requests so you can actually read them
So now your log will contain individual transction lines, each one of which is a string appended to an HTTP Request for the GA tracking pixel. This string is just a concatenation of key-value pairs, each key begins with the letters "utm" (probably for "urchin tracker"). Each of these parameters corresponds to a variable that you see in the GA Dashboard (here's a complete list and description of them). This is all you need to know to build a parser. In more detail:
First, here's a sanitized __utm.gif request (the entries in your LiveHTTPHeaders log):
http://www.google-analytics.com/__utm.gif?utmwv=1&utmn=1669045322&utmcs=UTF-8&utmsr=1280x800&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmcn=1&utmdt=Position%20Listings%20%7C%20Linden%20Lab&utmhn=lindenlab.hrmdirect.com&utmr=http://lindenlab.com/employment&utmp=/employment/openings.php?sort=da&&utmac=UA-XXXXXX-X&utmcc=__utma%3D87045125.1669045322.1274256051.1274256051.1274256051.1%3B%2B__utmb%3D87045125%3B%2B__utmc%3D87045125%3B%2B__utmz%3D87045125.1274256051.1.1.utmccn%3D(referral)%7Cutmcsr%3Dlindenlab.com%7Cutmcct%3D%2Femployment%7Cutmcmd%3Dreferral%3B%2B
This is my parser (in Python):
# regular expression module imported
import re
pattern = r'\&{1,2}'
pat_obj = re.compile(pattern)
# splitting the gif request on the '&' character
# (which GA originally used to concatenate each piece to build the request)
# (here, i've bound the __utm.gif to the variable by 'gfx')
gfx1 = pat_obj.split(gfx)
# create a look-up table to map a descriptive name to each gif request parameter
# (note, this isn't the entire list, which i've linked to above)
keys = "utmje utmsc utmsr utmac utmcc utmcn utmcr utmcs utmdt utme utmfl utmhn utmn utmp utmr utmul utmwv"
values = "java_enabled screen_color_depth screen_resolution account_string cookies campaign_session_new repeat_campaign_visit language_encoding page_title event_tracking_data flash_version host_name GIF_req_unique_id page_request referral_url browser_language gatc_version"
keys = keys.strip().split()
#create the look-up table
GIF_REQUEST_PARAMS = dict(zip(keys, values))
# parse each request parameter and map the parameter name to a descriptive name:
pattern = r'(utm\w{1,2})=(.*?)$'
pat_obj = re.compile(pattern)
for itm in gfx1 :
m = pat_obj.search(itm)
if m :
fmt = '{0:25} {1:10}'
print( fmt.format( GIF_REQUEST_PARAMS[m.group(1)], m.group(2) ) )
The result looks like this:
gatc_version              1         
GIF_req_unique_id         1669045322
language_encoding         UTF-8     
screen_resolution         1280x800  
screen_color_depth        24-bit    
browser_language          en-us     
java_enabled              1         
flash_version             10.0%20r45
campaign_session_new      1         
page_title                Position%20Listings%20%7C%20Linden%20Lab
host_name                 lindenlab.hrmdirect.com
referral_url              http://lindenlab.com/employment
page_request              /employment/openings.php?sort=da
account_string            UA-XXXXXX-X
cookies
To avoid making this longer still, i left out the cookies' value. They obviously require a separate parsing step, though it's virtually identical to the step i just showed. Again, each request represents a single transaction, so you can store them as you need to.

Related

How to figure out where is the raw data in a table?

https://www.nyse.com/quote/XNYS:A
After I access the above URL, I open Developer Tools in Firefox. Then change the date in HISTORIC PRICES, then click 'GO'. The table is updated. But I don't see relevant HTTP requests sent in devtools.
So this means that the data has already been downloaded in the first request. But I can not figure out how to extract the raw data of the table. Could anybody take a look at how to extract the raw data from the table? (Note that I don't want to use methods like selenium, I want to stay with raw HTTP requests to get the raw data.)
EDIT: websocket is mentioned in the comment. But I can't see it in Developer Tools. I add websocket tag anyway in case somebody knows more about websocket can chime in.
I am afraid you cannot extract javascript rendered content without selenium. You can always make use of a headless browser(you don't see any instance on your screen, the only pitfall is that you have to wait until the page fully loads) and it won't bother you anymore.
In other words, all the other scraping libs are based on urls and forms. Scrapy can post forms but not run javascripts.
Selenium will save the day, all you lose is a couple of seconds for each attempt(will be milliseconds if it is run in frontend). You can share page source with driver.page_source and it can be directly used for parsing(as a html text) with BeautifulSoup or whatever.
You can do it with requests-html, for example let's grab the first row of the table:
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://www.nyse.com/quote/XNYS:A'
r = session.get(url)
r.html.render(sleep=7)
first_row = r.html.find('.flex_tr', first=True)
print(first_row.text)
Output:
06/18/2021
146.31
146.83
144.94
145.01
3,220,680
As #Nikita said you will have to wait the page loading (here 7sec but maybe less), but if you want to do multiple requests you can do it asynchronously !

Mobile data reported in GA Measurement Protocol appear in realtime but not in daily summary

I've been attempting to log activity on a mobile-like device using the Google Analytics Measurement Protocol. All of these attempts have validated using the validation URL, and I can see activity when I look at the real-time reports on the Analytics website. But when I look at the Home or Overview reports for the day - no activity is shown.
The view is set for "All Mobile App Data".
The POST body looks something like this:
v=1&tid=UA-000000000-1&ds=app&qt=1601&uid=uid-zzzzz&t=screenview&cd=Foo&an=Foo%20App%20Name&aid=com.example.foo&aiid=com.example.foo&av=0.0.1&ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36
The ua field is just a pre-defined string. I found that if I omitted it, the Real Time monitoring listed the hits as desktop hits, although I was in a Mobile report and the ds field was "app".
Am I missing a field that is required? Is there some reason why it is showing up in the real-time report, but not in a daily report? Is there some other way to diagnose why the data is vanishing, or confirm the data is actually being captured?
When i check the debug endpoint the hit is valid
Request:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXX-1&ds=app&qt=1601&uid=uid-zzzzz&t=screenview&cd=Foo&an=Foo%20App%20Name&aid=com.example.foo&aiid=com.example.foo&av=0.0.1&ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36
Response
{
"hitParsingResult": [ {
"valid": true,
"parserMessage": [ ],
"hit": "/debug/collect?v=1\u0026tid=UA-53766825-1\u0026ds=app\u0026qt=1601\u0026uid=uid-zzzzz\u0026t=screenview\u0026cd=Foo\u0026an=Foo%20App%20Name\u0026aid=com.example.foo\u0026aiid=com.example.foo\u0026av=0.0.1\u0026ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36"
} ],
"parserMessage": [ {
"messageType": "INFO",
"description": "Found 1 hit in the request."
} ]
}
I cannot use one of the mobile libraries from Firebase - this is not one of the platforms they support. I do not wish to pretend this is a web page - there is no associated hostname or path. I do not wish to use Events since I can't do event Behavior Flow, which is one of the things I'm interested in seeing.
I'm aware that it can sometimes take "a day or so" for results to first appear. The site was setup over five days ago at this point, and has received data during that time.
Good thought about the anti-spam setting, however the setting appears to be correct:
I've also tried using GET instead of POST - no change, it still shows the hit in real-time, but then it vanishes.
However, I know that it can record hits permanently. There were two hits from a spammer in Russia that have shown up in the daily report (I wasn't there to see it show up in real-time). I don't know what they did, but would love to find out since it might help figure out how I can add a record.
In the real-time reports, it correctly points out the data center all the hits are coming from. Perhaps that is filtering it out somewhere out of my control?
Try adding Cid I know it says this is an optional parameter but for mobile accounts I belive it may be required.
Client ID
Optional.
This field is required if User ID (uid) is not specified in the request. This anonymously identifies a particular user, device, or browser instance. For the web, this is generally stored as a first-party cookie with a two-year expiration. For mobile apps, this is randomly generated for each particular instance of an application install. The value of this field should be a random UUID (version 4) as described in http://www.ietf.org/rfc/rfc4122.txt.
Example value: 35009a79-1a05-49d7-b876-2b884d0f825b
Although this says it needs to be a UUIDv4, it does work with other UUIDs (I've tested it with a v5, which is a hash against the value used for the uid parameter).

Receiving unexpected server calls

In adobe analytics I try to implement link tracking for all links can be found in a page using this:
$(document).on('click', 'a', function() {
s.tl(this, 'e', 'external', null, 'navigate');
return false;
});
Try to test it using a page like this
The extra calls are likely coming from how you have Adobe Analytics configured. There are a handful of config variables that will cause extra requests depending on how you set them (on their own and/or in relation to each other).
Here is a listing of Adobe Analytics variables for reference. These are the ones for you to look at:
s.trackDownloadLinks - If this is enabled, any standard links with href value ending in value(s) specified in s.linkDownloadFileTypes will trigger a request on click. Generally, this is to enable automatic tracking for links that prompt a visitor to download something (e.g. a pdf file).
s.trackExternalLinks - If this is enabled, any standard links with href NOT matched in s.linkInternalFilters OR matched with s.linkExternalFilters will trigger a request on click. Generally, this is to enable automatic tracking for links you count as visitor navigating off your site(s).
s.linkInternalFilters - If you have either of the above enabled, clicking on links may trigger a request, depending on values here vs. what you enabled above vs. what you have in s.linkExternalFilters. Generally, this should include values that represent links you do NOT want to count as navigating off your site(s).
s.linkExternalFilters - If you have either of the above enabled, clicking on links may trigger a request, depending on values here vs. what you enabled above vs. what you have in s.linkInternalFilters. Generally, you should never set this. It's intended for edge-use-cases for people who know what they are doing and have a complex site eco-system and definitions of what counts as internal vs. external.
s.trackInlineStats - This is for clickmap/heatmap tracking. This may or may not trigger an extra request, depending on how a lot of different stars align.
In addition to these, you may already have some plugins or other custom code that triggers click tracking. For example, there are linkHandler, exitLinkTracker, and downloadLinkTracker plugins that you may have included in your code that may play a part in extra requests being triggered.
Finally, more recent versions of Adobe Analytics code may trigger multiple requests depending on how much data you are trying to send in the request (whereas older versions just truncated the request, which resulted in data loss).
In any case, the long story short here is if you are looking to roll your own custom link tracking, you should make sure the above variables/plugins are removed or otherwise disabled.
But on the note of rolling your own custom link tracking.. I'm getting a sense of de ja vu here, like I already made a comment about this relatively recently in another post, over this exact same code... but generally speaking, this is not a good idea:
$(document).on('click', 'a', function() {
s.tl(this, 'e', 'external', null, 'navigate');
return false;
});
You are wholesale implementing exit link tracking on every single link of your page. And you are giving them all the same generic "external" label. And the native exit link reports are pretty limited and useless to begin with, so ideally you should also pop an eVar or something with the exit url or something.
But more importantly.. unless literally every single link on your pages are links that navigate your visitor off-site, this is not going to be useful to you in reports in general, and it's even going to ruin a lot of your reports.
I can't believe (or accept) that you really want to count every link on your pages as exit links..
I assume s.tl does an ajax call.
It should then forward the link to the href of the link - if the link is allowed to be followed immediately, the ajax call will be interrupted which seems to be what you see
You may want to change to
$(document).on('click', 'a', function(e) {
e.preventDefault();
s.tl(this, 'e', 'external', null, 'navigate');
});
I found this article when looking to see what s.tl is https://marketing.adobe.com/developer/forum/general-topic-forum/difference-between-s-t-and-s-tl-function

How to use url_path in Google Tag Manager to track directories

I want to have content groups in Google Analytics and I'm using Google Tag Manager to implement them. The way to do it, according to their reference, is to create a lookup table that is using the url_path macro to filter URLs. The url_path only gives the path of the URL, stripping the end of it, so for a url http://www.example.com/hello/index.html the result would be /hello/.
I want to group my users' account pages which are like: http://www.example.com/accounts/profile/user1/
The problem with the above macro is that it would return /accounts/profile/user1 which is not what I want. I only want to keep /accounts/profile/.
How could I accomplish that using this macro?
For helping you, in GTM you just have to configure the "Content Grouping" part (and take special care of the index that you put in). All the stuff is on GA Backend, where you declare your content group and which give you an index for each content group (index that you have to keep in GTM).
For some GA account you have to wait around 48 hours till you got some data, if your hit is ok you can see your content grouping information in the variable utmpg (like :" 1:Accueil,2:Page de destination | Actualité | ---,5:www.ouest-france.fr/home" for example).
Hopes it will help you to understand.
Fanny

How to download HTML Report from HP ALM Performance Center 11.0 using rest API

I want to download HTML default report for a test run from Performance Center storage (using Rest API). Actually I need just summary.html file.
I was using the following steps in PC 11.5:
Request test scenarios:
http://{server:port}/qcbin/rest/domains/{domain}/projects/{project}/tests?fields=id,last-modified,name,owner&query={subtype-id[=PERFORMANCE-TEST]}&page-size=max
Let user choose the scenario (id) and request all its runs:
http://{server:port}/qcbin/rest/domains/{domain}/projects/{project}/runs?page-size=max&fields=id,owner,pc-start-time,duration,status,test-id&query={test-id[=234]}
Let user choose the run (id) and request Report (result entity):
http://{server:port}/qcbin/rest/domains/{domain}/projects/{project}/results?page-size=max&query={run-id[=123];name[=Reports]}&fields=id,name
Request "summary.html" file using file-id taken from previous step response:
http://{server:port}/qcbin/rest/domains/{domain}/projects/{project}/results/{file-id}/storage/report/summary.html
However it is not working with Performance Center 11.0. It fails at last step:
qccore.general-error
Not Found
I guess it is because the path of report was changed.
Can someone tell the path for summary.html for Performance Center 11.0?
I've been able to have a little bit of success with this. Rather than use the request you are using above I used the following:
http://{server:port}/qcbin/rest/domains/{domain}/projects/{project}/results/{file-id}/logical-storage/
This gave me a zip file, which contained the report inside it.

Resources