How do I get raw logs from Google Analytics? - google-analytics

Is it possible to obtain raw logs from Google Analytics? Is there any tool that can generate the raw logs from GA?

No, you can't get the raw logs, but there's nothing stopping you from logging the exact same data to your own web server logs. Have a look at the Urchin code and borrow it, changing the following two lines to point to your web server instead.
var _ugifpath2="http://www.google-analytics.com/__utm.gif";
if (_udl.protocol=="https:") _ugifpath2="https://ssl.google-analytics.com/__utm.gif";
You'll want to create a __utm.gif file so that the hits don't show up in the logs as 404s.
Obviously you'll need to parse the variables out of the hits in your web server logs. The log line in Apache looks something like this. You'll have lots of "fun" parsing out everything you want from it, but everything Google Analytics gets from the basic JavaScript tagging comes in like this.
127.0.0.1 - - [02/Oct/2008:10:17:18 +1000] "GET /__utm.gif?utmwv=1.3&utmn=172543292&utmcs=ISO-8859-1&utmsr=1280x1024&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=9.0%20%20r124&utmdt=My%20Web%20Page&utmhn=www.mydomain.com&utmhid=979599568&utmr=-&utmp=/urlgoeshere/&utmac=UA-1715941-2&utmcc=__utma%3D113887236.511203954.1220404968.1222846275.1222906638.33%3B%2B__utmz%3D113887236.1222393496.27.2.utmccn%3D(organic)%7Cutmcsr%3Dgoogle%7Cutmctr%3Dsapphire%2Btechnologies%2Bsite%253Arumble.net%7Cutmcmd%3Dorganic%3B%2B HTTP/1.0" 200 35 "http://www.mydomain.com/urlgoeshere/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19"
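Parsing a hit like the one above could be sketched in Python as follows. This is a minimal sketch, not a complete log parser: it assumes the request appears in the usual quoted "METHOD target HTTP/x.y" form, and the function names parse_utm_hit and parse_utm_cookies are made up for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request part of an access log line.
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<target>\S+) HTTP/[\d.]+"')


def parse_utm_hit(log_line):
    """Return the __utm.gif query parameters from one log line, or None."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    target = urlsplit(match.group("target"))
    if not target.path.endswith("/__utm.gif"):
        return None
    # parse_qs percent-decodes the values; keep empty fields as well.
    params = parse_qs(target.query, keep_blank_values=True)
    return {key: values[0] for key, values in params.items()}


def parse_utm_cookies(utmcc):
    """Split a decoded utmcc value into a {cookie_name: value} dict."""
    cookies = {}
    for part in utmcc.split(";"):
        part = part.strip().lstrip("+")
        if "=" in part:
            name, value = part.split("=", 1)
            cookies[name] = value
    return cookies
```

Calling parse_utm_hit on each line of the access log gives one dictionary per hit (utmdt, utmp, utmac and so on), and parse_utm_cookies breaks the utmcc field down into the __utma and __utmz cookie values.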

No. But why don't you just use your webserver's logs? The value of GA is not in the data they collect, but the aggregation/analysis. That's why it's not called Google Raw Data.

Please have a look at this article, which explains a hack to get Google Analytics data.
http://blogoscoped.com/archive/2008-01-17-n73.html
Also, if you can wait for some time, the official Google Analytics blog says that they are working on a data export API, but it is currently in private beta.
http://analytics.blogspot.com/2008/10/more-enterprise-class-features-added-to.html

Not exactly the same as raw vs aggregated, but it seems that "unsampled" data is only available to Premium accounts:
"Unsampled Reports are only available in Premium accounts using the latest version of Google Analytics."
http://support.google.com/analytics/bin/answer.py?hl=en&answer=2601061

You can get the Analytics data, but it'll take a bit of hacking.
In any analytics report, click the 'email' button at the top of the screen. Set up the email to go to your address (or a new address on your server) and change the format to csv or xml.
Then you can use PHP (or another language) to check the email account, parse the email and import the attachment into your system.
There's an article entitled 'Incoming mail and PHP' on evolt.org: http://evolt.org/incoming_mail_and_php
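The parsing step could look roughly like this in Python (used here instead of PHP purely for illustration). The sketch assumes the report arrives as an attachment whose filename ends in .csv and that you already have the raw message bytes. Fetching them from the mailbox, for example with imaplib or poplib, is left out. The name csv_rows_from_email is hypothetical.

```python
import csv
import io
from email import message_from_bytes, policy


def csv_rows_from_email(raw_message):
    """Return the rows of every CSV attachment in a raw email message."""
    message = message_from_bytes(raw_message, policy=policy.default)
    rows = []
    for part in message.iter_attachments():
        filename = part.get_filename() or ""
        if not filename.lower().endswith(".csv"):
            continue
        text = part.get_content()
        if isinstance(text, bytes):  # non-text MIME types come back as bytes
            text = text.decode("utf-8", errors="replace")
        rows.extend(csv.reader(io.StringIO(text)))
    return rows
```

Each returned row is a list of strings, ready to be inserted into your own database.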

No, but there are other paid services like Mixpanel and KISSmetrics that have data export APIs. Much easier than trying to build your own analytics service, but costs money.

Related

Scraping websites via Google Cached Pages has been blocked

I'm trying to create a service that scrapes websites by using Google Cached Pages.
Example:
https://webcache.googleusercontent.com/search?q=cache:nike.com
The response that I get is the HTML from the Google cache, which is an older version of the Nike site.
It works fine as long as I run it locally on my computer,
but when I deploy to Google Cloud Platform, where I use a proxy server,
I get a 403 error saying that I cannot access the information through a proxy server.
Example of the response from the proxy server:
403. That’s an error. Your client does not have permission to get URL /search?q=cache:http://nike.com from this server. (Client IP address: XX.XXX.XX.XXX)
Please see Google's Terms of Service posted at https://policies.google.com/terms
If you believe that you have received this response in error, please report your problem. However, please make sure to take a look at our Terms of Service (http://www.google.com/terms_of_service.html). In your email, please send us the entire code displayed below. Please also send us any information you may know about how you are performing your Google searches -- for example, "I'm using the Opera browser on Linux to do searches from home. My Internet access is through a dial-up account I have with the FooCorp ISP." or "I'm using the Konqueror browser on Linux to search from my job at myFoo.com. My machine's IP address is 10.20.30.40, but all of myFoo's web traffic goes through some kind of proxy server whose IP address is 10.11.12.13." (If you don't know any information like this, that's OK. But this kind of information can help us track down problems, so please tell us what you can.)
We will use all this information to diagnose the problem, and we'll hopefully have you back up and searching with Google again quickly!
Please note that although we read all the email we receive, we are not always able to send a personal response to each and every email. So don't despair if you don't hear back from us!
Also note that if you do not send us the entire code below, we will not be able to help you.
Best wishes,
The Google
Article that talks about the problem https://proxyserver.com/web-scraping-crawling/scraping-websites-via-google-cached-pages/
How can I solve this problem, and run requests from the cloud as well without being blocked? Add parameters?
Thanks :)
I suspect you need to set a User-Agent header on your HTTP request, for example:
// needs java.net.URL and java.net.URLConnection
URL u = new URL("https://www.google.com/search?q=c");
URLConnection c = u.openConnection();
c.setRequestProperty("User-Agent", "MSIE 7.0");
or
// needs java.net.URI and java.net.http.HttpRequest (Java 11+)
HttpRequest request = HttpRequest.newBuilder(new URI("https://www.google.com/search?q=c"))
        .header("User-Agent", "MSIE 7.0")
        .GET()
        .build();
// Remember to change the URI to the one you are requesting.
These two examples are in Java, but I expect the same idea applies in any environment.
Hope that helps.

How do I get the value of the Authorization key from the site?

Good afternoon. I log in to a site using my username and password.
I extracted an unofficial API from the site. Registration through a direct request to it, passing JSON, succeeded.
But further use of the API requires an Authorization key
(see screenshot).
How do I get it and use it for API requests? I tried getting the cookies, but nothing changes with them.
I have only just started learning Requests and got stuck right here. Sorry if the question is stupid and I'm wasting your time.
Reading the documentation is a powerful tool that every programmer should use. The Requests documentation already gives an example of this:
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
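If the site expects a token rather than a username and password, the value is sent in an Authorization header on every API request. The following is only a sketch under assumptions: the "Bearer" scheme, the example URL and the token value are placeholders, and where the real token comes from (often the login response) depends on the site. It uses the standard library so that it runs without extra packages. With Requests the same header can be passed through the headers argument.

```python
import urllib.request


def build_authorized_request(url, token, scheme="Bearer"):
    """Build (but do not send) a GET request carrying an Authorization header."""
    request = urllib.request.Request(url, method="GET")
    # The scheme and token are assumptions; copy what the site's own requests send.
    request.add_header("Authorization", f"{scheme} {token}")
    return request


# Hypothetical usage: urllib.request.urlopen(request) would actually send it.
request = build_authorized_request("https://example.com/api/items", "token-from-login")
```

Comparing this header with the one your browser sends (visible in the developer tools Network tab) is a quick way to check that the scheme and value match.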

Making an HTTP request with a blank user agent

I'm troubleshooting an issue that I think may be related to request filtering. Specifically, it seems every connection to a site made with a blank user agent string is being shown a 403 error. I can generate other 403 errors on the server doing things like trying to browse a directory with no default document while directory browsing is turned off. I can also generate a 403 error by using a tool like Modify Headers for Google Chrome (Google Chrome extension) to set my user agent string to the Baidu spider string which I know has been blocked.
What I can't seem to do is generate a request with a BLANK user agent string to try that. The extensions I've looked at require something in that field. Is there a tool or method I can use to make a GET or POST request to a website with a blank user agent string?
I recommend trying a CLI tool like cURL or a UI tool like Postman. You can carefully craft each header, parameter and value that you place in your HTTP request and trace fully the end to end request-response result.
This example, taken straight from the cURL docs on user agents, shows how to set the user agent from the command line.
curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
In Postman it's just as easy: tinker with the headers and params as needed. You can also click the "code" link on the right-hand side and view the request as HTTP when you want to see the result.
You can also use plenty of other HTTP tools, such as Paw and Insomnia, all of which are well suited to this task.
One last tip: in the Chrome developer tools, you can right-click a specific request in the Network tab and copy it as cURL. You can then paste the cURL command and modify it as needed. Postman can also import a request pasted as raw text and will interpret the cURL command for you, which is particularly handy.
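If you would rather script the test, a request with an empty User-Agent value can also be written out by hand and sent over a plain socket. This is a minimal sketch in Python. The helper names build_get_request and send_raw_request are made up for illustration, and the sketch only reads the status line of the response.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_get_request(url, user_agent=""):
    """Return the raw bytes of a GET request with the given User-Agent value."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",  # empty by default
        "Accept: */*",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw_request(url, payload, timeout=10):
    """Send the payload to the URL's host and return the response status line."""
    parts = urlsplit(url)
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    conn = socket.create_connection((parts.hostname, port), timeout=timeout)
    if use_tls:
        context = ssl.create_default_context()
        conn = context.wrap_socket(conn, server_hostname=parts.hostname)
    with conn:
        conn.sendall(payload)
        with conn.makefile("rb") as reader:
            status_line = reader.readline()
    return status_line.decode("iso-8859-1").strip()
```

Sending build_get_request(url) and then build_get_request(url, "Mozilla/5.0") to the same URL and comparing the two status lines shows whether the empty user agent is what triggers the 403.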

How to set custom user-agent that Google Analytics can read

I want to set a custom user-agent for a webview app that embeds my website. I am able to set a custom agent like this ("My App Android").
The issue is that Google Analytics reports traffic from this agent as Desktop, not as mobile like a regular webview.
What's the best way to set a custom user-agent while still keeping data such as mobile and device OS, so that tools like Google Analytics can still read it?
You can manipulate the User Agent but you can't control how Google will interpret the resulting device/OS:
The processing is done on the server side (Google) so there is no way of directly modifying that data (even when sending data via the measurement protocol).
The processing details are not disclosed by Google so you won't know what the outcome of your experiments are until they're reported by Google Analytics (which due to the 24-48 hour data processing latency might make such experimentation tedious).
Attempting to manipulate it might "break" your analytics: Google is vague about this and just says: "Google has libraries to identify real user agents. Hand crafting your own agent could break at any time". Two consequences I can think of: Google simply drops the traffic if it can't parse the User Agent, or it marks it as bot/spider traffic (which will also be dropped if you have enabled the bot filtering option).
Although it's not mentioned in the documentation, I also suspect Google to rely on other data points, which could be:
Screen resolution
Java Support
Flash version
I couldn't find more details on the topic, and I don't think you will find more details from Google explaining what they use to calculate browser/device because they don't want people messing with it (analogy: you won't find details about which data points are used for SEO, because they don't want people messing with it). The 4 dimensions I listed (User Agent, Screen resolution, Java Support, Flash version), are to my knowledge the only 4 that are device-specific from all GA collects (others are derived from them):
https://developers.google.com/analytics/devguides/reporting/core/dimsmets#view=detail&group=platform_or_device
As MAX's answer says, it is very difficult to manipulate the user-agent while keeping all of its attributes, such as the OS and the rendering engine.
At the same time, I still want to target my app users with a custom user-agent and be able to separate the traffic that comes from this webview app.
What I did is this:
1- Setting the custom user-agent
Instead of replacing the whole user-agent with a custom one, I appended this to the user-agent [AppID/AppVersion], found great info from this blog: Webviews and User-Agent strings.
Now the user-agent looks something like this:
Mozilla/5.0 (Linux; Android 9; wv)
AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91
Mobile Safari/537.36 [Custom App/1.0.1]
Check: Correct way to format user-agent string in an Android WebView App?
2- Setting a custom dimension in Google Analytics
Since Google Analytics will mark all browser value visits from this agent as Android Webview, I went to assign a custom dimension to be able to identify the custom user-agent sessions and create a separate view for it.
In the backend with PHP I set the value of the dimension based on the user-agent.
<script>
<?php
// Default to an empty value so the variable is always defined below.
$customAgent_value = '';
if (strpos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Custom user agent here') !== false) {
    $customAgent_value = 'your agent';
}
?>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
// Map custom dimension 1 to the 'custom_agent' event parameter.
gtag('config', 'UA-', {
  'custom_map': {'dimension1': 'custom_agent'}
});
gtag('event', 'custom_agent_event', {'custom_agent': '<?= $customAgent_value;?>'});
</script>
This is working fine for me now. I can target users from a specific webview app, and at the same time am able to separate the traffic from different webviews in Analytics.
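If the backend is not PHP, the same check can be done in any language. The sketch below is in Python and assumes the app token is appended at the very end of the user-agent in the [AppID/AppVersion] form shown above. The function name app_from_user_agent is hypothetical.

```python
import re

# Matches a trailing "[Name/Version]" token such as "[Custom App/1.0.1]".
APP_TOKEN_RE = re.compile(r"\[(?P<name>[^\]/]+)/(?P<version>[^\]]+)\]\s*$")


def app_from_user_agent(user_agent):
    """Return (name, version) if the user-agent ends with an app token, else None."""
    match = APP_TOKEN_RE.search(user_agent)
    if match is None:
        return None
    return match.group("name"), match.group("version")
```

The returned name (or an empty string when the result is None) can then be written into the page as the custom dimension value, in the same way as $customAgent_value above.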

Woocommerce REST API connection refused

I've been trying to connect to the REST API of Woocommerce (using HTTP Basic Auth) but fail to do so.
I'm probably doing something wrong (first-timer with REST APIs), but here is what I've been doing:
I'm using a GET with an url consisting of: https://example.com/wc-api/v2/
I'm using an Authorization header with the consumer key and secret base64 encoded
I've enabled the REST Api in the Woocommerce setting and enabled secure checkout. Also I've put some product in the shop. But whenever I try to run the URL as described above; the connection is just being refused.
I do not receive an error, but it looks like the page cannot even be reached. Can someone help me out?
I've followed the docs (http://woothemes.github.io/woocommerce-rest-api-docs/#requestsresponses) up to the Authentication-section, but that's where I've been stuck up till now.
The complete url I'm using is:
http://[MYDOMAIN]/wc-api/v2/orders
With the HTTP-header looking like:
GET /wc-api/v2/ HTTP/1.1
Authorization: Basic [BASE64 encoded_key:BASE64 encoded_secret]
Host: [MYDOMAIN]
Connection: close
User-Agent: Paw/2.1.1 (Macintosh; OS X/10.10.2) GCDHTTPRequest
Then after I run the request I'm getting:
Given the screenshot that you posted, it seems that the server is not responding on HTTPS. So you'll need to configure your webserver to respond to HTTPS requests, and to do that you'll need to install an SSL certificate.
You can either generate one yourself, which is free, but won't work for the general public. Or you can buy one - most domain registrars and hosts will let you buy a certificate, and they usually start at around $50 per year.
I'm using a GET with an url consisting of: https://example.com/wc-api/v2/
In this example, you're using HTTPS. Is that where you're trying to connect?
I highly recommend going straight to an HTTPS connection. It's a thousand times easier to accomplish. Documentation for connecting over HTTPS can be found here. Follow the directions for "OVER HTTPS". From there you can use something like Postman to test if you'd like.
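One detail worth checking once HTTPS works: the header in the question is described as the key and the secret each Base64-encoded separately, whereas HTTP Basic authentication encodes the single string key:secret once. Here is a minimal sketch in Python. The ck_example and cs_example values are placeholders, not real WooCommerce credentials.

```python
import base64


def basic_auth_header(consumer_key, consumer_secret):
    """Return the Authorization header value for HTTP Basic authentication."""
    # Join key and secret with a colon first, then Base64-encode the whole string.
    credentials = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials, for illustration only.
headers = {"Authorization": basic_auth_header("ck_example", "cs_example")}
```

Tools such as Postman or Paw build this value for you when you pick Basic Auth and enter the key as the username and the secret as the password.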
