Querying data in "real time" from GA with BigQuery? - google-analytics

I'm interested in running some queries in BigQuery to export data in "real time", but I don't know how to do it.
The "intraday" dataset is only updated 4 or 5 times per day, and this isn't what I need.
My question: is it possible to get this data?
Thanks.

For real-time data you could try using the Google Analytics API (I tend to use the Python client to run this type of analysis).
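As a minimal sketch of that approach, assuming a Universal Analytics view, the Real Time Reporting API and a service account with read access (the key file and view ID below are placeholders, not real values):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical service-account key file and view ID -- replace with your own.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=creds)

# Ask the Real Time Reporting API for currently active users per page.
result = (
    analytics.data()
    .realtime()
    .get(ids="ga:12345678", metrics="rt:activeUsers", dimensions="rt:pagePath")
    .execute()
)
for row in result.get("rows", []):
    print(row)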
If this does not suit your needs, then the only option I know of is having some backend infrastructure that collects data from your website and publishes it to a queue, from which you can further process the data.
This post has lots of good advice; you can also check it out (it covers things such as Pub/Sub, Dataflow, BigQuery streaming inserts and so on). Keep in mind though that this last approach is considerably more complex and resource-intensive, so it's important for you to know well what you are doing.
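If you do go down that pipeline route, the final hop into BigQuery can be a plain streaming insert. A rough sketch with the google-cloud-bigquery client (the table ID and row fields are assumptions for illustration, not a prescribed schema):

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and event payload -- adjust to your own schema.
table_id = "my-project.analytics_raw.hits"
rows = [
    {"event": "purchase", "value": 19.99, "ts": "2024-01-01T12:00:00Z"},
]

# insert_rows_json streams the rows into the table; they become queryable within seconds.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert failed:", errors)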

Related

Explore Free Form report in Google Analytics

I am trying to generate a report in the Google Analytics Explore tab using the Free Form technique. A few weeks ago I could use message name, stream name and time to see each notification name, the platform and the total number of clicks. I exported the same to an Excel file.
But today when I tried to generate the same report I couldn't find the "Message Name" dimension. Has this field been removed from the predefined/custom dimensions in GA, or am I doing something wrong?
My main purpose is to get a list of all notifications sent via Firebase.
Any help will be deeply appreciated.
Given that you have excluded the obvious issues, like querying data that is too fresh to have been processed yet, the proper way to debug this is to export the data into a sample BQ table, then conduct exactly the same analysis there that you're trying to conduct in GA4's Explorer. From there, if your issue is with the Explorer's filters, you will quickly see it.
If, however, you're able to see your event properties in BQ but not able to get the Explorer to display them... Well, Google likely saved quite a lot of money on GA4. UA was pretty expensive. GA4 now introduces all these amazing features like data retention limits, property value cardinality bugs, odd inconsistencies between Explore reports and the default reports, and so on.
For now, the best way to really access your data, minus all the artificial limitations of GA4, is to ETL it out, either through the reporting API or by exporting it to BQ.
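For the BQ side of that comparison, this is roughly the kind of query you would run against the GA4 export, assuming the notification name is sent as an event parameter (the project, dataset and the 'message_name' key below are placeholders; swap in your own):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder project/dataset and parameter key -- match them to your GA4 export.
sql = """
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'message_name') AS message_name,
  COUNT(*) AS events
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'
GROUP BY event_date, event_name, message_name
ORDER BY events DESC
"""

for row in client.query(sql).result():
    print(dict(row))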

Google Analytics Real-time + historical data

I work for a non-profit that needs to see how our fundraising efforts are going in 'real-time'.
We look at results in blocks of about a half hour - so we need to report on how we finished the last 24 hours or so and also where we're at in the current half-hour. We're accomplishing this through google analytics, as we have multiple fundraising streams all pointing to a common GA account.
I have tried using Data Studio to report against the GA API, but that connector does not seem to refresh at a reliable rate - sometimes it'll pull fresh data within a minute, sometimes it can take twenty minutes to report on recent transactions. I believe the 'real-time' API could be used to get fresher GA data, but as far as I can tell, that will only report 'live' data, and not prior/historical data (say from four hours ago). Does anyone know what API, if any, I could use to pull all data from history through the current datetime?
I apologize if this request is vague, but I'm just looking for a conceptual approach at this point to get the freshest data - preferably in one fell swoop (one API call). There is more complexity after the data intake (I then have to compare it to the goals we've set for each half-hour, amongst other nuances to the transactions themselves), so I wanted to start with this fundamental piece/question.
Thanks!
Given the context provided, I believe that the API solution would not be feasible. Among other reasons:
The real-time API only offers a limited set of dimensions and metrics. For example, e-commerce data is not available.
https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets
The standard intraday processing SLA for the Core Reporting API is < 24 hours for standard properties. Processing occurs on a best-effort basis, meaning that hourly availability can occur from time to time but cannot be guaranteed.
https://support.google.com/analytics/answer/7084038?hl=en
As an alternative approach to the API solution, you could consider the use of an App + Web property which would allow you to stream event data in real time to BigQuery. However, this solution has some cost implications and would introduce you to a new tracking paradigm.
https://developers.google.com/analytics/devguides/collection/app-web/tag-guide
https://support.google.com/firebase/answer/6318765?hl=en
https://www.simoahava.com/analytics/getting-started-with-google-analytics-app-web/
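To give an idea of what the streaming export buys you: the current day's events land in an intraday table that you can query directly, so "where are we in the current half-hour" becomes a single BigQuery query. A rough sketch, assuming an App + Web / GA4 export and placeholder project and dataset names:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder project/dataset -- the export writes events_YYYYMMDD for finished
# days plus events_intraday_YYYYMMDD, which fills up throughout the current day.
sql = """
SELECT
  TIMESTAMP_SECONDS(1800 * DIV(UNIX_SECONDS(TIMESTAMP_MICROS(event_timestamp)), 1800)) AS half_hour,
  COUNT(*) AS purchases,
  SUM(ecommerce.purchase_revenue) AS revenue
FROM `my-project.analytics_123456789.events_intraday_*`
WHERE event_name = 'purchase'
GROUP BY half_hour
ORDER BY half_hour DESC
"""

for row in client.query(sql).result():
    print(row.half_hour, row.purchases, row.revenue)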

Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, as well as other options if you have any.
Google Analytics has an API (the Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM & Admin" section, then go to your GA account and enter the BigQuery project ID in the GA Admin section under "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.
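With the standard export, finished days live in ga_sessions_YYYYMMDD tables while the current day sits in intraday tables that are overwritten on each run. A small sketch of checking today's numbers, with placeholder project and dataset IDs:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder project/dataset -- the GA360 export writes ga_sessions_intraday_YYYYMMDD
# for the current day and ga_sessions_YYYYMMDD once a day is finalized.
sql = """
SELECT
  SUM(totals.visits) AS sessions,
  SUM(totals.transactions) AS transactions
FROM `my-project.123456789.ga_sessions_intraday_*`
"""

row = list(client.query(sql).result())[0]
print(row.sessions, row.transactions)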
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session-based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR, your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export, BigQuery will backfill the data for the last 13 months, but it will do this only once per view. So if you need historical data, better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It is still a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be scanned to produce the desired result, so if you query over a long timeframe and maybe do a few joins, a single query can run into the dozens of euros (multiplied by the number of users who run the query).
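To illustrate the "custom SQL over many daily tables" idea, here's a rough sketch using a wildcard table and _TABLE_SUFFIX to limit the scanned (and billed) range; the project and dataset IDs are placeholders, and the same SQL can be pasted into a Tableau custom SQL connection:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder project/dataset -- restricting _TABLE_SUFFIX keeps the scan to the
# last 7 daily tables instead of the whole export history.
sql = """
SELECT
  date,
  trafficSource.source AS source,
  SUM(totals.visits) AS sessions,
  SUM(totals.transactions) AS transactions
FROM `my-project.123456789.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY date, source
"""

for row in client.query(sql).result():
    print(row.date, row.source, row.sessions, row.transactions)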
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use Stitch's own specification, singer.io, and related open-source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema — Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free — this is enough for me.

enabling hourly data in google analytics

I have two views/profiles linked to my Google Analytics account. I want to fetch the hourly data for the current day, i.e.
start date:today
end date: today
with a few filters and dimensions.
Now I am getting the response for one view, which means it is possible in Google Analytics; however, for the other view it's showing all the values as 0 - this applies both to the GUI and the API.
Can anyone suggest how to enable it for the other view as well?
You cannot. Google Analytics needs some processing time. It might be that some data appears immediately, especially on small accounts, but it's not guaranteed and not a thing you can "enable" or count on.
Updated: Okay, that was a dumb answer. Still, there is processing latency even in GA Premium. It is possible to get real-time data, but that's a different API with limited data (the Core Reporting API might return data, but there are no guarantees for that).
But I admit that, since your problem is that you do not get data for the whole day, you have a different problem. With a premium account, though, you should be able to contact your account manager/technical support.

Automatic extraction of data from google analytics

We usually import data from Google Analytics once a month and use it for some internal reporting needs. The problem is that we have to do this manually, and it would be nice if we could automate the process and potentially increase the once-a-month routine to once a week or even daily. Our ultimate goal would be to have a tool set up to import the data automatically and store it in a CSV or Excel file. The output file doesn't really matter to us. As long as we can have the data pulled from GA on a regular basis without manual intervention, we'll take care of what to do with the data once it's here. We currently use a Java-based executable (found online) for this, but we run it manually to extract the data.
I have looked for some solutions, even open-source tools (.NET preferably, anything but Java-based really), but I have not really found anything. Most of them require manual intervention to export data, and the best they can do is have reports generated automatically based on that data.
Our last resort would be to write something ourselves, but I would like to research this a bit further and save development/programming time. I am pretty sure someone out there has at least encountered/thought of this problem.
Any help, pointers or redirection to better sources would be much appreciated.
Thanks
Have you looked into the Core Reporting API or Google Analytics' Magic Script? These would allow you to pull data into Google Spreadsheets on a regular basis. Specifically, the Magic Script lets you set up triggers to run a function on a recurring time interval, e.g. daily, weekly, monthly, etc.
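If you'd rather script the extraction yourself outside of Spreadsheets, a rough sketch of a scheduled pull to CSV via the Reporting API v4 and a service account could look like this (the view ID, key file and chosen metrics are placeholders; run it from cron or Task Scheduler to get the "no manual intervention" part):

import csv

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical service-account key and view ID -- replace with your own.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analyticsreporting", "v4", credentials=creds)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "12345678",
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "yesterday"}],
        "dimensions": [{"name": "ga:date"}],
        "metrics": [{"expression": "ga:sessions"}, {"expression": "ga:users"}],
    }]
}).execute()

# Write the report rows out to a CSV file.
with open("ga_report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "sessions", "users"])
    for row in response["reports"][0]["data"].get("rows", []):
        writer.writerow(row["dimensions"] + row["metrics"][0]["values"])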
