Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data (including historical data) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, and other options if you have them.

Google Analytics has an API (Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
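For those occasional pulls, a minimal sketch using the v4 Reporting API Python client might look like the following (the view ID, date range and service-account key path are placeholders, not values from your setup):

```python
# Minimal sketch: pull sessions per day via the GA Reporting API v4.
# Assumes a service-account key with read access to the view; the view ID is a placeholder.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
KEY_FILE = "service-account.json"   # placeholder path
VIEW_ID = "123456789"               # placeholder view ID

credentials = service_account.Credentials.from_service_account_file(KEY_FILE, scopes=SCOPES)
analytics = build("analyticsreporting", "v4", credentials=credentials)

response = analytics.reports().batchGet(
    body={
        "reportRequests": [{
            "viewId": VIEW_ID,
            "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
            "metrics": [{"expression": "ga:sessions"}],
            "dimensions": [{"name": "ga:date"}],
        }]
    }
).execute()

for row in response["reports"][0]["data"].get("rows", []):
    print(row["dimensions"][0], row["metrics"][0]["values"][0])
```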
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy: you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM & Admin" section, then go to your GA account and enter the BigQuery project ID in the GA Admin section under "Property Settings / Product Linking / All Products / BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR, your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete the data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, but it will do this only once per view. So if you need historical data, better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It is still a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be scanned to produce the desired result, so if you query over a long timeframe and maybe do a few joins, a single query can run into the dozens of euros (multiplied by the number of users who run the query).
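As an illustration of that daily-tables pattern, here is a sketch of a wildcard query via the BigQuery Python client (project, dataset and date range are placeholders; the cost caveats above still apply):

```python
# Sketch: query the GA360 export's daily ga_sessions_ tables with a wildcard.
# "my-project" and "my_ga_dataset" are placeholders for your BigQuery project/dataset.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT
  date,
  SUM(totals.visits)    AS sessions,
  SUM(totals.pageviews) AS pageviews
FROM `my-project.my_ga_dataset.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'  -- limits the bytes scanned (and the bill)
GROUP BY date
ORDER BY date
"""

for row in client.query(sql).result():
    print(row.date, row.sessions, row.pageviews)
```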

scitylana.com has a service that can deliver data from the free version of Google Analytics to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.

Another option is to use Stitch's own specification, singer.io, and related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
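A rough sketch of what the two config files and the pipe could look like from Python - the exact keys depend on the tap/target versions, so treat everything below as illustrative and check each project's README:

```python
# Sketch: write illustrative configs for tap-google-analytics and
# pipelinewise-target-redshift, then run them as a pipe.
# All keys and values below are placeholders; the authoritative list of settings
# is in each project's README.
import json
import subprocess

ga_config = {
    "key_file_location": "service-account.json",  # GA service-account key (placeholder)
    "view_id": "123456789",
    "start_date": "2020-01-01T00:00:00Z",
}

redshift_config = {
    "host": "my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    "port": 5439,
    "dbname": "analytics",
    "user": "loader",
    "password": "REPLACE_ME",        # use env vars / a secret store in practice
    "s3_bucket": "my-staging-bucket",
    "default_target_schema": "google_analytics",
}

with open("ga.json", "w") as f:
    json.dump(ga_config, f)
with open("redshift.json", "w") as f:
    json.dump(redshift_config, f)

# In practice the tap and target are usually installed in separate virtualenvs
# to avoid dependency conflicts; here they are assumed to be on PATH.
subprocess.run(
    "tap-google-analytics -c ga.json | target-redshift -c redshift.json",
    shell=True,
    check=True,
)
```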

I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema - Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free - this is enough for me.

Related

Explore Free Form report in Google Analytics

I am trying to generate a report in the Google Analytics Explore tab using the Free Form technique. A few weeks ago I could use message name, stream name and time to see every notification name, platform and the total number of clicks. I exported the result to an Excel file.
But today when I tried to generate the same report I couldn't find the "Message name" dimension. Has this field been removed from the predefined/custom dimensions in GA? Or am I doing something wrong?
My main purpose is to get a list of all notifications sent via Firebase.
Any help will be deeply appreciated.
Given that you have excluded the obvious issues, like using data that is too fresh, the proper way to debug this is to export the data into a sample BQ table, then conduct exactly the same analysis you're trying to conduct in GA4's explorer. From there, if your issue is with the explorer's filters, you will quickly see it.
If, however, you're able to see your event properties in BQ but not able to get the explorer to display them... well, Google likely saved quite a lot of money on GA4. UA was pretty expensive. GA4 now introduces all these amazing features like data retention limits, property value cardinality bugs, odd inconsistencies between Explore reports and the default reports, and so on.
For now, the best way to really access your data minus all the artificial limitations of GA4 is to ETL it out of there, either through the reporting API or by exporting it to BQ.
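For example, once the BQ export is in place you could check whether the notification parameters arrive at all with something like this (a sketch: the project and dataset are placeholders, and it assumes the Firebase notification events carry a message_name parameter, which you should verify against your own export):

```python
# Sketch: look for Firebase notification events and their message_name parameter
# in the GA4 BigQuery export. Project, dataset and the parameter name are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT
  event_name,
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'message_name') AS message_name,
  COUNT(*) AS events
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  AND event_name LIKE 'notification_%'
GROUP BY event_name, message_name
ORDER BY events DESC
"""

for row in client.query(sql).result():
    print(row.event_name, row.message_name, row.events)
```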

Querying data in "real time" from GA with BigQuery?

I'm interested in running some queries in BigQuery to export data in "real time", but I don't know how to do it.
The "intraday" dataset is only updated 4 or 5 times per day, and this isn't what I need.
My question: is it possible to get this data?
Thanks.
For real-time data you could try using the Google Analytics API (I tend to use the Python client to run this type of analysis).
If this does not suit your needs, then the only option I know of is having some back-end infrastructure that collects data from your website and publishes it to a queue, where you can further process the data.
This post has lots of good advice, so you can also check it out (such as using Pub/Sub, Dataflow, BQ live streaming and so on). Keep in mind though that this last approach is way more complex and resource dependent, so it's important for you to know well what you are doing.
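If you go the API route mentioned above, a minimal sketch with the (Universal Analytics) Real Time Reporting API could look like this - note that access to that API had to be requested separately, and the view ID and credentials path below are placeholders:

```python
# Sketch: active users per page via the UA Real Time Reporting API (v3).
# The view ID and key file are placeholders; access to this API required whitelisting.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
analytics = build("analytics", "v3", credentials=credentials)

result = analytics.data().realtime().get(
    ids="ga:123456789",              # "ga:" + view ID (placeholder)
    metrics="rt:activeUsers",
    dimensions="rt:pagePath",
).execute()

for path, users in result.get("rows", []):
    print(path, users)
```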

How to remove an e-commerce transaction?

I have implemented Google Analytics e-commerce tracking on my website, but there was a mistake while passing parameters to Google Analytics: my order got tracked but the product SKU code was not set.
It's a dummy order that I don't want to show in any Google Analytics report.
Can you suggest how I can delete this order from Google Analytics?
I am afraid you cannot remove data from GA once it has been collected.
What you can do is:
hide it: create an advanced segment; the transaction remains in your GA profile but at least it is not included in the reports.
make a copy: copy the profile and delete the old one (this means you lose historical data).
There is one more option:
1.- You could create a new transaction with the same amount of money, but with a negative sign. For example, if you have recorded a transaction for 1,000 dollars, you could recreate it with a "-1000.00" amount. Doing this would "cancel out" the wrong transaction.
Important: this will only work when the user looks at a period of time that includes both the wrong transaction and the fix.
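One way to send such an offsetting transaction without touching the site's tracking code is the Measurement Protocol; here is a sketch (the property ID, client ID and transaction details are placeholders and should mirror the wrong transaction, negated):

```python
# Sketch: send an offsetting (negative) transaction hit via the Measurement Protocol.
# tid/cid/ti are placeholders; reuse the original transaction's ID and amounts (negated).
import requests

payload = {
    "v": "1",                 # protocol version
    "tid": "UA-XXXXXX-1",     # your property ID (placeholder)
    "cid": "555",             # a client ID, ideally the one from the original hit
    "t": "transaction",       # standard e-commerce transaction hit
    "ti": "ORDER-1234",       # same transaction ID as the wrong order
    "tr": "-1000.00",         # negative revenue to cancel out the original 1,000.00
}

resp = requests.post("https://www.google-analytics.com/collect", data=payload)
resp.raise_for_status()
```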
Julien is right. You cannot remove the data.
There are a couple more options in addition to Julien's suggestions, though:
You can go to the "Filters" option of the view and try to see if you can filter it out. Luckily, e-commerce transactions have their own category that can help you narrow down the variable you need to use (screenshot attached).
Go a little more advanced than filters and use "Data Import", where you import the e-commerce transactions via a spreadsheet, thereby overwriting the transactions for that day. So what you would essentially do is take all the real e-commerce transactions from your e-commerce application, export them to CSV, and then upload that into GA without the test transaction.
Lastly, a tip: create a test profile for things like this.
One of the answers hinted at data imports (but in a way that would probably not have worked). Universal Analytics actually introduced a way to refund transactions (effectively canceling them out) via data imports. However, this only works if the data was collected via enhanced e-commerce tracking. As per the documentation:
"In order to process refunds you need to have collected transaction data with the ec.js plugin."
With standard e-commerce tracking, Omar Gonzales' answer is still the only working option (I'd like to add the additional caveat that the negative transaction might be attributed to the wrong channel, so make sure to look at the source/medium/campaign data for the transaction you want to cancel out and supply that data via UTM parameters).
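For data that was collected with enhanced e-commerce, the Measurement Protocol also accepts a refund hit, which is another way to cancel out a whole transaction (again, the property, client and transaction IDs below are placeholders):

```python
# Sketch: refund an entire enhanced-ecommerce transaction via the Measurement Protocol.
# Only applies to data collected with enhanced e-commerce (ec.js); IDs are placeholders.
import requests

payload = {
    "v": "1",
    "tid": "UA-XXXXXX-1",   # property ID (placeholder)
    "cid": "555",           # client ID (placeholder)
    "t": "event",           # the refund rides on a non-interaction event hit
    "ni": "1",
    "ec": "Ecommerce",
    "ea": "Refund",
    "pa": "refund",         # product action: refund
    "ti": "ORDER-1234",     # transaction to refund in full
}

requests.post("https://www.google-analytics.com/collect", data=payload).raise_for_status()
```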

A single google analytics account for many users - possible?

Is it possible to use a single Google Analytics account, in particular for e-commerce, for more than one user? In fact, I need it to be used by a lot of users. What I want, in a nutshell, is this:
The users come to my website and provide me their e-commerce data in JSON or some other format. I have a Google Analytics account, so I take that e-commerce data and send it to Google Analytics. Then I show them the reports for their data from Google Analytics via the Google Analytics API (I guess it's the Reporting API?).
The question is not whether or not it is profitable, makes sense, etc. The question is: can I use my single Google Analytics account to achieve what I've described above?
Yes, you can. Since you need to keep the users apart in a way that does not allow them to look into other users' data, you can use a single account for up to 50 users, since that is how many data views you can have per account (view permissions can be set at account level)[1]. Filter each view by hostname (or whatever) so it records only the current user's data.
If you do not need the interface (i.e. if you want to query GA via the API and build custom dashboards) you can have even more - simply store a unique ID per user and use that to filter the data before displaying it in a dashboard (see the sketch below). So as far as that part of the question is concerned you are safe.
Where things probably start to fall apart is data collection. It looks like you want to do some sort of batch processing of accumulated e-commerce data. Since you cannot send a timestamp for a user interaction, all dates within GA will be off. Plus you have data limits (I'm thinking of the maximum interactions per minute that you can send), so your insertion process might not be very efficient. It would probably be better to create something on top of the Measurement Protocol that allows your clients to send data in real time.
[1] To make this a little clearer: you can set up 50 entities with different access permissions. Of course every view can have as many users as you like, but they will all see the same data.
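The "filter before displaying" part could look roughly like the sketch below (v4 Reporting API; it assumes each hit carries the client's ID in a custom dimension - ga:dimension1, the view ID and key file are all placeholders for your setup):

```python
# Sketch: pull transactions for a single client, assuming their ID is stored
# in custom dimension 1. View ID, key file and dimension slot are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analyticsreporting", "v4", credentials=credentials)

report = analytics.reports().batchGet(
    body={
        "reportRequests": [{
            "viewId": "123456789",
            "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
            "metrics": [{"expression": "ga:transactions"},
                        {"expression": "ga:transactionRevenue"}],
            "dimensions": [{"name": "ga:date"}],
            "dimensionFilterClauses": [{
                "filters": [{
                    "dimensionName": "ga:dimension1",   # the per-client ID (assumption)
                    "operator": "EXACT",
                    "expressions": ["client-123"],
                }]
            }],
        }]
    }
).execute()

for row in report["reports"][0]["data"].get("rows", []):
    print(row["dimensions"][0], row["metrics"][0]["values"])
```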

How to include custom segments in the list of segments when querying the Google Analytics API?

This may be a possible duplicate of this question, but according to all the Google Analytics documentation I really should be able to pull my list of custom segments.
Since I have a very large list of them, it would be suboptimal for me to manually copy the segment ids over one at a time.
I'm following this walk through. Steps to reproduce:
Create a custom segment using date of first session in your Google Analytics account.
Authorize the Google Analytics guide to access your Google Analytics account.
Try their on-page query tester, and inspect whether your custom segment is there.
One thing I've already ruled out is the user who created the segment. I manually created a segment with the same user that I'm querying the API with, and it still does not show. Is there a flag I need to set somewhere to include custom segments?
Edit:
It turns out that it will list some custom segments, but not ones created with date of first session, so this is a duplicate of this question, which means that there is a bug in the Google Analytics API.
There was a bug which is now fixed, so it is now possible to list the Date of First Session segments in the Google Analytics Management API by calling the segments.list() method.
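For reference, listing the segments (including custom ones) looks like this with the v3 Management API Python client - the credentials handling is a placeholder, and for custom segments you need to authorize as the user who owns them:

```python
# Sketch: list all segments visible to the authorized identity via the Management API v3.
# The key file path is a placeholder; custom segments are only visible to their owner,
# so in practice you may need user OAuth credentials instead of a service account.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=credentials)

segments = analytics.management().segments().list().execute()
for segment in segments.get("items", []):
    print(segment["segmentId"], segment["name"], segment.get("definition", ""))
```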
So after days of trying to solve this one I've come to the conclusion that it cannot be done as asked.
There is, however, another way to do it. For every segment, set up a daily (or weekly, etc.) email report to an email address as a TSV. In each email body, specify the name of the segment so that when you're consuming the emails you know which segment the attached TSV is for. It doesn't look like the daily reports were designed with segments in mind, since none of the metadata included in the TSV mentions which segment it is for.
From there it's trivial. Connect to the email address using an IMAP client once a day and update the numbers.
Note that the daily email only contains the numbers for that day (not a specified range), so you'll need to first generate the report one time with the historical data to load in.
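The IMAP side is straightforward with the Python standard library; here is a rough sketch (the host, credentials, sender address and attachment handling are placeholders and assumptions about your mailbox):

```python
# Sketch: fetch unread GA email reports over IMAP and parse the TSV attachments.
# Host, credentials and the search criterion are placeholders for your mailbox.
import csv
import email
import imaplib
import io

with imaplib.IMAP4_SSL("imap.example.com") as imap:
    imap.login("reports@example.com", "app-password")
    imap.select("INBOX")
    _, data = imap.search(None, '(UNSEEN FROM "analytics-noreply@google.com")')
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        segment_name = msg["Subject"]  # assumes the segment name is put in the subject/body
        for part in msg.walk():
            filename = part.get_filename() or ""
            if filename.endswith(".tsv") or part.get_content_type() == "text/tab-separated-values":
                tsv = part.get_payload(decode=True).decode("utf-8")
                rows = list(csv.reader(io.StringIO(tsv), delimiter="\t"))
                print(segment_name, len(rows), "rows")
```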
While hacky, one nice thing about this approach is that it keeps your reports in sync with your (faked-through-email) API code (provided you match the column headings in the TSV). So if, for example, a new filter is included in a report, the new daily fields will continue to update.
Unfortunately though, the past data won't be reflected in the change.
Obviously this isn't great, but if you are monitoring daily cohorts it's the best you've got if you need to stay with Google Analytics. I have raised this as a bug to the Google Analytics developers, but I haven't heard back as to whether or not they plan to fix it.
