Google Analytics Ad Cost Upload Mismatch - google-analytics

I have a problem with cost data uploaded to Google Analytics via the Management API.
The script itself works just fine: on a daily schedule it uploads a CSV exported from BigQuery to the Management API. But in some cases the cost data shown in the UI differs from the data in the upload file.
For example:
In my BigQuery table I see costs of 349.44 for a certain campaign on a certain date.
In the UI for the same campaign and date I see 92.31.
If I download the uploaded CSV again from the Data Upload section I see 257.13.
So three different numbers in three places that should not differ.
For the same campaign and other dates the data is correct.
All costs are in EUR, so no currency conversion takes place, and the problem is not limited to recent data: dates from many months ago are affected as well. I haven't found any resources that address this issue and would be grateful for any input.
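For context, the upload itself follows the standard Management API data-import flow. A minimal sketch of such a daily cost upload, assuming the Python google-api-python-client and a service account with edit rights; all IDs and file names below are placeholders:

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.edit"]
creds = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES)
analytics = build("analytics", "v3", credentials=creds)

# Push the CSV exported from BigQuery into the cost data set.
media = MediaFileUpload("daily_costs.csv",
                        mimetype="application/octet-stream")
upload = analytics.management().uploads().uploadData(
    accountId="12345678",            # placeholder account ID
    webPropertyId="UA-12345678-1",   # placeholder property ID
    customDataSourceId="abcdEFGH",   # placeholder data set ID
    media_body=media).execute()
print(upload["status"])              # PENDING while GA is still processing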

Related

What is correct format/syntax of timestamp in offline event data in google analytics 4

I have some data from a CRM and I want to use Offline Event Data in Data Import of GA4 to insert the CRM's "offline data".
I have imported the CSV files multiple times and everything looks fine: no errors are reported, but even after nearly 24 hours the "timestamp_micros" of the events does not appear in the results, and the time of upload is used as the timestamp of the data instead.
Is this format for "timestamp_micros" correct?
2021-04-08T22:25:09.335541+02:00
How can I add a timestamp dimension to offline data imported into Google Analytics 4? Can this be done without using GTM?
I expect the timestamp of each event to appear correctly in results and reports.
That documentation page
https://support.google.com/analytics/answer/10325025?hl=en#template
gives an example. As you can see there, timestamp_micros is a Unix time value (microseconds since the epoch), not an ISO 8601 string.
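A minimal sketch of converting an ISO 8601 value like the one above into epoch microseconds (plain Python, no GA-specific library assumed; the target form is an assumption based on the linked template):

from datetime import datetime, timedelta, timezone

iso_value = "2021-04-08T22:25:09.335541+02:00"
dt = datetime.fromisoformat(iso_value)          # parses the +02:00 offset too
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Unix epoch microseconds, which is the form the documented example uses.
timestamp_micros = (dt - epoch) // timedelta(microseconds=1)
print(timestamp_micros)                          # 1617913509335541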

Google Analytics 4 Data upload

Is there a way to upload all of the data from GA3 to GA4? I've tried uploading CSV files downloaded from GA3, but I keep running into record errors.
The underlying data structures of Universal Analytics and GA4 are completely different.
Universal Analytics was based on pageviews and screen views, while in GA4 everything is based on events.
There is no way to convert UA data to GA4 because they are so different.
Beyond that, Google Analytics is hit based, i.e. tied to a point in time. While you can insert offline data, that data can be no more than a few hours old. Inserting data from a website that has been recording data for years would not be possible.
Universal Analytics will be going away
After July 1, 2023, you'll be able to access your previously processed data in your Universal Analytics property for at least six months. We know your data is important to you, and we strongly encourage you to export your historical reports during this time.
The only real option Google is giving us is to export it to a CSV.

Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; please help me understand this option better, along with any other options you may have.
Google Analytics has an API (Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
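A minimal sketch of such an occasional pull with the Reporting API v4 and the Python google-api-python-client (the view ID and the chosen dimensions/metrics are placeholders):

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES)
analytics = build("analyticsreporting", "v4", credentials=creds)

# One small report: sessions per day for the last 30 days.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "123456789",  # placeholder view ID
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}],
        "dimensions": [{"name": "ga:date"}],
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    print(row["dimensions"][0], row["metrics"][0]["values"][0])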
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM&Admin" section, go to your GA account and enter the BigQuery Project ID in the GA Admin section, "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
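Dataset locations cannot be changed after creation, so if you create the target dataset yourself it is worth pinning the region explicitly. A minimal sketch with the google-cloud-bigquery Python client (project and dataset names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")   # placeholder project

# The location is fixed at creation time; set it before any data arrives.
dataset = bigquery.Dataset("my-gcp-project.analytics_export")
dataset.location = "EU"
client.create_dataset(dataset, exists_ok=True)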
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, but it will do this only once per view. So if you need historical data, better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It still is a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be searched to produce the wanted result, so if you query over a long timeframe and maybe do a few joins a single query can run into the dozens of euros (multiplied by the number of users who use the query).
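To illustrate the "many daily tables" point: the export lands in one ga_sessions_YYYYMMDD table per day, and a wildcard query with a _TABLE_SUFFIX filter is the usual way to span a date range. A minimal sketch using the google-cloud-bigquery Python client (project, dataset, and date range are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Sessions per day across many daily export tables; cost is driven by the
# columns read and the date range matched by _TABLE_SUFFIX, not by result size.
sql = """
SELECT
  date,
  SUM(totals.visits) AS sessions
FROM `my-gcp-project.123456789.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY date
ORDER BY date
"""
for row in client.query(sql).result():
    print(row.date, row.sessions)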
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use Stitch's own specification, singer.io, and related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema, as Skyvia can automatically create a table for the report data. You can load 10000 records per month for free, which is enough for me.

Google API Returning Sampled Data

I'm using the Google Analytics API to query my analytics data via the Google Analytics Spreadsheet Add-on. We then use the spreadsheet data in Google Data Studio for a dashboard to display the data.
Everything has been going well for the last few months; however, over the last 48 hours we have begun to receive sampled data when we query the API using the spreadsheet add-on. This is undesirable for how we are using the data.
We were getting about 1100 results returned before. We have altered the date range of the query to only 3 days, whereas before we were querying since the start of the year.
Initially that worked and the results were no longer sampled. Then 24 hours later the data appears to be sampled again.
The documentation says the following regarding sampling for the free account:
Analytics Standard: 500k sessions at the view level for the date range you are using
We are not using our analytics that heavily, so I cannot understand why we would have hit the 500k limit.
It is also not clear to me what "view level" means. Any help on this would be greatly appreciated.
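For what it's worth, the Reporting API itself tells you when a response is sampled, which makes this easier to diagnose outside the add-on. A minimal sketch with the v4 API in Python (view ID and date range are placeholders; this only shows how sampling is surfaced, it is not specific to the spreadsheet add-on):

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES)
analytics = build("analyticsreporting", "v4", credentials=creds)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "123456789",               # placeholder view ID
        "dateRanges": [{"startDate": "2023-01-01", "endDate": "2023-01-03"}],
        "metrics": [{"expression": "ga:users"}],
        "samplingLevel": "LARGE",            # request the least sampling
    }]
}).execute()

data = response["reports"][0]["data"]
# These keys are only present when the report is based on sampled data.
if "samplesReadCounts" in data:
    print("sampled:", data["samplesReadCounts"], "of", data["samplingSpaceSizes"])
else:
    print("unsampled")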

Repartitioning Google Analytics data based on publish date

I'm looking to repartition Google Analytics data from the stats-date tables (the default export we get each day) into dated tables based on the publish date of each article URL (a custom dimension we have). This is for more efficient querying by publish date by a BI dashboard that needs to calculate user counts on the fly for arbitrary date ranges.
So the plan is to append data each day to the relevant publish-dated tables. Just wondering if there is an easy way to do this, or if I need to look through each day's stats-date data for all the publish dates that had hits that day and then append to the relevant publish-date tables one by one?
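One possible approach, sketched with the google-cloud-bigquery Python client under the assumption that the publish date lives in a hit-level custom dimension (index 5 here) stored as YYYYMMDD: instead of one table per publish date, append everything into a single table partitioned on a publish_date column, so the dashboard filters on the partition column and the daily job stays one query. All project, dataset, and index values are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Flatten one day's export, derive publish_date from the custom dimension,
# and append into a single date-partitioned table.
sql = """
SELECT
  PARSE_DATE('%Y%m%d', (
    SELECT value FROM UNNEST(h.customDimensions) WHERE index = 5
  )) AS publish_date,
  fullVisitorId,
  h.page.pagePath AS page_path
FROM `my-gcp-project.123456789.ga_sessions_20230101`,
     UNNEST(hits) AS h
"""
job_config = bigquery.QueryJobConfig(
    destination="my-gcp-project.reporting.hits_by_publish_date",
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(field="publish_date"),
)
client.query(sql, job_config=job_config).result()

With a partitioned destination like this there is no per-publish-date bookkeeping: each day's export is appended with one query, and arbitrary publish-date ranges are pruned via the partition column.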
