We've had some problems tracking transactions in Google Analytics on our e-commerce website, and we've now lost a couple of months of data due to a configuration error.
Is it possible to bulk import these transactions in some way, along with their additional parameters (date of event, transaction code, money amount)?
We know when each of these events took place, and we would like to import them back into Analytics to have complete statistics.
Thank you for the help
Say, for example, that the data is in MySQL.
Run a MySQL query that pulls all the orders between date_you_want_start_from and date_you_want_to_end_from.
With this data, create a MySQL view and call it "import_table".
Create a PHP script that loads all the data from import_table and writes it into an XML feed.
Now run that XML feed through a loop in your Analytics JavaScript; it will iterate over the orders and add the data into the past for you.
Note: PHP, XML and MySQL are just one scenario to give you a head start.
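As a rough illustration of the "pull the orders for a date range and write them to an XML feed" steps, here is a minimal sketch. It swaps in Python and an in-memory SQLite database for PHP and MySQL so that it runs on its own. The table name `orders`, its columns, the XML element names and the sample rows are all hypothetical. Sending the feed to Analytics is not covered.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_orders_xml(conn, start_date, end_date):
    """Return an XML feed of the orders placed between two ISO dates (inclusive)."""
    rows = conn.execute(
        "SELECT transaction_code, order_date, amount FROM orders "
        "WHERE order_date BETWEEN ? AND ? ORDER BY order_date",
        (start_date, end_date),
    ).fetchall()

    root = ET.Element("transactions")
    for code, order_date, amount in rows:
        # One element per order: code and date as attributes, amount as text.
        tx = ET.SubElement(root, "transaction", id=code, date=order_date)
        tx.text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for the shop's order table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (transaction_code TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("T-1", "2020-01-15", 19.99),
        ("T-2", "2020-02-03", 5.0),
        ("T-3", "2020-04-01", 42.5),
    ],
)
feed = export_orders_xml(conn, "2020-01-01", "2020-02-29")
print(feed)
```

Only the orders inside the requested range end up in the feed; the April order in the sample data is left out.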
Related
I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools and APIs are welcome. I searched online and found Stitch as one of the ETL tools; I would like to know more about this option and about any others.
Google Analytics has an API (Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM&Admin" section, go to your GA account and enter the BigQuery Project ID in the GA Admin section, "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates data in BigQuery three times a day, i.e. every eight hours.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session-based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, but it will do this only once per view. So if you need historical data you had better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It still is a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be searched to produce the wanted result, so if you query over a long timeframe and maybe do a few joins a single query can run into the dozens of euros (multiplied by the number of users who use the query).
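A minimal sketch of what such a custom SQL query over the daily tables might look like, built here from Python. It assumes the export's usual `ga_sessions_YYYYMMDD` table naming and the `date` and `totals.visits` fields. The project and dataset names in the usage line are placeholders. Restricting `_TABLE_SUFFIX` to the dates you actually need limits which daily tables are scanned, which is what drives query cost.

```python
from datetime import date


def ga_sessions_query(project, dataset, start, end):
    """Build a standard-SQL query over the daily ga_sessions_YYYYMMDD tables."""
    return (
        "SELECT date, SUM(totals.visits) AS sessions\n"
        f"FROM `{project}.{dataset}.ga_sessions_*`\n"
        # Only daily tables whose suffix falls in this range are scanned.
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY date\n"
        "ORDER BY date"
    )


# "my-project" and "123456" are hypothetical project and dataset names.
query = ga_sessions_query("my-project", "123456", date(2020, 1, 1), date(2020, 1, 31))
print(query)
```

The resulting string can be pasted into Tableau's custom SQL dialog or the BigQuery console.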
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use singer.io, Stitch's own specification, and the related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema; Skyvia can automatically create a table for the report data. You can load 10000 records per month for free, which is enough for me.
I was able to set up the GA360 to BigQuery export, and I also got an email saying that the export is complete. But I don't see anything when I click on the dataset. Will it take time for the tables to show up, or do I need to create tables once I have set up the dataset in BigQuery?
If you successfully connected GA360, you don't need to do anything else for it to work. Make sure you're using the correct account for the BigQuery project.
The current day's data will take some time to show up, and the 13 months of historical data will take a few days.
I have implemented Google Analytics ecommerce tracking on my website, but I made a mistake when passing parameters to Google Analytics. My order was tracked, but the product SKU code was not set.
It's a dummy order that I don't want to show up in any Google Analytics report.
Can you suggest how I can delete this order from Google Analytics?
I am afraid you cannot remove data from GA once it has been collected.
What you can do is:
hide it: create an Advanced segment, the transaction remains in your GA profile but at least it is not included in the reports.
make a copy: copy the profile and delete the old one (it means you lose historical data)
There is one more option:
You could create a new transaction with the same amount of money, but with a negative sign. For example, if you have recorded a transaction for 1,000 dollars, you could recreate it with an amount of "-1000.00". Doing this would "cancel" the wrong transaction.
Important: This only works when the report covers a period of time long enough to include both the wrong transaction and the fix.
Julien is right. You cannot remove the data.
There are a couple more options in addition to Julien's suggestions, though:
You can go to "Filters" option of the view and try to see if you can filter it out. Luckily, ecommerce transactions have their own category that can help you narrow down the variable you need to use. (screenshot attached)
Go a little more advanced than filters and use "Data Import" where you import the ecommerce transactions via a spreadsheet thereby overwriting the transactions for that day. So, what you would essentially do is take all the real transactions of ecommerce from your ecommerce application, export them to CSV and then upload it into GA without the test transaction.
Lastly, a tip: create a test profile for things like this.
One of the answers hinted at data imports (but in a way that would probably not have worked). Universal Analytics actually introduced a way to refund transactions (effectively canceling them out) via data imports. However this only works if the data was collected via enhanced e-commerce tracking. As per documentation:
"In order to process refunds you need to have collected transaction data with the ec.js plugin"
With standard e-commerce tracking, Omar Gonzales' answer is still the only working option (I'd like to add the additional caveat that the negative transaction might be attributed to the wrong channel, so make sure to look at the source/medium/campaign data for the transaction you want to cancel out and supply that data via utm parameters).
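To make the negative-transaction idea and the channel caveat concrete, here is a minimal sketch that builds the body of a Universal Analytics Measurement Protocol `transaction` hit with negated revenue. This is a swapped-in technique: the answers talk about re-sending the transaction with utm parameters, while this sketch passes the same source/medium/campaign values through the Measurement Protocol fields `cs`, `cm` and `cn`. The tracking ID, client ID, transaction ID and campaign values are placeholders, and whether the negative value nets out as hoped in your reports is something to verify in a test view first.

```python
from urllib.parse import urlencode


def reversal_payload(tracking_id, client_id, transaction_id, revenue,
                     source, medium, campaign, currency="USD"):
    """Build a transaction hit body that negates a previously sent revenue."""
    params = {
        "v": "1",                          # protocol version
        "tid": tracking_id,                # property ID (placeholder below)
        "cid": client_id,                  # client ID (placeholder below)
        "t": "transaction",                # hit type
        "ti": transaction_id,              # ID of the transaction to cancel out
        "tr": f"{-abs(revenue):.2f}",      # same amount, negative sign
        "cu": currency,
        "cs": source,                      # campaign source of the original order
        "cm": medium,                      # campaign medium of the original order
        "cn": campaign,                    # campaign name of the original order
    }
    return urlencode(params)


# All values below are hypothetical.
payload = reversal_payload(
    "UA-XXXXX-Y", "555", "T-1000", 1000, "google", "cpc", "spring_sale"
)
print(payload)
```

The sketch only builds the payload; it does not send anything.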
This may be a possible duplicate of this question, but according to all the Google Analytics documentation I really should be able to pull my list of custom segments.
Since I have a very large list of them, it would be suboptimal for me to manually copy the segment ids over one at a time.
I'm following this walk through. Steps to reproduce:
Create a custom segment using date of first session in your Google Analytics account.
Authorize the Google Analytics guide to access your Google Analytics account.
Try their on-page query tester, and inspect whether your custom segment is there.
One thing I've already ruled out was the user that created the segment. I've manually created a segment with the same user that I'm querying the API with and it still does not show. Is there a flag I need to set somewhere to include custom segments?
Edit:
It turns out that it will list some custom segments, but not ones created with date of first session, so this is a duplicate of this question, which means that there is a bug in the Google Analytics API.
There was a bug which is now fixed. So it is now possible to list the Date of Session Segments in the Google Analytics Management API by calling the segments.list() method.
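A minimal sketch of listing custom segments through `segments.list()`. It assumes `service` is an authorized Analytics v3 client (for example one built with `googleapiclient.discovery.build("analytics", "v3", ...)`), and that each returned item carries `segmentId`, `name` and a `type` of `"CUSTOM"` or `"BUILT_IN"`. It reads only the first page of results and does not handle pagination.

```python
def custom_segments(service):
    """Return (segmentId, name) pairs for the custom segments visible to the user."""
    response = service.management().segments().list().execute()
    return [
        (item["segmentId"], item["name"])
        for item in response.get("items", [])
        if item.get("type") == "CUSTOM"  # skip Google's built-in segments
    ]
```

The `segmentId` values can then be used as the segment parameter in reporting queries, which avoids copying the IDs over by hand.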
So after days of trying to solve this one I've come to the conclusion that it cannot be done as asked.
There is, however, another way to do it. For every segment, set up a daily (or weekly, etc.) email report, sent to an email address as a TSV. In each email body, specify the name of the segment, so that when you're consuming the emails you know which segment the attached TSV is for. It doesn't look like the daily reports were designed with segments in mind, since none of the metadata included in the TSV mentions which segment it is for.
From there it's trivial. Connect to the email address using an IMAP client once a day and update the numbers.
Note that the daily email only contains the numbers for that day (not a specified range), so you'll need to first generate the report one time with the historical data to load in.
While hacky, one nice thing about this approach is that it keeps your reports in sync with your (faked through email) API code, provided you match the column headings in the TSV. So if, for example, a new filter is added to a report, the new daily fields will continue to update.
Unfortunately though, the past data won't be reflected in the change.
Obviously this isn't great, but if you are monitoring daily cohorts it's the best you've got if you need to stay with Google Analytics. I have raised this as a bug to the Google Analytics developers, but I haven't heard back as to whether or not they plan to fix it.
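A minimal sketch of the consuming side of this email workaround. It assumes the raw message bytes have already been fetched (for example with an IMAP client), that the plain-text body contains nothing but the segment name, and that the attachment is a plain TSV with a header row. The sample message, segment name and column names are made up for illustration.

```python
import csv
import io
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_report_email(raw_bytes):
    """Return (segment_name, rows) extracted from one report email."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # The segment name is taken from the plain-text body.
    body = msg.get_body(preferencelist=("plain",))
    segment = body.get_content().strip() if body else ""

    rows = []
    for part in msg.iter_attachments():
        if (part.get_filename() or "").endswith(".tsv"):
            text = part.get_content()
            if isinstance(text, bytes):  # non-text attachments come back as bytes
                text = text.decode("utf-8")
            rows.extend(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return segment, rows


# Hypothetical sample email standing in for a real scheduled report.
sample = EmailMessage()
sample["Subject"] = "Daily report"
sample.set_content("Returning buyers")
sample.add_attachment(
    "Date\tSessions\n20200101\t42\n",
    subtype="tab-separated-values",
    filename="report.tsv",
)
segment, rows = parse_report_email(sample.as_bytes())
print(segment, rows)
```

Keying the rows by column heading rather than position is what lets the stored numbers keep up when the report's columns change.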
My goal is to query Analytics and, if any new link was created last month, do some operations on it and store it in the DB. The problem is that I can't find a way to query the API for only the links that are new in the last month. I could compare the API result with a DB query result, but I think that would slow down the application: both the DB result and the API result will contain thousands of links, and comparing them would be an inefficient use of resources.
You can't request only new links from the Google Analytics API. I don't see a filter helping here either, because it would end up being too big.
My suggestion is that you select everything out of Google Analytics and store it in your DB. Then, at the end of each month, you download all pages for that month into your DB. You will be able to search them there and find your new links.
Don't worry about inefficient resource usage. I download millions of rows for customers every month in order to analyze their data further. It's what the API is for.
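A minimal sketch of the "store everything, then find what is new" suggestion, using SQLite for illustration. The table name `pages`, its columns and the sample paths are hypothetical, and fetching the monthly page list from the API is left out. The primary key lets the database do the comparison, so no full list of known links has to be loaded into the application.

```python
import sqlite3


def record_pages(conn, month, page_paths):
    """Store this month's page paths and return the ones not seen before."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, first_seen TEXT)"
    )
    new_paths = []
    for path in page_paths:
        cur = conn.execute(
            "INSERT OR IGNORE INTO pages (path, first_seen) VALUES (?, ?)",
            (path, month),
        )
        if cur.rowcount == 1:  # a row was inserted, so this path is new
            new_paths.append(path)
    conn.commit()
    return new_paths


conn = sqlite3.connect(":memory:")
first = record_pages(conn, "2020-01", ["/a", "/b"])
second = record_pages(conn, "2020-02", ["/a", "/c"])
print(first, second)
```

Because `first_seen` is stored, the links that were new in any given month can also be queried later with a simple `WHERE first_seen = ?`.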