I am a final-year undergraduate student. For my final-year research project I am developing a mobile application that can predict a user's future personal expenses.
First, I enter a list of tasks along with the cost spent on each task. As an example, imagine I want to buy some apples. I add that task to the app, like a to-do application. Once I finish the task, I add the cost spent on buying the apples. This data is stored in Google Firebase.
I have a separate ML model which can predict future expenses, e.g. how much money I will need next month for food. That ML model's dataset is stored in Google Drive as a CSV file, and that CSV file has no connection to the data stored in Firebase.
So I need to export the data in Firebase to the CSV file I have already created in Google Drive. How can I do this?
I also need a proper way to fetch the model's predictions into my mobile application.
The technologies I have used so far are Firebase, Flutter, and Google Colab.
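Not from the original question, but since Colab is already in the stack, a minimal sketch of the kind of export step being asked about could run in a Colab notebook: read the expense records from Firebase with the Admin SDK (assuming Cloud Firestore; the Realtime Database would use firebase_admin.db instead) and append them to the dataset CSV in Drive. The collection name, field names, and file paths below are placeholders, not details from the app.

import firebase_admin
from firebase_admin import credentials, firestore
import pandas as pd
from google.colab import drive

# Mount Google Drive so the ML dataset CSV is reachable as a normal path.
drive.mount('/content/drive')

# Service-account key downloaded from the Firebase console (path is a placeholder).
cred = credentials.Certificate('serviceAccountKey.json')
firebase_admin.initialize_app(cred)
db = firestore.client()

# 'expenses', 'task', 'cost' and 'date' are illustrative names only.
rows = [doc.to_dict() for doc in db.collection('expenses').stream()]
df = pd.DataFrame(rows, columns=['task', 'cost', 'date'])

# Append the new records to the existing dataset CSV in Drive (path is a placeholder).
df.to_csv('/content/drive/MyDrive/expense_dataset.csv', mode='a', header=False, index=False)

For getting predictions back into the Flutter app, two common patterns (again, suggestions rather than anything stated in the question) are to have the Colab/backend code write the prediction into a Firestore document the app already listens to, or to serve the model behind a small HTTP endpoint (e.g. a Cloud Function) that the app calls.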
How do I model Google Analytics data in a database? I am using Pentaho to bring the Google Analytics data into my database, but I am not sure how to model the data into tables.
Any reference or suggestion is most welcome.
TIA.
You could take inspiration from the schema used by Google Analytics to export 98% of the original hit data, plus data-processing outcomes, into BigQuery.
The feature is only available for Analytics 360 customers, but the schema is public.
There is a visualisation of it here: https://storage.googleapis.com/e-nor/visualizations/bigquery/ga360-schema.html
However, it's unlikely that you can export equally fine-grained data through the API (you will run into usage quotas, aggregation, etc.), but it gives you a starting point.
You can probably challenge the expected usage of that data, focus on the data you really need to export, and adjust your data model to your use cases.
I am looking for options to ingest Google Analytics data (including historical data) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, as well as any others you know of.
Google Analytics has an API (the Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
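For context, a hedged sketch of what pulling a KPI through the v4 Reporting API looks like in Python (the service-account file, view ID, and metric are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Service-account key and view ID are placeholders.
creds = service_account.Credentials.from_service_account_file(
    'client_secrets.json',
    scopes=['https://www.googleapis.com/auth/analytics.readonly'])
analytics = build('analyticsreporting', 'v4', credentials=creds)

# One report request: sessions per day over the last 30 days.
response = analytics.reports().batchGet(body={
    'reportRequests': [{
        'viewId': '123456789',
        'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
        'metrics': [{'expression': 'ga:sessions'}],
        'dimensions': [{'name': 'ga:date'}],
    }]
}).execute()

for row in response['reports'][0]['data'].get('rows', []):
    print(row['dimensions'][0], row['metrics'][0]['values'][0])

Requests like this are paginated and rate-limited, which is exactly why the API works for the occasional KPI but not for bulk historical dumps.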
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy:
Create a project in the Google Cloud Console and enable billing (BigQuery comes with a fee, it's not part of the GA360 contract).
Add your email address as BigQuery Owner in the "IAM & Admin" section.
Go to your GA account and enter the BigQuery project ID in the GA Admin section, under "Property Settings/Product Linking/All Products/BigQuery Link".
The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery every eight hours, i.e. three times a day.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session-based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR, your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, however it will do this only once per view. So if you need historical data better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It still is a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be searched to produce the wanted result, so if you query over a long timeframe and maybe do a few joins a single query can run into the dozens of euros (multiplied by the number of users who use the query).
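To make the daily-table point concrete, here is a hedged sketch of the kind of wildcard query meant above; the SELECT itself is what would go into Tableau's custom SQL, and the project/dataset IDs are placeholders (the dataset is named after the GA view ID):

from google.cloud import bigquery

client = bigquery.Client(project='my-gcp-project')  # placeholder project ID

# ga_sessions_* matches the daily export tables; _TABLE_SUFFIX limits the scan
# (and therefore the cost) to the dates you actually need.
query = """
SELECT
  date,
  SUM(totals.visits)    AS sessions,
  SUM(totals.pageviews) AS pageviews
FROM `my-gcp-project.123456789.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY date
ORDER BY date
"""
for row in client.query(query).result():
    print(row.date, row.sessions, row.pageviews)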
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use Stitch's own specification, singer.io, and related open-source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping the output of one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
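The two -c config files hold the tap and target settings. The key names below are written from memory of the two packages' READMEs, so treat them as assumptions and check the current docs; this sketch just shows the overall shape and how the pipe above could be driven from Python:

import json
import subprocess

# Field names are assumptions -- verify against tap-google-analytics and
# pipelinewise-target-redshift documentation before use.
ga_config = {
    "key_file_location": "client_secrets.json",  # GA service-account key
    "view_id": "123456789",
    "start_date": "2020-01-01T00:00:00Z",
}
redshift_config = {
    "host": "my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    "port": 5439,
    "user": "etl_user",
    "password": "********",
    "dbname": "analytics",
    "default_target_schema": "google_analytics",
    "s3_bucket": "my-staging-bucket",  # the target stages data in S3 before COPY
}

with open("ga.json", "w") as f:
    json.dump(ga_config, f)
with open("redshift.json", "w") as f:
    json.dump(redshift_config, f)

# Equivalent to the shell pipe shown above.
subprocess.run(
    "tap-google-analytics -c ga.json | target-redshift -c redshift.json",
    shell=True, check=True,
)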
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema; Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free, which is enough for me.
I run a blog and publish AWIN affiliate campaigns on my website.
The Awin affiliate network offers a transaction feed that automatically pushes near real-time transaction notifications to a URL I can define.
Detailed info: https://wiki.awin.com/index.php/Transaction_Notification
I wonder if there is a way for me to push/import this data from Awin directly into my Google Analytics account, and if so, how?
Yes, this feature is called Data Import. What you will need is to choose a dimension (called the import key) with which you can blend GA and AWIN data. Practically, what you might need to do is:
Implement a custom dimension as your import key so you can store an affiliate identifier from AWIN against your users in GA; this will be your blending dimension.
Create other custom dimensions/metrics as needed to hold the other data points from AWIN
Create and import data sets on a regular basis to import new AWIN data
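If you want to automate step 3 rather than uploading CSV files by hand in the GA admin UI, the Management API exposes Data Import uploads. A rough sketch in Python (all IDs, the file name, and the CSV columns are placeholders; the CSV header has to match the schema of the Data Set you define in GA Admin):

from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Service-account key, account/property IDs, and Data Set ID are placeholders.
creds = service_account.Credentials.from_service_account_file(
    'client_secrets.json',
    scopes=['https://www.googleapis.com/auth/analytics.edit'])
analytics = build('analytics', 'v3', credentials=creds)

# CSV built from the AWIN transaction feed, with a header matching the
# custom dimensions/metrics of the Data Set.
media = MediaFileUpload('awin_transactions.csv', mimetype='application/octet-stream')

analytics.management().uploads().uploadData(
    accountId='12345678',
    webPropertyId='UA-12345678-1',
    customDataSourceId='abcdEFGHijkLMNop',  # the Data Set ID from GA Admin
    media_body=media,
).execute()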
Please note that Google Data Studio recently gained a feature to blend data sets, so once step 1 is done you could perform steps 2 and 3 in Data Studio (which might be easier for you, as you don't need to create the extra dimensions/metrics, AND you could have your AWIN data in a Google Sheet synced automatically with your GDS report, thus saving you the data imports).
I'm not very familiar with Google Analytics. I'm trying to figure out what the schema would look like for an export that contains all purchases for any users that have visited my site. So to sum it up, I'm interested in understanding what the data structure will look like (column names) for an export from GA that contains all purchases on my site.
Thank you in advance for any insight you may have.
There is no export schema for Google Analytics as such, but the best source for this could be the BigQuery Export Schema (the BigQuery export is included in the Premium/360 version of GA).
https://support.google.com/analytics/answer/3437719?hl=en
You will be able to access the schema documentation even if you are not a paying GA360 customer. The export schema is pretty solid; however, keep in mind that there are some vital dimensions that are not available through the API/GA interface by default.
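As a concrete illustration of the purchase-level columns in that schema, here is a hedged sketch of a query against the export tables (project and dataset IDs are placeholders; the field names are taken from the linked export schema, where purchases are hits with eCommerceAction.action_type = '6'):

from google.cloud import bigquery

client = bigquery.Client(project='my-gcp-project')  # placeholder project ID

# One row per purchased product; revenue fields are stored multiplied by 10^6.
query = """
SELECT
  date,
  fullVisitorId,
  h.transaction.transactionId            AS transaction_id,
  h.transaction.transactionRevenue / 1e6 AS revenue,
  p.productSKU                           AS product_sku,
  p.v2ProductName                        AS product_name
FROM `my-gcp-project.123456789.ga_sessions_*`,
  UNNEST(hits) AS h,
  UNNEST(h.product) AS p
WHERE h.eCommerceAction.action_type = '6'  -- completed purchase
  AND _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
"""
for row in client.query(query).result():
    print(row.transaction_id, row.product_name, row.revenue)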
Hope this helps.