Firebase + BigQuery - Uniquely Identifying Devices

Firebase + BigQuery - Uniquely Identifying Devices - firebase

Recently started exploring the Firebase data via the Data Studio Firebase connector. I'm doing some custom reports based on the user_engagement event to compare with data we previously reported on in Flurry.
When looking at some DAU figures they are pretty close but on MAU they tend to get inflated. (Saw this behavior first on the Firebase Events Report Template). Digging into it a little more we do have a pattern where users frequently reinstall the app which generates a new app_instance_id. So as I fallback I'm using the resettable_device_id but then there's the situation advertising tracking is disabled on device resulting in a zeroed value. (Or for a brief period in January nulled out values, not sure if this was client or part of the Firebase link)
Currently thinking something roughly following the logic below, falling back to app_instance_id if the advertising identifier was not set. What approaches would be worth looking into to have a reliable user identifier for metrics reporting? (In future will be calling the setUserID to utilize our own identifier but looking to match up historical data)
IF(user_dim.device_info.resettable_device_id is not null,
IF(user_dim.device_info.resettable_device_id = '00000000-0000-0000-0000-000000000000', user_dim.app_info.app_instance_id, user_dim.device_info.resettable_device_id),
user_dim.app_info.app_instance_id
) as unique_user_identifier,
Thanks in advance.

Simpler way to deal with the cases where a resettable_device_id is not available:
IF(user_dim.device_info.limited_ad_tracking, user_dim.app_info.app_instance_id, user_dim.device_info.resettable_device_id) as unique_user_identifier

Related

Getting programmatic access to Firebase Analytics trending events

I have a mobile app which plays audio tracks. It uses Firebase Analytics to record events such as 'track names' played. Within the Firebase 'StreamView' one can access trending events and see which are the most popular tracks being played at any given moment. I would like to gain access to this list to and use it within my app to display a list of "tracks being played now".
I've looked into gaining access to Analytic trending event data and think Firebase Cloud Functions may provide a method of extracting the information I need. However, I'm not certain this is the correct, or easiest, method.
Could someone let me know whether extracting trending events is possible and, if so, point me in the correct direction?
Thanks

EDIT - Actually, there is a much better and easier way to get access to real-time events that have occurred in your app over the last 30 minutes. You can do so using the Google Analytics Data API.
Using the API you can filter through the event data for the past 30 mins, and inspect relevant custom dimensions on the play_track event for the track that was played (or provide a custom dimension filter to further specify the event data you get back).
This would be the ideal way to achieve what you're looking for. You might still want to use Cloud Firestore if you'd like to keep a longer record of trending tracks being played (e.g. in the last hour, last 24 hours, etc... though). Also note that the API is still in alpha.
-- END OF EDIT
Other Solutions
Option 1 - Use Cloud Firestore
This is probably the easiest solution - you can create a record of which tracks are being played whenever the event occurs by creating a simple collection in Cloud Firestore, and updating records for tracks being played there. It would require additional effort in logging and retrieving which tracks are played beyond just using Google Analytics, but should be straightforward to implement.
Note you'll probably want to check out the Firestore pricing guide first before selecting this option.
Option 2 - Using Firebase Cloud Functions
You can trigger a Cloud Function each time a play_track event is logged. The event will need to be marked as a conversion event in order for it to trigger a Cloud Function, and within the Cloud Function you can access the event parameters to identify which track is being played, and over time maintain a record somewhere for which tracks are being played to determine the most trending tracks. To maintain state you could use something like Firestore to keep track of which "tracks" are being played at the moment.
A couple of caveats about this approach:
You'll want to check out the Cloud Functions for Firebase pricing guide to make sure it falls within an acceptable range for your needs.
Cloud Functions triggers for analytics events currently only works for Android and iOS apps (no support for web apps).
Google Analytics triggers for Cloud Functions is currently in beta.
Option 3 - Using BigQuery for your analytics data
This option requires a bit more effort to setup, but you can export your Google Analytics data to BigQuery, and query the generated intraday tables to see which tracks are trending as well as a lot more additional insights.
The caveat with this approach are that you'll also need to check the pricing guide for using BigQuery to make sure it falls within your needs, and you'll need to make a call to execute the query and retrieve the list of tracks (or get a cached result).

What the best clockify API endpoint to get time entries of (grouped by) saved reports?

Asking here, after asking to Clockify support.
Trying to extend some of clockify capabilities to create extra reporting for our clients,
I’ve been playing with your API and specifically: the enpoint /reports/{reportsId}
• My goal:
Get all the time entries of a specific "saved report” (usually saved by our Project Managers)
• What I EXPECT from "/reports/{reportsId}”:
To get all the info and entities (users, time entries, projects, etc.) only regarding that particular reportId
• What I GET from "/reports/{reportsId}”:
Lots of info regarding the whole workspace, and I only see summaryReport
as more “specific to the saved report itself”...
• Questions:
Is this the correct behavior?
How do you filter down time entries of specific reports in URLs like https://clockify.me/bookmarks/BOOKMARK_HASH_HERE ?
Do you only call "/reports/{reportsId}” and filter down on client-side? (it seems to me that way, exploring the Network tab)
If that’s the way, what’s the point of calling the report endpoint? Only for the summaryReport object?
3- Is "/reports/{reportsId}” the best endpoint I can use to reach my goal? …or which way would you recommend me?

summaryReport.timeEntries will contain all the individual time entries from that particular report. Each entry has a user, project, client, time etc. Grouping by project is done on the client.
I'm not sure I fully understand your specific problem though. Are you suggesting the entries you get from the report endpoint do not belong to the given report?

Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data(historical data as well) into Redshift. Any suggestions regarding tools, API's are welcomed. I searched online and found out Stitch as one of the ETL tools, help me know better about this option and other options if you have.

Google Analytics has an API (Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting great amounts of historical data.
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM&Admin" section, go to your GA account and enter the BigQuery Project ID in the GA Admin section, "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updated - the latter comes with an extra fee, but gives you near realtime data. The former updates data in BigQuery three times a day every eight hours.
The exported data is not raw data, this is already sessionized (i.e. while you will get one row per hit things like the traffic attribution for that hit will be session based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also important be important when it comes to legal matters - e.g. if you have to comply with the GDPR your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, however it will do this only once per view. So if you need historical data better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It still is a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be searched to produce the wanted result, so if you query over a long timeframe and maybe do a few joins a single query can run into the dozens of euros (multiplied by the number of users who use the query).

scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.

Another option is to use Stitch's own specfication singer.io and related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is piping data from into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json

I like Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up-to-date with little to no configuration efforts. I don't even need to prepare the schema — Skyvia can automatically create a table for report data. You can load 10000 records per month for free — this is enough for me.

Google Analytics API returning no data before 23/09/16

I have a Google Analytics API request that provides full data for any date after 23/09/16, but nothing before.
The metrics/dimensions in use are:
date
ga:sessions
ga:users
ga:deviceCategory
ga:sourceMedium
ga:campaign
I have created a custom report with the same dimensions/metrics in the web view, and I can confirm that the data does exist there (and is being provided). If I take out deviceCategory (or only have deviceCategory) then results are returned. This suggests to me that before this date, deviceCategory and sourceMedium/campaign were an invalid combination, but I can't find anything in the release notes to suggest this was changed and checking previous versions of the dimension explorer using archive.org does not indicate this was the case either.
I have raised a support request with Analytics support but they have said they don't have a team for API related issues.
Any help would be greatly appreciated. We already have a lot of reporting built around this combination and would like to be able to compare historical data.
Edit: I think this has something to do with the data retention settings in Analytics (which default to 26 months). Not sure why this particular combination would be affected as there is no user or event data required here.

You are right, it's because of data retention settings.
"The retention period applies to user-level and event-level data associated with cookies, user-identifiers <...>"
https://support.google.com/analytics/answer/7667196?hl=en
deviceCategory is associated with ClientID.

enabling hourly data in google analytics

I have two view/profiles linked to my google analytics account. I want to fetch the hourly data for the current day, ie
start date:today
end date: today
with a few filters and dimensions.
Now I am getting the response for one view that means it is possible in google analytics, however for the other view its showing all the values as 0- this applies both to the gui and the api.
Can anyone suggest me how to enable it for the other view as well?

You cannot. Google Analytics needs some processing time. It might be that some data appears immediately, especially on small accounts, but it's not guaranteed and not a thing you can "enable" or count on.
Updated: Okay, that was a dumb answer. Still, there is a processing latency event in GA Premium. It is possible to get realtime data, but that's a different API with limited data (the core reporting API might return data, but no guarantees for that).
But I admit, since your problem is that you do not get data for the whole day yor have a different problem. But with a premium account you should be able to contact your account manager/technical support.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex