Calculate crash-free sessions based on Analytics and Crashlytics data (BigQuery) - Firebase

Assuming the following definitions are in place:
The crash-free sessions number is the percentage of sessions in the specified time range that were not ended by a crash of the application.
The crash-free users number is the percentage of distinct users who did not experience a crash during the specified time period.
Is it possible to calculate the first of these using the Analytics data exported into BigQuery? The closest thing I was able to find is this ticket on SO, BigQuery Crashlytics - Crash free users / sessions, but I think what it actually calculates is the second metric (crash-free users), not the first. To rephrase my question: how do I identify user sessions and link them to crash experiences, if any?

I took some of the information from these BigQuery examples to obtain and aggregate the overall sessions and the sessions with 'app_exception' events. From this you can calculate the percentage of crash-free sessions:
SELECT
  COUNT(*) AS sessions,
  SUM(has_crash) AS sessions_with_crash,
  1 - (SUM(has_crash) / COUNT(*)) AS crash_free_sessions
FROM
(
  -- One row per session, flagged if the session contained at least one crash.
  -- ga_session_id is only unique per user, so group by the (user, session) pair.
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    -- app_exception is also logged for non-fatal exceptions; filter on its
    -- 'fatal' parameter if only actual crashes should count.
    MAX(IF(event_name = 'app_exception', 1, 0)) AS has_crash
  FROM `Firebase_project_name.analytics_property_name.events_*`
  GROUP BY user_pseudo_id, ga_session_id
)
This is the result I got:

sessions | sessions_with_crash | crash_free_sessions
282083   | 94                  | 0.9996667
Keep in mind that the above query scans all the exported data, so make sure to restrict it to the timeframe you need.
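For example, here is a minimal sketch of the same query limited to a specific date range via the wildcard table suffix (the dataset name and the dates are placeholders):

SELECT
  COUNT(*) AS sessions,
  SUM(has_crash) AS sessions_with_crash,
  1 - (SUM(has_crash) / COUNT(*)) AS crash_free_sessions
FROM
(
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    MAX(IF(event_name = 'app_exception', 1, 0)) AS has_crash
  FROM `Firebase_project_name.analytics_property_name.events_*`
  -- _TABLE_SUFFIX limits the scan to the daily export tables in range
  WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
  GROUP BY user_pseudo_id, ga_session_id
)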

Related

BigQuery New Users count strongly differs from displayed Firebase Analytics data

During my inspections, I have found that after the 14th of January the new users count strongly differs between Google BigQuery and Google Firebase Analytics.
The discrepancy is higher than the usual 0.5-2% rate that can be attributed to the HyperLogLog algorithm used to make the computation faster.
I wasn't able to find a precise answer on how exactly new users are computed in Firebase Analytics, so that I could write the same query and get identical results. Since the discrepancy is above 30%, the magnitude of the problem is now significant.
Do you have the same problem? How can I investigate this strange behavior further, e.g. by running other queries to find more details about the issue?
This is the query used to compare results:
SELECT
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS new_users,
  event_date
FROM `practical-bot-198011.analytics_184597160.events_*`
WHERE event_name = 'first_open'
  AND _TABLE_SUFFIX BETWEEN '20200110' AND '20200127'
GROUP BY event_date
ORDER BY event_date ASC
The result I get from BigQuery differs noticeably from the numbers displayed in the Google Firebase Analytics dashboard.
One reason the count in the Analytics dashboard doesn't match the BigQuery results is that the data for the most recent three days is updated every 4-5 hours in Analytics, while data is exported to BigQuery only once per day. Queries which include the most recent three days will therefore show different results between Analytics and BigQuery.
Also, APPROX_COUNT_DISTINCT is an approximation. To get an exact count of unique IDs, use COUNT(DISTINCT ...), which is exact in standard SQL (EXACT_COUNT_DISTINCT is the legacy SQL equivalent). Refer to this Stackoverflow thread.
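For reference, a minimal exact-count variant of the query from the question:

SELECT
  COUNT(DISTINCT user_pseudo_id) AS new_users,  -- exact, unlike APPROX_COUNT_DISTINCT
  event_date
FROM `practical-bot-198011.analytics_184597160.events_*`
WHERE event_name = 'first_open'
  AND _TABLE_SUFFIX BETWEEN '20200110' AND '20200127'
GROUP BY event_date
ORDER BY event_date ASC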
Additionally, take a look at the official documentation.

Calculate time spent by users on app using Google BigQuery Firebase analytics data

I am trying to calculate the total time spent by users on my app. We have integrated Firebase Analytics data in BigQuery. Can I sum the values of engagement_time_msec in the SELECT statement of my query? This is what I am trying:
SELECT SUM(x.value.int_value)
FROM `[dataset]`,
  UNNEST(event_params) AS x
WHERE x.key = 'engagement_time_msec'
I am getting very big values after executing this query. I am not sure if it is OK to use SUM(engagement_time_msec) for calculating the total time spent by users on the app.
Any help would be highly appreciated.
It really depends on what dataset you have. Ideally, you would want login and logout timestamps, if you have them. Take the time difference between those values, grouping by user, device, load sequence, etc.: anything which defines a single event.
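If you stay with engagement_time_msec, here is a minimal sketch that sums it per user and converts milliseconds to seconds (the table name is a placeholder, and restricting to user_engagement events is an assumption to avoid double counting):

SELECT
  user_pseudo_id,
  -- engagement_time_msec is reported in milliseconds; convert to seconds
  SUM((SELECT value.int_value FROM UNNEST(event_params)
       WHERE key = 'engagement_time_msec')) / 1000 AS engagement_sec
FROM `project.analytics_XXXXXX.events_*`
WHERE event_name = 'user_engagement'
GROUP BY user_pseudo_id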

Discrepancy in Daily user engagement between Firebase Analytics dashboard and BigQuery

I'm trying to develop a query against Firebase Analytics data linked to BigQuery to reproduce the "Daily user engagement" graph from the Firebase Analytics dashboard (to include in a Google Data Studio report).
According to Firebase Help documentation, Daily user engagement is defined as "Average daily engagement per user for the date range, including the fluctuation by percentage from the previous date range." So, my attempt is to sum the engagement_time_msec (the additional engagement time (ms) since the last user_engagement event according to https://support.google.com/firebase/answer/7061705?hl=en) for user_engagement events, divided by the count of users (identified by user_dim.app_info.app_instance_id) per day. The query looks like this:
SELECT
  ((total_engagement_time_msec / 1000) / users) AS average_engagement_time_sec,
  date
FROM (
  SELECT
    SUM(params.value.int_value) AS total_engagement_time_msec,
    COUNT(DISTINCT user_dim.app_info.app_instance_id) AS users,
    e.date
  FROM `com_artermobilize_alertable_IOS.app_events_*`,
    UNNEST(event_dim) AS e,
    UNNEST(e.params) AS params
  WHERE e.name = 'user_engagement'
    AND params.key = 'engagement_time_msec'
  GROUP BY e.date
)
ORDER BY date DESC
The results are close to what's displayed in the Firebase console graph of Daily user engagement, but the values from my query are consistently a few seconds higher than the console's.
To note, we're not setting user_dim.user_id and not using IDFA, so my understanding is the correct/only way to count "users" is the user_dim.app_info.app_instance_id, and I imagine the same would be true for the Firebase console.
Can anyone suggest what might be different between how I'm determining the average engagement time from BigQuery, and how that's being determined in the Firebase console graph?
To note, I've seen a similar question posed here, but I don't believe the suggested answer applies to my query, since 1) the discrepancies are present over multiple days, 2) I'm already querying for user_engagement events, and 3) the event date used in the query is stated to be based on the registered timezone of your app (according to this).

Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export

According to the Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the number of active users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session is started, an event with the name session_start is sent. I am trying to get that metric using BigQuery's export, but my query gives me different results (15,636 on BigQuery, 14,908 on FB Analytics).
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try, I never get the same (or similar) results.
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-26'), TIMESTAMP('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
  AND event_dim.name = 'session_start'
Thanks
Update
After @djabi's answer I changed my query to use user_engagement rather than session_start, and it works much better now. There are still some minor differences, though (they range from under ten to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')), but I never got the exact number shown on the Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but I am wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
COUNT(*) AS active_users
FROM (
SELECT
COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
FROM
TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
WHERE
DATE(event_dim.timestamp_micros) = '2016-11-25'
AND event_dim.name ='user_engagement'
GROUP BY
user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait a full 3 days for data from offline devices to be uploaded. Your query correctly filters the events based on the event timestamp, and you pull data from 3 days, but that is only a day and a half back from today, which is not enough time for all data to be uploaded. Try including the 3 days counted back from yesterday.
Also, try using the user_engagement event instead of session_start. I believe the active user count is based on user_engagement, not on session_start events.
Also, FB reports take a while to process, so you might want to check the FB reports again the next day.
FB reports use the time zone of the account, while events are timestamped in UTC, so the day in FB reports differs from the UTC calendar day. You need to control for that discrepancy as well to get matching numbers.
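A minimal sketch of that time-zone adjustment applied to the query above, in legacy SQL; the UTC-5 offset is an assumption and needs to be replaced with the account's actual offset:

SELECT COUNT(*) AS active_users
FROM (
  SELECT COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
  FROM TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
  -- Shift the UTC event timestamp into the account's time zone before taking the day
  WHERE DATE(DATE_ADD(USEC_TO_TIMESTAMP(event_dim.timestamp_micros), -5, 'HOUR')) = '2016-11-25'
    AND event_dim.name = 'user_engagement'
  GROUP BY user_id
)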
Sessions are by default counted after 10 seconds of user activity in the respective app, a threshold which you can change. Try lowering the session start threshold to the smallest possible value, and you may arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.

Hits Processed Per Month?

If you refer to http://www.google.com/intl/en_uk/analytics/premium/features.html, you will notice that Standard allows for 10 million hits processed per month and Premium allows for 1 billion.
I have a website on an account, with multiple "folders" for different sub-domains, and also different "Views" or dashboards for some of these sub-domains.
The website I am on recently lost tracking for conversion rates, and everything has plummeted to near 0%, which is an incorrect statistic. I am curious how I can figure out whether this account is reaching the 10 million limit on the standard version, or at least how to determine the actual hits processed per day, week, or month.
Any ideas?
Thanks!
I don't know how Google enforces hit limits in 2015. However, in 2013 a Google representative sent one of our bigger clients a document (answering a question about data limits) that contained the following paragraph:
How do data limits impact sampling? Google Analytics does not sample
your clients' data at the point of collection or processing, regardless
of how far they exceed our stated limits. So no hits are discarded.
The only way to sample data at the point of collection is for clients
to use _setSampleRate in their tracking code.
[...]
[...] we reserve the right to shutdown their account [sc. if limits are exceeded], but it won't
happen before we have attempted to contact the account Admins multiple times
and we have exhausted all other options.
Unless Google has changed its policy in the last 1.5 years, I would say no: unprocessed hits are not your problem. It seems Google would have contacted you with a request to limit your hits or upgrade to Analytics Premium before problems occurred.
Plus, since you mentioned that you have several views: views do not count towards your quota (they display the same data in different ways). However, properties (I think that is what you mean by "folders") do.
Updated 2017: It seems that Google intends to enforce limits more strictly. One of my clients now has the following warning in his GA interface:
Your data volume (XXX hits) exceeds the limit of 10M hit per month as
outlined in our terms of service. If you continue to exceed the limit
you will lose access to future data.
You can create a database table like this (MySQL syntax):
CREATE TABLE visits (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  ip TEXT,
  visit_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Upon each page visit, you can insert a record into the table. Later you can view statistics. For instance, listing the visits on a given day would look like:
SELECT id, ip, visit_date
FROM visits
WHERE visit_date >= '2015-07-21 00:00:00' AND visit_date < '2015-07-22 00:00:00';
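A minimal sketch of the matching insert and a per-day hit count (the IP value is a placeholder; in practice it would come from the web server request):

-- visit_date defaults to CURRENT_TIMESTAMP, so only the IP needs to be supplied
INSERT INTO visits (ip) VALUES ('203.0.113.7');

-- Hits processed per day, to compare against the 10M-per-month quota
SELECT DATE(visit_date) AS day, COUNT(*) AS hits
FROM visits
GROUP BY DATE(visit_date)
ORDER BY day;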
