I'm trying to develop a query against Firebase Analytics data linked to BigQuery to reproduce the "Daily user engagement" graph from the Firebase Analytics dashboard (to include in a Google Data Studio report).
According to Firebase Help documentation, Daily user engagement is defined as "Average daily engagement per user for the date range, including the fluctuation by percentage from the previous date range." So, my attempt is to sum the engagement_time_msec (the additional engagement time (ms) since the last user_engagement event according to https://support.google.com/firebase/answer/7061705?hl=en) for user_engagement events, divided by the count of users (identified by user_dim.app_info.app_instance_id) per day. The query looks like this:
SELECT ((total_engagement_time_msec / 1000) / users) as average_engagement_time_sec, date FROM
(SELECT
SUM(params.value.int_value) as total_engagement_time_msec,
COUNT(DISTINCT(user_dim.app_info.app_instance_id)) as users,
e.date
FROM `com_artermobilize_alertable_IOS.app_events_*`, UNNEST(event_dim) as e, UNNEST(e.params) as params
WHERE e.name = 'user_engagement'
AND params.key = 'engagement_time_msec'
GROUP BY e.date)
ORDER BY date desc
The results are close to what's displayed in the Firebase console graph of Daily user engagement, but the values from my query are consistently a few seconds higher (BigQuery results shown here on the left, Firebase Console graph values on the right).
To note, we're not setting user_dim.user_id and not using IDFA, so my understanding is the correct/only way to count "users" is the user_dim.app_info.app_instance_id, and I imagine the same would be true for the Firebase console.
Can anyone suggest what might be different between how I'm determining the average engagement time from BigQuery, and how that's being determined in the Firebase console graph?
To note, I've seen a similar question posed here, but I don't believe the suggested answer applies for my query since 1) the discrepancies are present over multiple days, 2) I'm already querying for user_engagement events and 3) the event date being used in the query is stated to be based on the registered timezone of your app (according to this).
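To make the arithmetic in my query concrete, here is a minimal Python sketch of the same calculation on a handful of made-up rows. The row layout (date, app_instance_id, engagement_time_msec) and the sample values are assumptions for illustration only; this mirrors my query, not necessarily whatever the Firebase console computes.

```python
from collections import defaultdict

def average_engagement_sec(rows):
    """Return {date: average engagement seconds per distinct app instance}."""
    total_msec = defaultdict(int)
    users = defaultdict(set)
    for date, app_instance_id, engagement_time_msec in rows:
        total_msec[date] += engagement_time_msec
        users[date].add(app_instance_id)
    # Same formula as the query: (total msec / 1000) / distinct users
    return {d: (total_msec[d] / 1000) / len(users[d]) for d in total_msec}

# Hypothetical sample rows: (date, app_instance_id, engagement_time_msec)
sample_rows = [
    ("20170301", "a", 30000),
    ("20170301", "a", 15000),
    ("20170301", "b", 45000),
    ("20170302", "a", 20000),
]
daily_average = average_engagement_sec(sample_rows)
```

An instance with several user_engagement events on one day contributes all of its time to the numerator but counts only once in the denominator.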
I use BigQuery to get data from Firebase, and I'm having some issues with data accuracy. Figures I get from the data in BigQuery are not matching the ones on Firebase dashboard.
I recently learned that Firebase does not look at all users; it considers only active users. So I now filter for active users with engagement_time_msec > 0, and my active user count matches the Firebase dashboard (differing by just 1-2 occasionally).
But my main problem is with the average engagement time!
Firebase (and GA for Firebase) shows average engagement time metric under engagement overview. When you hover over it, it gives this definition.
"Average Engagement Time per active user for the time period selected"
However, when I get data through BigQuery and calculate this manually, my numbers are off.
I am calculating Active Users as the distinct count of user_pseudo_id where engagement time > 0, and engagement time is summed where event name = user_engagement. (I have converted engagement_time_msec to minutes.)
Average engagement time = SUM(Engagement time mins)/Active Users
This should give me an average engagement time per active user, but this figure doesn't match the one in Firebase console. I have tried so many methods, and I fail to understand what Firebase is doing at the back end to come up with these values.
P.s: I have also tried summing up engagement time without a condition on event name and that gives me an even greater average, making the difference between it and Firebase even bigger.
Please help!!
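One thing worth checking is the level at which the average is taken. The Python sketch below, with made-up numbers, contrasts total engagement divided by distinct active users over the whole period with the mean of the per-day averages. Which of these the console reports is an assumption I can't confirm; the point is only that the two differ when a user is active on more than one day.

```python
def period_average_min(rows):
    """Total engagement minutes / distinct active users across the whole period."""
    active = [(user, msec) for _, user, msec in rows if msec > 0]
    users = {user for user, _ in active}
    total_min = sum(msec for _, msec in active) / 60000
    return total_min / len(users)

def mean_of_daily_averages_min(rows):
    """Mean of the per-day (minutes / distinct active users) figures."""
    per_day = {}
    for day, user, msec in rows:
        if msec > 0:
            entry = per_day.setdefault(day, {"msec": 0, "users": set()})
            entry["msec"] += msec
            entry["users"].add(user)
    daily = [(e["msec"] / 60000) / len(e["users"]) for e in per_day.values()]
    return sum(daily) / len(daily)

# Hypothetical rows: (day, user_pseudo_id, engagement_time_msec)
rows = [
    ("d1", "a", 120000),
    ("d1", "b", 60000),
    ("d2", "a", 180000),
    ("d2", "c", 0),  # no engagement, so not counted as active
]
period_avg = period_average_min(rows)
daily_avg = mean_of_daily_averages_min(rows)
```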
During my inspections, I have found that after the 14th of January, new users count strongly differs between Google BigQuery and Google Firebase Analytics.
The discrepancy is higher than the traditional 0.5-2% rate that can be attributed to the HyperLogLog algorithm used to make computation faster.
I wasn't able to find a precise description of how new users are computed in Firebase Analytics, so I can't write a query that produces identical results. Since the discrepancy is above 30%, the problem is now much more significant.
Do you have the same problem? How can I better explain this strange behavior (for example, by running other queries to find more details about the issue)?
This is the query used to compare results:
SELECT APPROX_COUNT_DISTINCT(user_pseudo_id),event_date FROM `practical-bot-198011.analytics_184597160.events_*`
where event_name = 'first_open' and _TABLE_SUFFIX BETWEEN '20200110' AND '20200127'
GROUP BY event_date
ORDER BY event_date ASC
and this is the result I get:
but in the Google Firebase Analytics Dashboard:
One reason the count in the Analytics dashboard doesn't match the BigQuery results is that data for the most recent three days is updated every 4-5 hours in Analytics, whereas data is exported to BigQuery only once per day. Queries that include the most recent three days will therefore show different results in Analytics and BigQuery.
APPROX_COUNT_DISTINCT is an approximation. To get an exact count of unique IDs, use COUNT(DISTINCT user_pseudo_id) in standard SQL, which is what your query is written in; EXACT_COUNT_DISTINCT() is the legacy SQL equivalent, where COUNT(DISTINCT) is itself approximate. Refer to this Stack Overflow thread.
Additionally, take a look at the official documentation.
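To judge whether a gap is within the range that an approximate distinct count could explain, the relative difference can be computed directly. A rough Python sketch, where the 2% tolerance is taken from the 0.5-2% figure quoted in the question and the sample counts are made up:

```python
def relative_discrepancy(bigquery_count, console_count):
    """Relative difference of the BigQuery count from the console count."""
    return abs(bigquery_count - console_count) / console_count

def within_approximation_error(bigquery_count, console_count, tolerance=0.02):
    """True if the gap is small enough to be plausibly due to approximation."""
    return relative_discrepancy(bigquery_count, console_count) <= tolerance

# Hypothetical counts
small_gap = within_approximation_error(1010, 1000)  # 1% apart
large_gap = within_approximation_error(1300, 1000)  # 30% apart
```

A gap that fails this check, like the 30% one described in the question, needs another explanation, such as late-arriving data or a different definition of the metric.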
I have recently linked a Firebase project to BigQuery, the project contains both an iOS app and an Android app.
I need to run a query to export the count of the "session_start" event grouped by platform.
My query looks like this:
SELECT
event_date,
platform,
count(case when event_name = 'session_start' then 1 else null end) as app_sessions
from
`xxx.analytics_xxx.events_*`
WHERE
_table_suffix BETWEEN "20191028" AND FORMAT_DATE('%Y%m%d', date_sub(current_date(), INTERVAL 1 DAY))
GROUP BY
event_date, platform
Order by
event_date
I found differences between the query results and the Firebase console. The discrepancy does not occur for all dates, so I'm wondering why this is happening.
Am I querying the count of session_start in the wrong way?
Update:
The discrepancy is around 1% and the numbers I got from the query are greater than the ones I see in the console (I attached a table with some data from one of the platforms for clarification).
I read the post
Discrepancies on “active users metric” between Firebase Analytics dashboard and BigQuery export
especially the part regarding the needed time for data to be fully uploaded. In my case I noticed the discrepancy for dates older than three days even if the difference is very little (I ran the query on the 14th).
I can live with these variances, but since I'm new to BigQuery I would like to know whether I'm querying the data in the right way.
In particular, I don't know whether I should expect exactly the same numbers from BigQuery and the Firebase console, or whether the two sources can be very close while still showing small differences.
Thank you
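If late-arriving data is a concern, the upper bound of the _table_suffix range can be pushed back a few days before comparing against the console. A small Python sketch of computing the two suffix strings; the three-day lag, the function name, and the sample dates are assumptions for illustration:

```python
from datetime import date, timedelta

def table_suffix_range(start, today, lag_days=3):
    """Return (first_suffix, last_suffix) as YYYYMMDD strings,
    leaving out the most recent lag_days days."""
    end = today - timedelta(days=lag_days)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")

# Hypothetical run date of 2019-11-14
first_suffix, last_suffix = table_suffix_range(date(2019, 10, 28), date(2019, 11, 14))
```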
I am trying to calculate total time spent by users on my app. We have integrated firebase analytics data in BigQuery. Can I use sum to the values of engagement_time_msec in select statement of my query? This is what I am trying :
SELECT SUM(x.value.int_value)
FROM `[dataset]`,
UNNEST(event_params) AS x WHERE x.key = "engagement_time_msec"
I am getting very big values after executing this query. I am not sure whether it is OK to use SUM over engagement_time_msec to calculate the total time spent by users in the app.
Any help would be highly appreciated.
It really depends on what dataset you have. Ideally, you would want login and logout timestamps, if you have them. Take the time difference between the two values, grouping by user, device, loadsequence, etc.: anything that defines a single event.
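As a rough illustration of that approach in Python, assuming each row carries a user identifier plus login and logout timestamps (a hypothetical layout; the Firebase export does not necessarily provide these fields):

```python
from collections import defaultdict
from datetime import datetime

def time_spent_seconds(sessions):
    """Sum (logout - login) per user; sessions is an iterable of
    (user, login, logout) tuples."""
    totals = defaultdict(float)
    for user, login, logout in sessions:
        totals[user] += (logout - login).total_seconds()
    return dict(totals)

# Hypothetical sessions
sessions = [
    ("u1", datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 9, 30)),
    ("u1", datetime(2020, 1, 1, 18, 0), datetime(2020, 1, 1, 18, 5)),
    ("u2", datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 1, 12, 1)),
]
totals = time_spent_seconds(sessions)
```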
According to Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the active number of users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session is started an event with session_start name is sent. I am trying to get that metric using BigQuery's export, but my query is giving me different results (15636 on BigQuery, 14908 on FB analytics)
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try I never get the same (or similar) results
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM table_date_range([XXXXX.app_events_], timestamp('2016-11-26'), timestamp('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
AND event_dim.name ='session_start'
Thanks
Update
After @djabi's answer I changed my query to use user_engagement rather than session_start and it works much better now. Still some minor differences though (they range from under 10 to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
COUNT(*) AS active_users
FROM (
SELECT
COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
FROM
TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
WHERE
DATE(event_dim.timestamp_micros) = '2016-11-25'
AND event_dim.name ='user_engagement'
GROUP BY
user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait a full 3 days for data from offline devices to be uploaded. Your query correctly filters the events based on the event timestamp, and you pull data from 3 days, but that is only a day and a half from today, which is not enough for all data to be uploaded. Try including 3 days from yesterday.
Also try using user_engagement event instead of session_start. I believe active user count is based on user_engagement and not on session_start events.
Also, FB reports take a while to process, so you might want to check the FB reports the next day.
FB reports are done in the time zone of the account, while events are timestamped in UTC, so the day in FB reports is different from the UTC calendar day. You want to control for that discrepancy as well to get matching numbers.
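To illustrate the time zone point, here is a small Python sketch that maps a UTC microsecond timestamp to a calendar day in the reporting zone. It assumes a fixed UTC offset (so it ignores daylight saving time), and the function name and sample timestamp are made up for the example:

```python
from datetime import datetime, timedelta, timezone

def reporting_date(timestamp_micros, utc_offset_hours):
    """Calendar date of a UTC microsecond timestamp in a fixed-offset zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    utc_dt = epoch + timedelta(microseconds=timestamp_micros)
    return utc_dt.astimezone(tz).date().isoformat()

# 2016-11-27 02:00:00 UTC in microseconds since the epoch
event_micros = int(datetime(2016, 11, 27, 2, 0, tzinfo=timezone.utc).timestamp()) * 1_000_000
utc_day = reporting_date(event_micros, 0)
local_day = reporting_date(event_micros, -8)
```

The same event falls on the 27th in UTC but on the 26th in a UTC-8 reporting zone, so filtering on the UTC date alone assigns it to a different day than the report does.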
By default, a session is counted only after 10 seconds of user activity in the app, and you can change this threshold. Try setting the session start threshold to the lowest value possible; you may then arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.