I'm trying to calculate average session length using BigQuery for my Firebase + Unity setup.
I followed the tutorials for the default Unity setup. I can collect data and see where new sessions begin.
However, I can't seem to find a proper session length. I can get the time between sessions, but I can't find an event that signals a session expiring (I know they expire after 30 minutes of inactivity).
My alternative path has proven a bit difficult: I attempted to take the last interaction event before a session starts and subtract event_previous_timestamp from it, but had no luck, because session_start isn't actually the first event fired when a new session begins!
Here is the query I attempted:
#standardSQL
SELECT event_name, session_length, time_between_sessions
FROM (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    event_previous_timestamp,
    LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_triggered_event,
    (LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp)
      - event_previous_timestamp) / 60000000 AS session_length,
    (event_timestamp - event_previous_timestamp) / 60000000 AS time_between_sessions
  FROM `insertyourtablename`
)
WHERE event_name = "session_start"
I hope there is an easier way to do this, or I'm close! Thank you :)
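A note on a possibly easier route: newer Firebase SDK versions attach a ga_session_id parameter to every event, so session length can be taken as the span between a session's first and last event. A minimal sketch under that assumption (the table name is a placeholder for your own):

```sql
-- Sketch: average session length in minutes, assuming the ga_session_id
-- event parameter is populated. Replace the table name with yours.
WITH events AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `insertyourtablename`
),
sessions AS (
  SELECT
    user_pseudo_id,
    ga_session_id,
    -- timestamps are in microseconds; 60,000,000 µs = 1 minute
    (MAX(event_timestamp) - MIN(event_timestamp)) / 60000000 AS session_length_minutes
  FROM events
  WHERE ga_session_id IS NOT NULL
  GROUP BY 1, 2
)
SELECT AVG(session_length_minutes) AS avg_session_length_minutes
FROM sessions
```

This sidesteps the session-expiry problem entirely, since each session's boundaries are implied by its own events rather than by a 30-minute timeout you'd have to reconstruct.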
In version 8.12.1 of the Firebase Apple SDK, an issue with app prewarming on iOS 15+ causes extra session_start events to be logged. As a result, those additional session_start rows insert extra ga_session_id values into the BigQuery table.
ga_session_id is a unique session identifier associated with each event that occurs within a session, so a new one is created whenever one of these prewarming session_start events fires. Using the session_number field and a calculated session length, it's possible to remove sessions consisting of a single session_start with a very short duration, but this doesn't seem to reduce the overall session count by much.
This has inflated the reported number of sessions when counting DISTINCT user_pseudo_id || ga_session_id against the BigQuery table.
Is there a way to isolate these sessions in a separate table, or restrict them from the query with an additional clause, so that the sessions which are not truly sessions are removed?
https://github.com/firebase/firebase-ios-sdk/issues/6161
https://firebase.google.com/support/release-notes/ios
A simplified version of said query I'm using:
WITH windowTemp AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS date_formatted,
    event_name,
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
  FROM
    `firebase-XXXX.analytics_XXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20210201'
      AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY
    1, 2, 3, 4
)
SELECT
  date_formatted,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(session_id AS STRING))) AS sessions
FROM
  windowTemp
GROUP BY 1
ORDER BY 1
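One possible additional clause, offered as a hedged sketch rather than a confirmed fix: prewarming-only sessions typically never log a user_engagement event, so restricting the session count to sessions that contain at least one such event should exclude them (this assumes your genuine sessions always produce engagement, which is worth validating against your own data):

```sql
-- Sketch: count only sessions that logged at least one user_engagement
-- event, on the assumption that prewarming ghost sessions never do.
WITH events AS (
  SELECT
    user_pseudo_id,
    event_name,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS session_id
  FROM `firebase-XXXX.analytics_XXX.events_*`
),
engaged_sessions AS (
  SELECT DISTINCT user_pseudo_id, session_id
  FROM events
  WHERE event_name = 'user_engagement'
    AND session_id IS NOT NULL
)
SELECT COUNT(*) AS sessions
FROM engaged_sessions
```

The same engaged_sessions CTE can be joined back against the main query to filter out the prewarmed sessions instead of just counting.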
I am trying to calculate the total time spent by users on my app. We have integrated Firebase Analytics data in BigQuery. Can I use the sum of the values of engagement_time_msec / 1000 in the SELECT statement of my query? This is what I am trying:
SELECT SUM(x.value.int_value)
FROM `[dataset]`, UNNEST(event_params) AS x
WHERE x.key = 'engagement_time_msec'
I am getting very large values from this query (it gives huge numbers of hours per day). I am not sure if it is OK to use SUM(engagement_time_msec) to calculate the total time spent by users in the app.
I am not expecting users to spend this much time in the app. Is this the right way to calculate engagement time, or is there a better event to use?
Any help would be highly appreciated.
As per the Google Analytics docs regarding engagement_time_msec, this field is defined as "The additional engagement time (ms) since the last user_engagement event". Therefore, if you only look at the latest value, you lose all the time users spent before that user_engagement event was triggered.
What I'd do, since ga_session_id is now defined, is grab the maximum and minimum timestamp for each ga_session_id, take the TIMESTAMP_DIFF() for each one, and sum the results of all the sessions for a given day:
WITH ga_sessions AS (
SELECT
event_timestamp,
event_date,
params.value.int_value AS ga_session_id
FROM
`analytics_123456789.events_*`, UNNEST(event_params) AS params
WHERE
params.key = "ga_session_id"
),
session_length AS (
  SELECT
    event_date,
    ga_session_id,
    TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration_seconds
  FROM
    ga_sessions
  WHERE
    ga_session_id IS NOT NULL
  GROUP BY
    1, 2
),
final AS (
SELECT
event_date,
SUM(session_duration_seconds) as total_seconds_in_app
FROM
session_length
GROUP BY
1
ORDER BY
1 DESC
)
SELECT * FROM final
OUTPUT (data extracted from the app I work at):
event_date | total_seconds_in_app
-----------+--------------------
20210920 | 45600
20210919 | 43576
20210918 | 44539
There seem to be 1-2% duplicates among the Firebase Analytics events exported to BigQuery. What are the best practices for removing them?
At the moment the client does not send a per-session counter with the events. That would provide an unambiguous way of removing duplicate events, so I recommend Firebase implement it. In the meantime, what would be a good way to remove the duplicates? Look at the user_pseudo_id, event_timestamp, and event_name fields and remove all but one row with the same triple?
How does the event_bundle_sequence_id field work? Will duplicates have the same value in this field, or different ones? That is, are duplicate events sent within the same bundle, or in different bundles?
Is Firebase planning to remove these duplicates earlier in the pipeline, either for Firebase Analytics itself or in the export to BigQuery?
Standard SQL to check for duplicates in one days events:
with n_dups as
(
SELECT event_name, event_timestamp, user_pseudo_id, count(1)-1 as n_duplicates
FROM `project.dataset.events_20190610`
group by event_name, event_timestamp, user_pseudo_id
)
select n_duplicates, count(1) as n_cases
from n_dups
group by n_duplicates
order by n_cases desc
We use the QUALIFY clause for deduplicating Firebase events in BigQuery:
SELECT
*
FROM
`project.dataset.events_*`
QUALIFY
ROW_NUMBER() OVER (
PARTITION BY
user_pseudo_id,
event_name,
event_timestamp,
TO_JSON_STRING(event_params)
) = 1
Qualifying columns:
- name: user_pseudo_id
description: Autogenerated pseudonymous ID for the user -
Unique identifier for a specific installation of application on a client device,
e.g. "938642951.1666427135".
All events generated by that device will be tagged with this pseudonymous ID,
so that you can relate events from the same user together.
- name: event_name
description: Event name, e.g. "app_launch", "session_start", "login", "logout" etc.
- name: event_timestamp
description: The time (in microseconds, UTC) at which the event was logged on the client,
e.g. "1666529002225262".
- name: event_params
description: A repeated record (ARRAY) of the parameters associated with this event.
I've integrated my Firebase project with BigQuery. Now I'm facing a data discrepancy while trying to get 1-day active users for a selected date (20190210) with the following query in BigQuery:
SELECT COUNT(DISTINCT user_pseudo_id) AS one_day_active_users_count
FROM `MY_TABLE.events_*`
WHERE event_name = 'user_engagement' AND _TABLE_SUFFIX = '20190210'
But the figures returned from BigQuery don't match the ones reported on the Firebase Analytics dashboard for the same date. Any clue what's possibly going wrong here?
The sample query provided by the Firebase team at https://support.google.com/firebase/answer/9037342?hl=en&ref_topic=7029512 is not so helpful, as it takes the current time into consideration and counts users accordingly.
N-day active users
/**
* Builds an audience of N-Day Active Users.
*
* N-day active users = users who have logged at least one user_engagement
* event in the last N days.
*/
SELECT
COUNT(DISTINCT user_id) AS n_day_active_users_count
FROM
-- PLEASE REPLACE WITH YOUR TABLE NAME.
`YOUR_TABLE.events_*`
WHERE
event_name = 'user_engagement'
-- Pick events in the last N = 20 days.
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 20 DAY))
-- PLEASE REPLACE WITH YOUR DESIRED DATE RANGE.
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131';
So given the small discrepancy here, I believe the issue is one of timezones.
When you're looking at a "day" in the Firebase Console, you're looking at the time interval from midnight to midnight in whatever time zone you've specified when you first set up your project. When you're looking at a "day" in BigQuery, you're looking at the time interval from midnight to midnight in UTC.
If you want to make sure you're looking at the events that match what's in your console, filter on the event_timestamp value in your BigQuery tables (remembering that a single local day may span multiple daily tables) so that it lines up with your project's time zone.
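As a sketch of that fix (the time zone here is an assumption; substitute your project's reporting time zone), you can bucket events into local days with DATE(TIMESTAMP_MICROS(...), tz) and widen the _TABLE_SUFFIX range by a day on each side so the local day is fully covered by the UTC-partitioned tables:

```sql
-- Sketch: active users per local day instead of per UTC table.
SELECT
  DATE(TIMESTAMP_MICROS(event_timestamp), 'America/New_York') AS local_date,
  COUNT(DISTINCT user_pseudo_id) AS active_users
FROM `MY_TABLE.events_*`
WHERE event_name = 'user_engagement'
  -- widen the suffix range: a local day straddles two UTC daily tables
  AND _TABLE_SUFFIX BETWEEN '20190209' AND '20190211'
GROUP BY local_date
ORDER BY local_date
```

The row for local_date 2019-02-10 should then line up much more closely with the console's figure for that day.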
I've linked my Firebase Crashlytics data to BigQuery and set up the Data Studio templates provided by Google. There's a lot of great data in there, except for the most important metrics required for my dashboard: crash-free users and crash-free sessions as a percentage.
Nothing stands out in the schema that could be used to calculate these.
Any ideas how I might get these values? They're displayed in the Firebase dashboard, so they must be available..
I looked into the documentation and found event_name = 'app_exception'. With that you can write a query like:
WITH userCrashes AS (
SELECT user_pseudo_id, MAX(event_name = 'app_exception') hasCrash
FROM `firebase-public-project.analytics_153293282.events_20181003`
GROUP BY 1
)
SELECT
IF(hasCrash,'crashed','crash-free') crashState,
COUNT(DISTINCT user_pseudo_id) AS users,
ROUND(COUNT(DISTINCT user_pseudo_id) / SUM(COUNT(DISTINCT user_pseudo_id)) OVER (),2) AS userShare
FROM userCrashes
GROUP BY 1
But there is also a 'fatal' flag in the event parameters. In the example data it's always true, but in case you wanted to take it into account, you could do something like:
WITH userCrashes AS (
SELECT
user_pseudo_id,
MAX(event_name = 'app_exception') hasCrash,
MAX(event_name = 'app_exception'
AND (select value.int_value=1 from unnest(event_params) where key='fatal')
) hasFatalCrash
FROM `firebase-public-project.analytics_153293282.events_20181003`
GROUP BY 1
)
SELECT
IF(hasCrash,'crashed','crash-free') crashState,
IF(hasFatalCrash,'crashed fatal','crash-free') fatalCrashState,
COUNT(DISTINCT user_pseudo_id) AS users,
ROUND(COUNT(DISTINCT user_pseudo_id) / SUM(COUNT(DISTINCT user_pseudo_id)) OVER (),2) AS userShare
FROM userCrashes
GROUP BY 1,2
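For the crash-free sessions half of the question, the same idea can be extended per session, assuming the ga_session_id parameter is present in your export (it may not be populated in this older public dataset, so treat this as a sketch):

```sql
-- Sketch: share of sessions with no app_exception event.
WITH events AS (
  SELECT
    user_pseudo_id,
    event_name,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS session_id
  FROM `firebase-public-project.analytics_153293282.events_20181003`
),
sessions AS (
  SELECT
    user_pseudo_id,
    session_id,
    LOGICAL_OR(event_name = 'app_exception') AS has_crash
  FROM events
  WHERE session_id IS NOT NULL
  GROUP BY 1, 2
)
SELECT
  COUNTIF(NOT has_crash) AS crash_free_sessions,
  ROUND(COUNTIF(NOT has_crash) / COUNT(*), 2) AS crash_free_session_share
FROM sessions
```

The structure mirrors the per-user query above, just partitioned by (user, session) instead of by user alone.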
Disclaimer: I never worked with firebase, so this is all just based on documentation and example data. Hope it helps, though.