I've linked my Firebase Crashlytics data to BigQuery and set up the Data Studio templates provided by Google. There is a lot of great data in there, except for the most important metrics required for my dashboard: crash-free users and crash-free sessions as a percentage.
Nothing stands out in the schema that could be used to calculate this.
Any ideas how I might get this value? It's displayed in the Firebase dashboard, so it must be available somewhere.
I looked into the documentation and found event_name = 'app_exception'. With that you can write a query like:
WITH userCrashes AS (
  -- one row per user: did this user log at least one app_exception event?
  SELECT
    user_pseudo_id,
    MAX(event_name = 'app_exception') AS hasCrash
  FROM `firebase-public-project.analytics_153293282.events_20181003`
  GROUP BY 1
)
SELECT
  IF(hasCrash, 'crashed', 'crash-free') AS crashState,
  COUNT(DISTINCT user_pseudo_id) AS users,
  -- share of all users that falls into each crash state
  ROUND(COUNT(DISTINCT user_pseudo_id) / SUM(COUNT(DISTINCT user_pseudo_id)) OVER (), 2) AS userShare
FROM userCrashes
GROUP BY 1
But there is also a 'fatal' flag in the event parameters. In the example data it's always true, but in case you want to take it into account, you could do something like:
WITH userCrashes AS (
  SELECT
    user_pseudo_id,
    MAX(event_name = 'app_exception') AS hasCrash,
    -- only count app_exception events whose 'fatal' parameter is set to 1
    MAX(event_name = 'app_exception'
        AND (SELECT value.int_value = 1 FROM UNNEST(event_params) WHERE key = 'fatal')
    ) AS hasFatalCrash
  FROM `firebase-public-project.analytics_153293282.events_20181003`
  GROUP BY 1
)
SELECT
  IF(hasCrash, 'crashed', 'crash-free') AS crashState,
  IF(hasFatalCrash, 'crashed fatal', 'crash-free') AS fatalCrashState,
  COUNT(DISTINCT user_pseudo_id) AS users,
  ROUND(COUNT(DISTINCT user_pseudo_id) / SUM(COUNT(DISTINCT user_pseudo_id)) OVER (), 2) AS userShare
FROM userCrashes
GROUP BY 1, 2
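For the crash-free sessions percentage the question also asks about, the same idea can be applied per session instead of per user. This is only a sketch and assumes the export carries the ga_session_id event parameter; a session id is only unique in combination with user_pseudo_id:
WITH sessionCrashes AS (
  -- one row per session (user_pseudo_id + ga_session_id): did it contain an app_exception?
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    MAX(event_name = 'app_exception') AS hasCrash
  FROM `firebase-public-project.analytics_153293282.events_20181003`
  GROUP BY user_pseudo_id, ga_session_id
)
SELECT
  IF(hasCrash, 'crashed', 'crash-free') AS crashState,
  COUNT(*) AS sessions,
  ROUND(COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS sessionShare
FROM sessionCrashes
GROUP BY 1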
Disclaimer: I've never worked with Firebase, so this is all just based on the documentation and example data. Hope it helps, though.
We have configured a link between the GA4 property and Google BigQuery via the GA interface (without any additional code). It works fine and we can see the exported data in the BigQuery tables; however, we face an issue with how this data is written to those tables.
If we look at any table, we can see that events from different users are recorded in one session (with different clientIDs, and even different userIDs, which we pass when authorizing a user). See an example.
This is the result of executing the following query:
SELECT
event_name,
user_pseudo_id,
user_id,
device.category,
device.mobile_brand_name,
device.mobile_model_name,
device.operating_system_version,
geo.region,
geo.city,
params.key,
params.value.int_value
FROM `%project_name%.analytics_256374149.events_20210331`, unnest(event_params) AS params
WHERE event_name="page_view"
AND params.value.int_value=1617218965
ORDER BY event_timestamp
As a result, you can see that within one session, different users from different regions, with different devices and identifiers, are combined. It is, of course, impossible to use such data for reporting purposes. Once again, this is the default GA4 → BigQuery setup in the GA4 interface (no add-ons).
We do not understand where the error is (in the export, in the queries, or somewhere else) and would like to get advice on this issue.
Thanks.
You should look at the combination of user_pseudo_id and the event_param ga_session_id. This combination is unique and used for measuring unique sessions across a property.
For example, this query counts the number of unique event names in each session:
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
COUNT(DISTINCT event_name) AS unique_event_name_count
FROM `<project>.<dataset>.events_*`
GROUP BY user_pseudo_id, ga_session_id
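Applied to the original query, a rough sketch (keeping the table placeholder from the question) that counts page_view events per properly identified session, rather than filtering on a bare int_value, could look like this:
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
  COUNTIF(event_name = 'page_view') AS page_views
FROM `%project_name%.analytics_256374149.events_20210331`
GROUP BY user_pseudo_id, ga_session_id
ORDER BY page_views DESC
Each row then represents one real session, and the mixing of users and devices should disappear.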
I am using Analytics events and trying to take advantage of the user data.
I can get quite a lot of data with this query:
SELECT
*
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(event_params) AS event_params
WHERE
event_name ='select_content'
AND event_params.value.string_value = 'a_item_open'
However, I don't need all of it, so I did:
SELECT
event_params.value.string_value,
event_previous_timestamp,
device,
geo,
app_info
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(event_params) AS event_params
WHERE
event_name ='select_content'
AND event_params.value.string_value = 'a_item_open'
And then I realized that the result doesn't have gender or age data, even though the documentation says Firebase collects this information automatically. I'd like to combine gender and age (or age group) with the result of the query above.
How can I get it?
Note that this document is just an example of how to query Bigtable data using BigQuery, not Firebase data.
The Firebase export schema mentions a RECORD field named "user_properties", which has a "key" STRING field.
Thus, you could try:
SELECT DISTINCT user_properties.key
FROM
  `test-project-23471.analytics_205774787.events_20191120`,
  -- user_properties is a repeated record, so it has to be unnested before accessing .key
  UNNEST(user_properties) AS user_properties
This should retrieve the correct name of the gender/sex property so you can include it in your query. For instance:
SELECT
  event_params.value.string_value,
  event_previous_timestamp,
  device,
  geo,
  app_info,
  user_properties.value.string_value AS gender
FROM
  `test-project-23471.analytics_205774787.events_20191120`,
  UNNEST(event_params) AS event_params,
  -- user_properties also needs to be unnested to filter on its key
  UNNEST(user_properties) AS user_properties
WHERE
  event_name = 'select_content'
  AND event_params.value.string_value = 'a_item_open'
  AND user_properties.key = "Gender"
Nevertheless, if you don't find the gender info, please consider this. Otherwise, I suggest reaching out to Firebase support.
Hope it helps.
For privacy reasons, these fields are not available in the BigQuery export. You can only see aggregated data for gender and age in the Firebase Analytics console.
You can't even use them for targeting in other Firebase features, like Remote Config, so user-level granularity is not possible.
I'm trying to calculate average session length using BigQuery for my Firebase + Unity setup.
I followed the tutorials for the default Unity setup. I can gather data and see where new sessions begin.
However, I can't seem to find the proper session length. I'm able to get the time between sessions, but I can't find an event that signals a session expiring (I know they do after 30 minutes of inactivity).
My alternative path has proven a bit difficult... I attempted to get the last interaction event before a session starts and subtract event_previous_timestamp from it, but no luck, because session_start isn't actually the first event fired when a new session starts!
Here is the query I attempted:
#standardSQL
SELECT event_name, session_length, time_between_sessions
FROM (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    event_previous_timestamp,
    LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_triggered_event,
    (LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) - event_previous_timestamp) / 60000000 AS session_length,
    (event_timestamp - event_previous_timestamp) / 60000000 AS time_between_sessions
  FROM `insertyourtablename`
  ORDER BY event_timestamp
)
WHERE event_name = "session_start"
I hope there is an easier way to do this, or I'm close! Thank you :)
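A possibly simpler route, offered only as a sketch: it assumes the export includes the ga_session_id event parameter (which recent Firebase SDK versions attach to every event), so session length can be derived as the span between the first and last event of each session rather than from session_start alone:
#standardSQL
SELECT
  AVG(session_length_minutes) AS avg_session_length_minutes
FROM (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    -- event_timestamp is in microseconds; take the span between first and last event of the session
    (MAX(event_timestamp) - MIN(event_timestamp)) / 60000000 AS session_length_minutes
  FROM `insertyourtablename`
  GROUP BY user_pseudo_id, ga_session_id
)
WHERE ga_session_id IS NOT NULL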
We are validating a query in BigQuery and cannot get the results to match the Google Analytics UI. A similar question can be found here, but in our case the mismatch only occurs when we apply a specific filter on ecommerce_action.action_type.
Here is the query:
SELECT COUNT(distinct fullVisitorId+cast(visitid as string)) AS sessions
FROM (
SELECT
device.browserVersion,
geoNetwork.networkLocation,
geoNetwork.networkDomain,
geoNetwork.city,
geoNetwork.country,
geoNetwork.continent,
geoNetwork.region,
device.browserSize,
visitNumber,
trafficSource.source,
trafficSource.medium,
fullvisitorId,
visitId,
device.screenResolution,
device.flashVersion,
device.operatingSystem,
device.browser,
totals.pageviews,
channelGrouping,
totals.transactionRevenue,
totals.timeOnSite,
totals.newVisits,
totals.visits,
date,
hits.eCommerceAction.action_type
FROM
(select *
from TABLE_DATE_RANGE([zzzzzzzzz.ga_sessions_],
<range>) ))t
WHERE
hits.eCommerceAction.action_type = '2' and <stuff to remove bots>
)
From the UI, using the built-in Shopping Behavior report, we get 3.836M unique sessions with a product detail view, compared with 3.684M unique sessions in BigQuery using the query above.
A few questions:
1) We are under the impression that the Shopping Behavior report's "Sessions with Product View" breakdown is based on the ecommerce_action.action_type filter. Is that true?
2) Is there a .totals pre-aggregated table that the UI may be pulling from?
It sounds like the issue is that COUNT(DISTINCT ...) is approximate when using legacy SQL, as noted in the migration guide, so the counts are not accurate. Either use standard SQL instead (preferred) or use EXACT_COUNT_DISTINCT with legacy SQL.
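For illustration, a rough sketch of both options, keeping the placeholders from the question and omitting the bot filter for brevity:
-- Standard SQL (preferred): COUNT(DISTINCT ...) is exact
SELECT
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS sessions
FROM `zzzzzzzzz.ga_sessions_20180101`, UNNEST(hits) AS h
WHERE h.eCommerceAction.action_type = '2'

-- Legacy SQL: swap COUNT(DISTINCT ...) for EXACT_COUNT_DISTINCT
SELECT
  EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId, STRING(visitId))) AS sessions
FROM TABLE_DATE_RANGE([zzzzzzzzz.ga_sessions_], <range>)
WHERE hits.eCommerceAction.action_type = '2'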
You're including product list views in your query.
As described in https://support.google.com/analytics/answer/3437719, you need to make sure that no product has isImpression = TRUE, because that would mean it is a product list view.
This query sums all sessions that contain at least one hit with action_type = '2' for which all isImpression values are null or false:
SELECT
  SUM(totals.visits) AS sessions
FROM
  `project.123456789.ga_sessions_20180101` AS t
WHERE
  (
    SELECT
      LOGICAL_OR(h.eCommerceAction.action_type = '2')
    FROM
      t.hits AS h
    WHERE
      (SELECT LOGICAL_AND(isImpression IS NULL OR isImpression = FALSE) FROM h.product)
  )
For legacy SQL you can adapt the example in the documentation.
In addition to the fact that COUNT(DISTINCT ...) is approximate when using legacy SQL, there could be sessions containing only non-interactive hits. These are not counted as sessions in the Google Analytics UI, but they are counted by both COUNT(DISTINCT ...) and EXACT_COUNT_DISTINCT(...), because your query counts visit ids.
Using SUM(totals.visits) you should get the same result as in the UI, because SUM ignores NULL values of totals.visits (which correspond to sessions with only non-interactive hits).
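To see the difference side by side, a small sketch (with a hypothetical table name) that compares the two counts and isolates the sessions that only had non-interactive hits:
SELECT
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS all_sessions,
  SUM(totals.visits) AS interactive_sessions,          -- this is what the GA UI reports
  COUNTIF(totals.visits IS NULL) AS non_interactive_sessions
FROM `project.dataset.ga_sessions_20180101`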
I'm just learning BigQuery, so this might be a dumb question, but we want to gather some statistics, and one of them is the total number of sessions in a given day.
To do so, I've run this query in BQ:
select sum(sessions) as total_sessions from (
select
fullvisitorid,
count(distinct visitid) as sessions,
from (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
group each by fullvisitorid
)
(I'm using the table_query because later on we might increase the range of days)
This results in 1,075,137.
But in our Google Analytics reports, in the "Audience Overview" section, the same day shows:
This report is based on 1,026,641 sessions (100% of sessions).
There's always a difference of roughly ~5%, regardless of the day. So I'm wondering: even though the query is quite simple, is there any mistake we've made?
Is this difference expected? I read through BigQuery's documentation but couldn't find anything on this issue.
Thanks in advance,
Using standard SQL: simply SUM(totals.visits), or, when using COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))), make sure totals.visits = 1!
If you use visitId and you are not grouping per day, you will combine midnight-split sessions!
Here are all scenarios:
SELECT
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS allSessionsUniquePerDay,
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS allSessionsUniquePerSelectedTimeframe,
  SUM(totals.visits) AS interactiveSessionsUniquePerDay,  -- equals GA UI sessions
  COUNT(DISTINCT IF(totals.visits = 1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL)) AS interactiveSessionsUniquePerSelectedTimeframe,
  SUM(IF(totals.visits = 1, 0, 1)) AS nonInteractiveSessions
FROM
  `project.dataset.ga_sessions_2017102*`
Wrap up:
fullVisitorId + visitId: useful to reconnect midnight-splits
fullVisitorId + visitStartTime: useful to take splits into account
totals.visits=1 for interaction sessions
fullVisitorId + visitStartTime where totals.visits=1: GA UI sessions (in case you need a session id)
SUM(totals.visits): simple GA UI sessions
fullVisitorId + visitId where totals.visits=1 and GROUP BY date: GA UI sessions with too many chances for errors and misunderstandings
After posting the question, we got in contact with Google support and found that Google Analytics only counts sessions in which an "event" was fired.
In BigQuery you will find all sessions, regardless of whether they had an interaction or not.
In order to get the same result as in GA, you should filter for sessions with totals.visits = 1 in your BQ query (totals.visits is 1 only for sessions in which an event was fired).
That is:
select sum(sessions) as total_sessions from (
select
fullvisitorid,
count(distinct visitid) as sessions,
from (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
where totals.visits = 1
group each by fullvisitorid
)
The problem could be due to "COUNT DISTINCT".
According to this post:
COUNT DISTINCT is a statistical approximation for all results greater than 1000
You could try setting an additional COUNT parameter to improve accuracy at the expense of performance (see the post), but I would first try:
SELECT COUNT(CONCAT(fullvisitorid, '_', STRING(visitid))) AS sessions
FROM (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
What worked for me was this:
SELECT COUNT(DISTINCT sessionId)
FROM (
  SELECT CONCAT(clientId, "-", visitNumber, "-", date) AS sessionId
  FROM `project-id.dataset-id.ga_sessions_*`
  WHERE _table_suffix BETWEEN "20191001" AND "20191031" AND totals.visits = 1
)
The explanation (found very well written in this article: https://adswerve.com/blog/google-analytics-bigquery-tips-users-sessions-part-one/) is that we should be careful when counting and dealing with sessions, because by default Google Analytics breaks sessions that carry over midnight (in the time zone of the view). Therefore the same session can end up in two daily tables:
Image from article mentioned above
The code provided creates a sessionId by combining
client id + visit number + date
while acknowledging the session break; the result is in a human-readable format. Finally, to match the sessions in the Google Analytics UI, make sure to filter to only those with totals.visits = 1.