How to calculate average visit time per screen_class (Firebase) in BigQuery?

I would like to calculate the average time shown in the screenshot below using BigQuery, but I'm not sure how to add the screen class to my query to get the same result. Can you please help me?
My current query only sums all the engagement_time_msec values:
SELECT SUM(params.value.int_value) as total_engagement_time_msec,
event_date
FROM `datasetid.events_*`, UNNEST(event_params) as params
WHERE event_name = 'user_engagement'
AND params.key = 'engagement_time_msec'
GROUP BY event_date

I'm taking care of this case to provide you an update.
Your table seems to be grouped by screen class; that is why the average aggregation is possible.
I'm not familiar with Firebase, but I found the BigQuery Export schema and the Event Parameter Details that the table in your image is probably based on, in particular firebase_screen_class (Screen Class) and engagement_time_msec.
So, to include Screen Class as a column, you might join two result sets and group by firebase_screen_class, for example:
#standardSQL
WITH sc AS (
  -- Screen class of each user_engagement event.
  SELECT user_pseudo_id, event_timestamp, params.value.string_value AS screen_class
  FROM `datasetid.events_*`, UNNEST(event_params) AS params
  WHERE event_name = 'user_engagement'
    AND params.key = 'firebase_screen_class'
)
SELECT e.event_date AS eventDate, sc.screen_class AS screenClass,
  AVG(params.value.int_value) AS avgEngagementTime
FROM `datasetid.events_*` AS e
CROSS JOIN UNNEST(e.event_params) AS params
-- Assumption: user_pseudo_id plus event_timestamp identifies one event.
INNER JOIN sc
  ON sc.user_pseudo_id = e.user_pseudo_id AND sc.event_timestamp = e.event_timestamp
WHERE e.event_name = 'user_engagement'
  AND params.key = 'engagement_time_msec'
GROUP BY eventDate, screenClass
Note: the query might need some adjustments.
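To make the intended grouping concrete, here is a minimal plain-Python sketch of the same aggregation. It is not BigQuery code: the event dictionaries are a simplified, hypothetical stand-in for the export schema, and the function name is made up for illustration.

```python
from collections import defaultdict


def avg_engagement_by_screen(events):
    """Average engagement_time_msec per (event_date, screen class).

    Each event is a dict with 'event_name', 'event_date' and 'event_params',
    where 'event_params' is a list of {'key': ..., 'value': ...} dicts.
    This layout is an assumption, simplified from the real export schema.
    """
    totals = defaultdict(lambda: [0, 0])  # (date, screen) -> [sum, count]
    for event in events:
        if event["event_name"] != "user_engagement":
            continue
        # Flatten the repeated key/value records into one dict per event.
        params = {p["key"]: p["value"] for p in event["event_params"]}
        screen = params.get("firebase_screen_class")
        msec = params.get("engagement_time_msec")
        if screen is None or msec is None:
            continue  # skip events missing either parameter
        bucket = totals[(event["event_date"], screen)]
        bucket[0] += msec
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

The point of the sketch is that both parameters are read from the same event before grouping, which is what the join in the query is trying to achieve.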

Related

BigQuery - How to order by event

I started using BigQuery for work recently. Until now I have managed to query what I wanted, but now I'm stuck.
I retrieve data from Firebase in my BigQuery console. The data are events from a mobile game we are testing.
I would like to know how many players there are in each level, by ABVersion, but I can't figure out how to do it.
I tried this:
SELECT
param.value.string_value AS Version,
COUNT (DISTINCT user_pseudo_id) AS Players,
param2.value.string_value AS Level
FROM
`*Name of the dataset*`,
UNNEST(event_params) AS param,
UNNEST(event_params) AS param2
WHERE
event_name = 'Level_end'
AND param.key = 'ABVersion'
AND param2.key = 'Level'
GROUP BY Version,Level
And I got this:
I would like to have the number of players per level, with the ABVersion provided.
Thank you for your help!
Level is an integer parameter, not a string, so you should use value.int_value for Level.
For what you're trying to do, this looks like a better query to me:
SELECT
highest_level,
abversion,
count(*) as players
FROM (
SELECT
user_pseudo_id,
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ABVersion')) as abversion,
MAX((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'Level')) as highest_level
FROM `*Name of the dataset*`
WHERE
event_name = 'Level_end'
AND EXISTS (SELECT 1 FROM UNNEST(event_params) WHERE key IN ('Level', 'ABVersion'))
GROUP BY user_pseudo_id
)
GROUP BY 1,2
ORDER BY 1,2
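The two-stage logic of this query (first one row per user with the highest level reached, then a count of users per level and version) can be sketched in plain Python. This is only an illustration on a simplified, assumed event layout; the function name is hypothetical.

```python
from collections import Counter


def players_per_level(events):
    """Count players per (highest_level, abversion).

    Stage 1 keeps, for every user, one ABVersion value and the highest
    Level seen in 'Level_end' events. Stage 2 counts users per pair.
    """
    best = {}  # user_pseudo_id -> (abversion, highest_level)
    for event in events:
        if event["event_name"] != "Level_end":
            continue
        params = {p["key"]: p["value"] for p in event["event_params"]}
        user = event["user_pseudo_id"]
        version, level = best.get(user, (None, None))
        if params.get("ABVersion") is not None:
            version = params["ABVersion"]  # any value will do, like ANY_VALUE
        new_level = params.get("Level")
        if new_level is not None and (level is None or new_level > level):
            level = new_level  # running maximum, like MAX
        best[user] = (version, level)
    return Counter((level, version) for version, level in best.values())
```

Grouping by user first is what prevents a player who finished several levels from being counted once per level.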

Display Ad type per level?

I've just started using BigQuery for work, so I'm very new to it, and I'm struggling with a query.
I query data from a mobile game from Firebase, and I would like to get the number of ads watched per level, by ad type. For instance:
Type of ads (Inter/Rewarded) - Number of ads watched - level number
I started with this:
SELECT
param.value.string_value AS Type_of_ads,
COUNT(param.value.string_value) AS Nber
FROM
`*Name of the project**`,
UNNEST(event_params) AS param
WHERE
event_name = 'Fullscreen_displayed'
AND param.key = 'Ad_type'
AND user_first_touch_timestamp > 1560587334000000 #15/06/2019
GROUP BY
param.value.string_value
ORDER BY
param.value.string_value
With this, I only get the total number of ads per ad type. I would like to have it per level, so I did this:
SELECT
param.value.string_value AS Type_of_ads,
COUNT(param.value.string_value) AS Nber,
app_info.version AS Version,
COUNT (event_name) AS Runs
FROM
`Name of the project*`,
UNNEST(event_params) AS param,
UNNEST(event_params) AS param2
WHERE
event_name = 'Fullscreen_displayed'
AND param.key = 'Ad_type'
AND event_name = 'Level_end'
AND param2.key = 'Level'
GROUP BY
Type_of_ads, Version
ORDER BY
Type_of_ads, Version, Runs
But I get "This query didn't return any result" and I can't figure out how to fix it. Could you help me with this, please? Thank you very much for your help!

How to check whether all events have 'Session Info' like ga_session_id and ga_session_number in the new App+Web property for GA

I'm trying to verify whether all events have 'session info' in the new App+Web property using BigQuery.
Here is the sample data schema of my table.
event_params.key contains ga_session_id
Then I tried this query:
#standardSQL
SELECT
event_name, COUNT(event_name) as count_event_name
FROM
`mytable`,
UNNEST(event_params) AS params
WHERE params.key = "ga_session_id"
With this query I got 24,473,721 rows in total, which seem to have "ga_session_id".
But because mytable has 24,753,258 rows, there are at least 279,537 rows that have no "ga_session_id".
So I want to know which event_name values have no "ga_session_id", and how many of them there are.
Is there any possible code? Please help :'(
Added:
Changing the WHERE condition to != is not a solution (I've tried it).
Because UNNESTing adds additional rows, it returns more than 189 million rows, which exceeds the row count of the original table.
#standardSQL
SELECT
event_name, COUNT(event_name) as count_event_name
FROM
`mytable`,
UNNEST(event_params) AS params
WHERE params.key != "ga_session_id"
thanks
As you say, using UNNEST generates a lot of rows. This is because for each original row (same event_name), one row is generated for each event_params "subrow".
When you do the unnesting, those 24,753,258 rows are unnested into many more (on the order of 200 million).
Of those, 24,473,721 meet the condition params.key = "ga_session_id", and about 189 million don't (that's why they appear with the != clause).
What you have to keep in mind is that for the same event (which is identified by a timestamp and a name), applying the unnest operator generates many rows, so with your query you are counting each event more than once.
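The row multiplication described above can be modelled in a few lines of plain Python. This is only a sketch on a simplified, assumed event layout (it is not BigQuery code, and the function names are hypothetical); it contrasts counting unnested rows with counting whole events.

```python
def unnest_rows(events):
    """Return one (event_index, event_name, key) row per parameter,
    mimicking what a cross join with UNNEST(event_params) produces."""
    rows = []
    for index, event in enumerate(events):
        for param in event["event_params"]:
            rows.append((index, event["event_name"], param["key"]))
    return rows


def split_by_key(events, key):
    """Split whole events into those that carry `key` and those that do not."""
    with_key = [
        e for e in events if any(p["key"] == key for p in e["event_params"])
    ]
    without_key = [
        e for e in events if all(p["key"] != key for p in e["event_params"])
    ]
    return with_key, without_key
```

Filtering the unnested rows on `key != ...` keeps a row for every other parameter of every event, which is why that count exceeds the number of events; testing the parameter list of each event as a whole gives counts that add up to the table size.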
Having said that, if what you want is to know how many events contain "ga_session_id", you could run a query like this:
#standardSQL
SELECT
event_name, COUNT(DISTINCT event_timestamp) as number_of_each_event_name
FROM
`mytable`,
UNNEST(event_params) AS params
WHERE params.key = "ga_session_id"
GROUP BY event_name
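To count the opposite case (events without the key), switching the filter to != is not enough: after the unnest, every event that has any other parameter still matches. Filter on the original rows instead, for example with WHERE NOT EXISTS (SELECT 1 FROM UNNEST(event_params) WHERE key = "ga_session_id") and without the UNNEST in the FROM clause.
If you want to get the total number of events that meet the condition, without splitting them by event_name, your query is this one:
#standardSQL
SELECT
COUNT(DISTINCT event_timestamp) as total_number_of_events
FROM
`mytable`,
UNNEST(event_params) AS params
WHERE params.key = "ga_session_id"
The result of this query, added to the number of events without the key, should come to about 24,753,258, the original number of rows (events) in your table. Because COUNT(DISTINCT event_timestamp) merges events that share a timestamp, the sum can fall slightly short.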
I hope this works for you!

SQL Query to find users that install and uninstall an App on the same day

I am trying to find the users who install and uninstall the app on the same day, using the data from Firebase Analytics in Google BigQuery.
This is how far I've got.
I have a query that gives me users (or app_instance_id) who install or uninstall the App:
SELECT event.date,
user_dim.app_info.app_instance_id,
event.name
FROM `app_name.app_events_20180303`,
UNNEST(event_dim) AS event
WHERE (event.name = "app_remove" OR event.name = "first_open")
ORDER BY app_instance_id, event.date
It gives me the following result where I can see that row 1 and 2 are the same user that installs and uninstalls the App:
I've tried modifying the previous query to use
WHERE (event.name = "app_remove" AND event.name = "first_open")
which gives: Query returned zero records.
Do you have any suggestions on how to achieve this? Thanks.
Try this, although I did not test it:
SELECT date,
app_instance_id
FROM
(SELECT event.date,
user_dim.app_info.app_instance_id,
event.name
FROM `app_name.app_events_20180303`,
UNNEST(event_dim) AS event
WHERE (event.name = "app_remove" OR event.name = "first_open"))
GROUP BY app_instance_id, date
HAVING COUNT(DISTINCT name) = 2
ORDER BY app_instance_id, date
To start, it's worth noting that iOS does not yield app_remove, so this query only counts Android users who go through the install/uninstall pattern.
I created a sub-set of users who emitted first_open and app_remove, and counted those entries grouped by the date. I only kept instances where users installed and removed the app the same number of times in a day (greater than zero).
Then I tallied the distinct users.
SELECT COUNT(DISTINCT(user_id)) as transient_user_count
FROM (
SELECT event_date,
user_id,
COUNT(if(event_name = "first_open", user_id, NULL)) as user_first_open,
COUNT(if(event_name = "app_remove", user_id, NULL)) as user_app_remove
FROM `your_app.analytics_123456.events_*`
-- WHERE (_TABLE_SUFFIX between '20191201' and '20191211')
GROUP BY user_id, event_date
HAVING user_first_open > 0 AND user_first_open = user_app_remove
)
If you can't rely on user_id, the documentation suggests that you may be able to use user_pseudo_id instead.
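The conditional counting in the query above (count first_open and app_remove per user and day, then keep the groups where both counts match and are non-zero) can be sketched in plain Python. The flat event dictionaries and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def same_day_install_uninstall(events):
    """Return the users who, on at least one day, emitted first_open and
    app_remove the same (non-zero) number of times."""
    counts = defaultdict(lambda: {"first_open": 0, "app_remove": 0})
    for event in events:
        name = event["event_name"]
        if name in ("first_open", "app_remove"):
            # One bucket per (user, day), like GROUP BY user_id, event_date.
            counts[(event["user_id"], event["event_date"])][name] += 1
    return {
        user
        for (user, _day), c in counts.items()
        # Same condition as the HAVING clause.
        if c["first_open"] > 0 and c["first_open"] == c["app_remove"]
    }
```

The number of transient users is then the size of the returned set, which corresponds to the COUNT(DISTINCT ...) in the outer query.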
Usually we can self-join the table to find such results, something like:
SELECT t1.date, t1.app_instance_id
FROM event as t1, event as t2
WHERE t1.date = t2.date and t1.app_instance_id = t2.app_instance_id and t1.name = "app_remove" and t2.name = "first_open"
ORDER by t1.app_instance_id, t1.date

Accessing Struct(s) and Array(s) in Firebase Closed Funnels through BigQuery

I stumbled upon this standard SQL BigQuery documentation this week, which got me started with a Firebase Analytics closed funnel. However, I got the wrong results (see the image below). There should be no users who had a "Tutorial_LessonCompleted" without first starting a "Tutorial_LessonStarted >> Lesson = 1". This could be for various reasons.
Questions:
Is it wise to use the user property "first_open_time", or is it better to use the event "first_open"? What would the latter implementation look like?
I suspect I am perhaps not drilling down correctly to: Event (String = "Tutorial_LessonStarted") >> parameter (String = "LessonNumber") >> value (String = "lesson1")?
How would a filter on _TABLE_SUFFIX = '20170701' work? I read that this will be cheaper. Any optimised code suggestions are received with open arms and an up-vote!
#standardSQL
SELECT
step1, step2, step3, step4, step5, step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_id) AS users
FROM (
SELECT
user_dim.app_info.app_instance_id AS user_id,
event.timestamp_micros AS event_timestamp,
event.name AS step1,
LEAD(event.name, 1) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step2,
LEAD(event.name, 2) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step3,
LEAD(event.name, 3) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step4,
LEAD(event.name, 4) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step5,
LEAD(event.name, 5) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step6
FROM
`......`,
UNNEST(event_dim) AS event,
UNNEST(user_dim.user_properties) AS user_prop
WHERE user_prop.key = "first_open_time"
ORDER BY 1, 2, 3, 4, 5 ASC
)
WHERE step6 = "Tutorial_LessonStarted" AND EXISTS (
SELECT *
FROM `......`,
UNNEST(event_dim) AS event,
UNNEST(event.params)
WHERE key = 'LessonNumber' AND value.string_value = "lesson1")
GROUP BY step1, step2, step3, step4, step5, step6
ORDER BY funnel_count DESC
LIMIT 100;
Note:
Enter your own table in the FROM clause, e.g. project_id.com_game_example_IOS.app_events_20170212.
I left out the funnel_count and user_count.
Output:
----------------------------------------------------------
Update since original question above:
@Elliot: I don't understand why you said: -- ensure that an event with lesson1 precedes Tutorial_LessonStarted.
Tutorial_LessonStarted has a parameter "LessonNumber" with the values lesson1, lesson2, lesson3 and lesson4.
I want to count all funnels whose last step is a Tutorial_LessonStarted with LessonNumber = lesson1.
So, applied to the event log data for a brand new user's first session (i.e. a user that fired first_open_time), the answer would be the table below:
View.OnboardingWelcomePage
View.OnboardingFinalPage
View.JamLoading
View.JamLoading
Jam.UserViewsJam
Jam.ProjectOpened
View.JamMixer
Tutorial.LessonStarted (the value of the parameter "LessonNumber" would be "lesson1")
Jam.ProjectPlayStarted
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
Tutorial.LessonCompleted
Tutorial.LessonStarted (the value of the parameter "LessonNumber" would be "lesson2")
So it is important first to get all the users that had a first_open_time on a specific day, as well as to structure the events into a funnel so that the last event in the funnel is one that matches an event and a specific parameter value, and then to form the funnel "backwards" from there.
Let me go through some explanation, then see if I can suggest a query to get you started.
It looks like you want to analyze the sequence of events in your analytics data, but the sequence is already there for you--you have an array of the events. Looking at the Firebase schema for BigQuery, event_dim is the relevant column, and unless I'm misunderstanding something, these events are ordered by time. If you want to check what the sixth event's name was, you can use:
event_dim[SAFE_ORDINAL(6)].name
This will evaluate to NULL if there were fewer than six events, or else it will give you the string with the event name.
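As a rough plain-Python analogy for this 1-based, NULL-on-out-of-range lookup (a sketch only, with a hypothetical helper name and None standing in for NULL):

```python
def safe_ordinal(items, position):
    """Return the element at the 1-based `position`, or None when the
    position falls outside the list, similar to array[SAFE_ORDINAL(n)]."""
    if 1 <= position <= len(items):
        return items[position - 1]
    return None
```

For a user with only three events, asking for the sixth yields None instead of raising an error, so a filter such as "sixth event name equals Tutorial_LessonStarted" simply does not match that row.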
Another observation is that you are attempting to analyze both event_dim and user_dim, but you are taking the cross product of the two, which will explode the number of rows and make it hard to reason about the results of the query. To look for a specific user property, use an expression of this form:
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Combining these two filters, your FROM and WHERE clause would look something like this:
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Using the bracket operator to access the steps from event_dim, we can do something like this:
WITH FilteredInput AS (
SELECT *
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>' AND
-- ensure that an event with lesson1 precedes Tutorial_LessonStarted
EXISTS (
SELECT 1
FROM UNNEST(event_dim) WITH OFFSET event_offset
CROSS JOIN UNNEST(params)
WHERE key = 'LessonNumber' AND
value.string_value = 'lesson1' AND
event_offset < 5
)
)
SELECT
event_dim[ORDINAL(1)].name AS step1,
event_dim[ORDINAL(2)].name AS step2,
event_dim[ORDINAL(3)].name AS step3,
event_dim[ORDINAL(4)].name AS step4,
event_dim[ORDINAL(5)].name AS step5,
event_dim[ORDINAL(6)].name AS step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_dim.user_id) AS users
FROM FilteredInput
GROUP BY step1, step2, step3, step4, step5, step6;
This will return all unique "paths" along with a count and number of distinct users for each. Note that I'm just writing this off the top of my head--I don't have representative data that I can try it on--so there may be syntax or other errors.
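To illustrate what the final GROUP BY produces, here is a small plain-Python sketch that counts paths and distinct users. The input layout (one tuple of user id and ordered event names per row) and the function name are assumptions; unlike ORDINAL, which raises an error on short arrays, the sketch pads short event lists with None.

```python
from collections import defaultdict


def funnel_paths(rows, steps=6):
    """Map each path (tuple of the first `steps` event names) to a pair
    (funnel_count, distinct_users).

    `rows` is a list of (user_id, [event names in order]) tuples.
    """
    counts = defaultdict(lambda: [0, set()])  # path -> [row count, user ids]
    for user_id, names in rows:
        path = tuple(names[i] if i < len(names) else None for i in range(steps))
        counts[path][0] += 1
        counts[path][1].add(user_id)
    return {path: (n, len(users)) for path, (n, users) in counts.items()}
```

Each key corresponds to one output row of the query (step1 through step6), and the two numbers correspond to funnel_count and users.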

Resources