Accessing Struct(s) and Array(s) in Firebase Closed Funnels through BigQuery - google-analytics

I stumbled upon this standard SQL BigQuery documentation this week, which got me started with a Firebase Analytics Closed Funnel. However, I got the wrong results (see image below). No user should have a "Tutorial_LessonCompleted" event without first firing a "Tutorial_LessonStarted >> Lesson = 1" event. There could be various reasons for this.
Questions:
Is it wise to use the User Property = "first_open_time", or is it better to use the Event = "first_open"? What would the latter implementation look like?
I suspect I am perhaps not correctly drilling down to: Event (String = "Tutorial_LessonStarted") >> parameter (String = "LessonNumber") >> value (String = "lesson1")?
How would a filter on _TABLE_SUFFIX = '20170701' work? I read that this would be cheaper. Any optimised code suggestions will be received with open arms and an up-vote!
#standardSQL
SELECT
step1, step2, step3, step4, step5, step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_id) AS users
FROM (
SELECT
user_dim.app_info.app_instance_id AS user_id,
event.timestamp_micros AS event_timestamp,
event.name AS step1,
LEAD(event.name, 1) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step2,
LEAD(event.name, 2) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step3,
LEAD(event.name, 3) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step4,
LEAD(event.name, 4) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step5,
LEAD(event.name, 5) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step6
FROM
`......`,
UNNEST(event_dim) AS event,
UNNEST(user_dim.user_properties) AS user_prop
WHERE user_prop.key = "first_open_time"
ORDER BY 1, 2, 3, 4, 5 ASC
)
WHERE step6 = "Tutorial_LessonStarted" AND EXISTS (
SELECT *
FROM `......`,
UNNEST(event_dim) AS event,
UNNEST(event.params)
WHERE key = 'LessonNumber' AND value.string_value = "lesson1")
GROUP BY step1, step2, step3, step4, step5, step6
ORDER BY funnel_count DESC
LIMIT 100;
Note:
Enter your own table in the FROM clause, e.g. project_id.com_game_example_IOS.app_events_20170212.
I left out the funnel_count and user_count.
Output:
----------------------------------------------------------
Update since original question above:
@Elliot: I don’t understand why you said: -- ensure that an event with lesson1 precedes Tutorial_LessonStarted.
Tutorial_LessonStarted has a parameter "LessonNumber" with values lesson1,lesson2,lesson3,lesson4.
I want to count all funnels that took place with a last step in the funnel equal to LessonNumber=lesson1.
So, applied to the event log data for a brand new user's first session (i.e. a user that fired first_open_time), the answer would be the table below:
View.OnboardingWelcomePage
View.OnboardingFinalPage
View.JamLoading
View.JamLoading
Jam.UserViewsJam
Jam.ProjectOpened
View.JamMixer
Tutorial.LessonStarted (the value of this event's "LessonNumber" parameter would be "lesson1")
Jam.ProjectPlayStarted
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
Tutorial.LessonCompleted
Tutorial.LessonStarted (the value of this event's "LessonNumber" parameter would be "lesson2")
So it is important to first get all the users that had a first_open_time on a specific day, and then to structure the events into a funnel whose last event matches both an event name and a specific parameter value, forming the funnel "backwards" from there.

Let me go through some explanation, then see if I can suggest a query to get you started.
It looks like you want to analyze the sequence of events in your analytics data, but the sequence is already there for you--you have an array of the events. Looking at the Firebase schema for BigQuery, event_dim is the relevant column, and unless I'm misunderstanding something, these events are ordered by time. If you want to check what the sixth event's name was, you can use:
event_dim[SAFE_ORDINAL(6)].name
This will evaluate to NULL if there were fewer than six events, or else it will give you the string with the event name.
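To make the out-of-range behaviour concrete, here is a minimal Python sketch of what SAFE_ORDINAL does. The helper safe_ordinal and the sample event names are hypothetical and only model the 1-based lookup that returns NULL instead of raising an error.

```python
from typing import Optional, Sequence


def safe_ordinal(items: Sequence[str], position: int) -> Optional[str]:
    """Return the 1-based element at `position`, or None when out of range.

    This models BigQuery's array[SAFE_ORDINAL(n)]; None stands in for NULL.
    """
    if 1 <= position <= len(items):
        return items[position - 1]
    return None


# Hypothetical event names for one row of event_dim (assumed, not real data).
event_names = ["first_open", "View.OnboardingWelcomePage", "View.JamLoading"]

third = safe_ordinal(event_names, 3)  # an existing position
sixth = safe_ordinal(event_names, 6)  # fewer than six events, so None
```

With plain ORDINAL the out-of-range lookup would fail the query instead, which is why the filter uses the SAFE_ variant.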
Another observation is that you are attempting to analyze both event_dim and user_dim, but you are taking the cross product of the two, which will explode the number of rows and make it hard to reason about the results of the query. To look for a specific user property, use an expression of this form:
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Combining these two filters, your FROM and WHERE clause would look something like this:
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Using the bracket operator to access the steps from event_dim, we can do something like this:
WITH FilteredInput AS (
SELECT *
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>' AND
-- ensure that an event with lesson1 precedes Tutorial_LessonStarted
EXISTS (
SELECT 1
FROM UNNEST(event_dim) WITH OFFSET event_offset
CROSS JOIN UNNEST(params)
WHERE key = 'LessonNumber' AND
value.string_value = 'lesson1' AND
event_offset < 5
)
)
SELECT
event_dim[ORDINAL(1)].name AS step1,
event_dim[ORDINAL(2)].name AS step2,
event_dim[ORDINAL(3)].name AS step3,
event_dim[ORDINAL(4)].name AS step4,
event_dim[ORDINAL(5)].name AS step5,
event_dim[ORDINAL(6)].name AS step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_dim.user_id) AS users
FROM FilteredInput
GROUP BY step1, step2, step3, step4, step5, step6;
This will return all unique "paths" along with a count and number of distinct users for each. Note that I'm just writing this off the top of my head--I don't have representative data that I can try it on--so there may be syntax or other errors.
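As a rough illustration of what the final SELECT and GROUP BY compute, the following Python sketch groups rows by their first six event names and reports a row count and a distinct user count per path. The function funnel_paths and the sample rows are hypothetical; it assumes each input row has already passed the filters in FilteredInput.

```python
from collections import defaultdict


def funnel_paths(rows, steps=6):
    """Group rows by their first `steps` event names.

    rows: iterable of (user_id, [event names in time order]).
    Returns {path: (funnel_count, distinct_users)}.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for user_id, names in rows:
        # Pad with None so short rows still yield a fixed-length path.
        path = tuple((list(names) + [None] * steps)[:steps])
        counts[path] += 1
        users[path].add(user_id)
    return {path: (counts[path], len(users[path])) for path in counts}


# Hypothetical rows: (user id, ordered event names).
path_a = ["a", "b", "c", "d", "e", "Tutorial_LessonStarted"]
path_b = ["a", "x", "c", "d", "e", "Tutorial_LessonStarted"]
rows = [("u1", path_a), ("u1", path_a), ("u2", path_a), ("u3", path_b)]

result = funnel_paths(rows)
```

Here the first path appears in three rows from two users, which is the difference between funnel_count and users in the query.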

Related

BigQuery - How to order by event

I've recently started using BigQuery for work. So far I've managed to query what I wanted, but now I'm stuck.
I retrieve data from Firebase in my BigQuery console. The data are events from a mobile game we are testing.
I would like to know how many players are there in each level by ABVersion. I can't figure out how to do it.
I did this:
SELECT
param.value.string_value AS Version,
COUNT (DISTINCT user_pseudo_id) AS Players,
param2.value.string_value AS Level
FROM
`*Name of the dataset*`,
UNNEST(event_params) AS param,
UNNEST(event_params) AS param2
WHERE
event_name = 'Level_end'
AND param.key = 'ABVersion'
AND param2.key = 'Level'
GROUP BY Version,Level
And I got this:
I would like to have the number of players per level, with the ABVersion provided.
Thank you for your help!
Level is an integer parameter rather than a string, so you should use value.int_value for Level.
For what you're trying to do, this looks like a better query to me:
SELECT
highest_level,
abversion,
count(*) as players
FROM (
SELECT
user_pseudo_id,
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ABVersion')) as abversion,
MAX((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'Level')) as highest_level
FROM `*Name of the dataset*`
WHERE
event_name = 'Level_end'
AND EXISTS (SELECT 1 FROM UNNEST(event_params) WHERE key IN ('Level', 'ABVersion'))
GROUP BY user_pseudo_id
)
GROUP BY 1,2
ORDER BY 1,2
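The two-stage aggregation in the query above (highest level per player first, then players per level and version) can be sketched in Python as follows. The function players_per_level and the sample events are hypothetical, and unlike the SQL this sketch simply skips players who never reported a Level value.

```python
from collections import Counter


def players_per_level(events):
    """Count players by (highest level reached, AB version).

    events: iterable of dicts with user_pseudo_id, event_name and a
    params dict (a simplified stand-in for event_params).
    """
    highest = {}
    version = {}
    for event in events:
        if event["event_name"] != "Level_end":
            continue
        user = event["user_pseudo_id"]
        level = event["params"].get("Level")
        if level is not None:
            highest[user] = max(highest.get(user, level), level)
        ab = event["params"].get("ABVersion")
        if ab is not None:
            # Keep any one value per user, similar in spirit to ANY_VALUE.
            version.setdefault(user, ab)
    return Counter((highest[u], version.get(u)) for u in highest)


# Hypothetical sample events.
events = [
    {"user_pseudo_id": "u1", "event_name": "Level_end", "params": {"Level": 1, "ABVersion": "A"}},
    {"user_pseudo_id": "u1", "event_name": "Level_end", "params": {"Level": 3, "ABVersion": "A"}},
    {"user_pseudo_id": "u2", "event_name": "Level_end", "params": {"Level": 2, "ABVersion": "B"}},
    {"user_pseudo_id": "u3", "event_name": "Level_end", "params": {"Level": 3, "ABVersion": "A"}},
    {"user_pseudo_id": "u3", "event_name": "session_start", "params": {}},
]

players = players_per_level(events)
```

Each player is counted once, at the highest level they finished, rather than once per Level_end event.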

Firebase BigQuery schema migration: Move into a partitioned table?

I got the email with instructions to migrate my previous Firebase tables in BigQuery to the new schema. They point to these instructions:
https://support.google.com/analytics/answer/7029846?#migrationscript
But I'd prefer a different approach:
Instead of running a bash script, run a single query that executes the migration.
Instead of creating a number of new tables, move all the previous results to a new date-partitioned table.
I took the script on the documentation and made some changes.
Look at all the --Fh comments. Those are my modifications.
Choose your destination table.
Choose your date range for Android and IOS.
Note that I'm adding a new column with a real timestamp for partitioning (and your convenience).
Instead of getting a number of new tables, you'll only get one - but partitioned by date.
Modified script:
#standardSQL
CREATE OR REPLACE TABLE `fh-bigquery.deleting.delete`
PARTITION BY DATE(ts)
AS
WITH sources AS ( --Fh
SELECT * FROM (
SELECT *, _table_suffix event_date, 'ANDROID' operating_system
FROM `firebase-public-project.com_firebase_demo_ANDROID.app_events_*`
UNION ALL SELECT *, _table_suffix event_date, 'IOS' operating_system
FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`
)
WHERE event_date BETWEEN '20180503' AND '20180504' --Fh: choose your timerange
)
SELECT
event_date, --Fh: extracted from original table name
TIMESTAMP_MICROS(event.timestamp_micros) ts, --Fh: adding a real timestamp column
event.timestamp_micros AS event_timestamp,
event.previous_timestamp_micros AS event_previous_timestamp,
event.name AS event_name,
event.value_in_usd AS event_value_in_usd,
user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
(
SELECT
ARRAY_AGG(STRUCT(event_param.key AS key,
STRUCT(event_param.value.string_value AS string_value,
event_param.value.int_value AS int_value,
event_param.value.double_value AS double_value,
event_param.value.float_value AS float_value) AS value))
FROM
UNNEST(event.params) AS event_param) AS event_params,
user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
user_dim.user_id AS user_id,
user_dim.app_info.app_instance_id AS user_pseudo_id,
"" AS stream_id,
user_dim.app_info.app_platform AS platform,
STRUCT( user_dim.ltv_info.revenue AS revenue,
user_dim.ltv_info.currency AS currency ) AS user_ltv,
STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
user_dim.traffic_source.user_acquired_medium AS medium,
user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
STRUCT( user_dim.geo_info.continent AS continent,
user_dim.geo_info.country AS country,
user_dim.geo_info.region AS region,
user_dim.geo_info.city AS city ) AS geo,
STRUCT( user_dim.device_info.device_category AS category,
user_dim.device_info.mobile_brand_name,
user_dim.device_info.mobile_model_name,
user_dim.device_info.mobile_marketing_name,
user_dim.device_info.device_model AS mobile_os_hardware_model,
operating_system, --Fh
user_dim.device_info.platform_version AS operating_system_version,
user_dim.device_info.device_id AS vendor_id,
user_dim.device_info.resettable_device_id AS advertising_id,
user_dim.device_info.user_default_language AS language,
user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
STRUCT( user_dim.app_info.app_id AS id,
'app_id' AS firebase_app_id, --Fh: choose your app id
user_dim.app_info.app_version AS version,
user_dim.app_info.app_store AS install_source ) AS app_info,
( SELECT ARRAY_AGG(STRUCT(user_property.key AS key,
STRUCT(user_property.value.value.string_value AS string_value,
user_property.value.value.int_value AS int_value,
user_property.value.value.double_value AS double_value,
user_property.value.value.float_value AS float_value,
user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
FROM UNNEST(user_dim.user_properties) AS user_property
) AS user_properties
FROM sources -- Fh
, UNNEST(event_dim) AS event
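The ts column in the script above is derived from event.timestamp_micros and then reduced to a date for partitioning. A small Python sketch of that conversion, with a hypothetical helper name and an assumed sample value, looks like this:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_timestamp(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# Assumed sample value: midnight UTC on 2018-05-03, expressed in microseconds.
ts = micros_to_timestamp(1525305600000000)

# The partition key is the UTC calendar date of the timestamp.
partition_date = ts.date()
```

Using integer microseconds with timedelta avoids the rounding that can creep in when dividing by one million as a float.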

SQL Query to find users that install and uninstall an App on the same day

I am trying to find the users that install and uninstall the App on the same day using the data from Firebase Analytics in Google BigQuery.
This is what I have so far.
I have a query that gives me users (or app_instance_id) who install or uninstall the App:
SELECT event.date,
user_dim.app_info.app_instance_id,
event.name
FROM `app_name.app_events_20180303`,
UNNEST(event_dim) AS event
WHERE (event.name = "app_remove" OR event.name = "first_open")
ORDER BY app_instance_id, event.date
It gives me the following result where I can see that row 1 and 2 are the same user that installs and uninstalls the App:
I've tried to modify the previous query by using
WHERE (event.name = "app_remove" AND event.name = "first_open")
which gives: Query returned zero records.
Do you have any suggestions on how to achieve this? Thanks.
Try this, although I did not test it:
SELECT date,
app_instance_id
FROM
(SELECT event.date,
user_dim.app_info.app_instance_id,
event.name
FROM `app_name.app_events_20180303`,
UNNEST(event_dim) AS event
WHERE (event.name = "app_remove" OR event.name = "first_open"))
GROUP BY app_instance_id, date
HAVING COUNT(*) = 2
ORDER BY app_instance_id, date
To start, it's worth noting that iOS does not yield app_remove, so this query only counts Android users who go through the install/uninstall pattern.
I created a sub-set of users who emitted first_open and app_remove, and counted those entries grouped by the date. I only kept instances where users installed and removed the app the same number of times in a day (greater than zero).
Then I tallied the distinct users.
SELECT COUNT(DISTINCT(user_id)) as transient_user_count
FROM (
SELECT event_date,
user_id,
COUNT(if(event_name = "first_open", user_id, NULL)) as user_first_open,
COUNT(if(event_name = "app_remove", user_id, NULL)) as user_app_remove
FROM `your_app.analytics_123456.events_*`
-- WHERE (_TABLE_SUFFIX between '20191201' and '20191211')
GROUP BY user_id, event_date
HAVING user_first_open > 0 AND user_first_open = user_app_remove
)
If you're not able to rely on user_id, then the documentation suggests that you may be able to rely on user_pseudo_id instead.
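The grouping and HAVING logic of the query above can be sketched in Python like this. The function transient_users and the sample events are hypothetical; the sketch mirrors the rule of keeping a user and date only when first_open and app_remove occur the same non-zero number of times.

```python
from collections import defaultdict


def transient_users(events):
    """Return users who installed and removed the app on the same day.

    events: iterable of (event_date, user_id, event_name).
    """
    counts = defaultdict(lambda: {"first_open": 0, "app_remove": 0})
    for event_date, user_id, event_name in events:
        if event_name in ("first_open", "app_remove"):
            counts[(user_id, event_date)][event_name] += 1
    return {
        user
        for (user, _date), c in counts.items()
        if c["first_open"] > 0 and c["first_open"] == c["app_remove"]
    }


# Hypothetical sample events.
events = [
    ("20191201", "u1", "first_open"),
    ("20191201", "u1", "app_remove"),   # same day: transient
    ("20191201", "u2", "first_open"),
    ("20191202", "u2", "app_remove"),   # removed a day later: not counted
    ("20191203", "u3", "session_start"),
]

users = transient_users(events)
transient_user_count = len(users)
```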
Usually we can join the table to itself to find this kind of result, something like:
SELECT t1.date, t1.app_instance_id
FROM event as t1, event as t2
WHERE t1.date = t2.date and t1.app_instance_id = t2.app_instance_id and t1.name = "app_remove" and t2.name = "first_open"
ORDER by t1.app_instance_id, t1.date

Counting google analytics unique events in BigQuery

I have managed to calculate total events by ISOweek but not unique events for a given Google Analytics Event using BigQuery. When checking GA, total_events matches the GA interface on the dot but unique_events are off. Do you know how I can solve this?
The query:
SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(fullVisitorId) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
Depending on the scope you need to count(distinct ) different things, but you always need to fulfill these conditions:
unique events refer to the combination of category, action and label
make sure eventAction is not NULL
make sure eventLabel is not NULL
eventCategory is allowed to be NULL
I'm using COALESCE() to avoid NULLs
Example Session Scope
SELECT
SUM( (SELECT COUNT(h.eventInfo.eventCategory) FROM t.hits h) ) events,
SUM( (SELECT COUNT(DISTINCT
CONCAT( h.eventInfo.eventCategory,
COALESCE(h.eventinfo.eventaction,''),
COALESCE(h.eventinfo.eventlabel, ''))
)
FROM
t.hits h ) ) uniqueEvents
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t
Example Hit Scope
SELECT
h.eventInfo.eventCategory,
COUNT(1) events,
-- we need to take sessions into account, so we add fullvisitorid and visitstarttime
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string),
COALESCE(h.eventinfo.eventaction,''),
COALESCE(h.eventinfo.eventlabel, ''))) uniqueEvents
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t,
t.hits h
WHERE
h.type='EVENT'
GROUP BY
1
ORDER BY
2 DESC
hth!
The definition of unique events in Google Analytics is:
A count of the number of times an event with the category/action/label
value was seen at least once within a session.
In other words, the number of sessions in which a specific event (defined by category, action AND label) was sent. In your query, you count the number of unique visitors that had the event, while you need to count the number of sessions and keep in mind that events with different labels should be counted as different unique events (although we are only interested in category and action).
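The definition above can be sketched in Python as follows: total events count every hit, while unique events count distinct (session, label) pairs within each category and action. The function event_counts and the sample hits are hypothetical and only model the counting rule, not the GA export schema.

```python
from collections import defaultdict


def event_counts(hits):
    """Return {(category, action): (total_events, unique_events)}.

    hits: iterable of (session_id, category, action, label).
    """
    total = defaultdict(int)
    seen = defaultdict(set)
    for session_id, category, action, label in hits:
        key = (category, action)
        total[key] += 1
        # A label seen twice in one session counts once; a different
        # label in the same session counts as another unique event.
        seen[key].add((session_id, label))
    return {key: (total[key], len(seen[key])) for key in total}


# Hypothetical sample hits.
hits = [
    ("s1", "BIG_Transaction", "buy", "labelA"),
    ("s1", "BIG_Transaction", "buy", "labelA"),
    ("s1", "BIG_Transaction", "buy", "labelB"),
    ("s2", "BIG_Transaction", "buy", "labelA"),
]

counts = event_counts(hits)
```

Counting distinct visitors instead would give 1 or 2 here depending on who owns the sessions, which is why the numbers drift away from the GA interface.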
A possible way to fix your code is:
SELECT
INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId,'-',string(visitId),'-',date,'-',ifnull(hits.eventInfo.eventLabel,'null'))) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
The results of this query should match with the data in the GA interface.
I believe the issue is that you are only counting the number of unique visitors who have completed the specified action, while GA defines unique events as "The number of times during a date range that a session contained the specific dimension".
Therefore, I would just change your code to the below:
SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId, STRING(visitId))) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
This should give you the distinct count of sessions that had the given events.
We did something similar to what @Martin was suggesting with some cool CTEs and we were able to get a 100% match between BigQuery and what was coming out of Google Analytics.
Check out the code snippet below, which returns a per-day sum of sessions and unique Add to Cart events:
#standardSQL
WITH AN_ATC AS
(
SELECT
-- full date w/ hyphens (ie 2021-01-07)
CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
-- COUNT OF SESSIONS
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS Sessions,
-- COUNT OF UNIQUE EVENTS PER SESSION
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string),
COALESCE(hits.eventinfo.eventaction,''),
COALESCE(hits.eventinfo.eventlabel, ''))) AS EVENTS
FROM `an-big-query.PROJECT_ID.ga_sessions_*` ,
UNNEST(hits) as hits
WHERE
-- start date
_table_suffix BETWEEN '20190101'
-- yesterday
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(),INTERVAL 1 DAY))
AND hits.eventInfo.eventAction = 'add to cart'
GROUP BY
date
)
SELECT
DATE,
SESSIONS,
EVENTS
FROM AN_ATC
ORDER BY date DESC
Where,
SESSIONS = Google Analytics ga:Sessions
and
EVENTS = Google Analytics ga:uniqueEvents
BOTH with eventAction=#add to cart
Hope that helps everyone that was searching/googling!

GA Average Time On Page In BigQuery

I'm having trouble working out average time on page from the back-end GA BigQuery export data, and I'm wondering if someone could check whether the code below looks reasonable.
I can't get it to match the figure from the Query Explorer tool.
Is there a way to run the Query Explorer tool against the LondonCycleHelmet data?
Any help is much appreciated, thanks.
select
pageviews,
exit_pageviews,
sum_hit_length_seconds,
sum_hit_length_seconds / (pageviews - exit_pageviews) as avg_time_on_page
from
(
select
SUM(hit_length_seconds) as sum_hit_length_seconds,
COUNT(IF(hits.type='PAGE',(CONCAT(session_key,'_',hits.page.hostname,'_',hits.page.pagePath)),NULL)) AS pageviews,
COUNT(IF((next_hit_time is null) or (hits.hitNumber=hits_hitNumber_max),(CONCAT(session_key,'_',hits.page.hostname,'_',hits.page.pagePath)),NULL)) AS exit_pageviews,
from
(
select
*,
(next_hit_time-hits.time)/1000 as hit_length_seconds,
from
(
select
fullVisitorId,
visitId,
visitorId,
hits.type,
hits.time,
hits.hitNumber,
hits.page.hostname,
hits.page.pagePath,
-- create some keys to handle data later
concat(fullVisitorId,"_",string(visitId)) as session_key,
concat(fullVisitorId,"_",string(visitId),"_",string(hits.hitNumber),"_",string(hits.time)) as hit_key,
-- get max and min number of hits for each session
MAX(hits.hitNumber) WITHIN RECORD AS hits_hitNumber_max,
MIN(hits.hitNumber) WITHIN RECORD AS hits_hitNumber_min,
-- get min and max hit times to work out full session length
MAX(hits.time) WITHIN RECORD AS hits_time_max,
MIN(hits.time) WITHIN RECORD AS hits_time_min,
-- get next and previous hit time to be able to work out length of each hit
LAG(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) as previous_hit_time,
LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) as next_hit_time,
from
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)
)
)
UPDATE/CLARIFICATION:
I think it's when I look at an individual page across time that it starts going out of whack.
For example, if I run the query below in BigQuery:
select
pageviews,
exit_pageviews,
sum_hit_length_seconds,
sum_hit_length_seconds / (pageviews - exit_pageviews) as avg_time_on_page
from
(
select
SUM(hit_length_seconds) as sum_hit_length_seconds,
COUNT(IF(hits.type='PAGE',(CONCAT(session_key,'_',hits.page.hostname,'_',hits.page.pagePath)),NULL)) AS pageviews,
COUNT(IF((next_hit_time is null) or (hits.hitNumber=hits_hitNumber_max),(CONCAT(session_key,'_',hits.page.hostname,'_',hits.page.pagePath)),NULL)) AS exit_pageviews,
from
(
select
*,
(next_hit_time-hits.time)/1000 as hit_length_seconds,
from
(
select
fullVisitorId,
visitId,
visitorId,
hits.type,
hits.time,
hits.hitNumber,
hits.page.hostname,
hits.page.pagePath,
-- create some keys to handle data later
concat(fullVisitorId,"_",string(visitId)) as session_key,
concat(fullVisitorId,"_",string(visitId),"_",string(hits.hitNumber),"_",string(hits.time)) as hit_key,
-- get max and min number of hits for each session
MAX(hits.hitNumber) WITHIN RECORD AS hits_hitNumber_max,
MIN(hits.hitNumber) WITHIN RECORD AS hits_hitNumber_min,
-- get min and max hit times to work out full session length
MAX(hits.time) WITHIN RECORD AS hits_time_max,
MIN(hits.time) WITHIN RECORD AS hits_time_min,
-- get next and previous hit time to be able to work out length of each hit
LAG(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) as previous_hit_time,
LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) as next_hit_time,
from
[XXX.ga_sessions_20151001],
[XXX.ga_sessions_20151002],
[XXX.ga_sessions_20151003],
where
hits.page.pagePath='/2015/10/01/blah-blah/'
)
)
)
I get:
[
{
"pageviews": "24002",
"exit_pageviews": "22468",
"sum_hit_length_seconds": "455762.1240000001",
"avg_time_on_page": "297.10699087353333"
}
]
But if I look at the Query Explorer like this:
I get:
So it looks like the pageviews match, but both exits and time on page seem quite different and I can't figure out why.
Can anyone recreate this example on their own data?
I have a feeling it's to do with how exits and time on page are calculated in GA, but I could not find any examples in the BQ GA cookbook of how to calculate time on page or exits.
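One possible contributor, offered as an assumption and not confirmed in this thread: in SQL a WHERE clause is applied before analytic functions such as LEAD, so filtering on pagePath in the same SELECT means next_hit_time only sees other hits on that page. The Python sketch below uses a hypothetical single session to show how the order of filtering changes the result.

```python
def next_hit_times(hits):
    """Attach the time of the following hit to each hit of one session.

    hits: list of (time_ms, page_path). Returns (time_ms, page_path,
    next_time_ms) tuples, with None when there is no following hit.
    """
    ordered = sorted(hits)
    result = []
    for i, (time_ms, path) in enumerate(ordered):
        next_time = ordered[i + 1][0] if i + 1 < len(ordered) else None
        result.append((time_ms, path, next_time))
    return result


# Hypothetical session: three pageviews, the target page in the middle.
target = "/2015/10/01/blah-blah/"
session = [(0, "/home"), (30000, target), (90000, "/about")]

# Filter first, then look ahead: the target hit appears to have no next hit,
# so it looks like an exit with no measurable time on page.
filtered_first = next_hit_times([h for h in session if h[1] == target])

# Look ahead over the whole session, then filter: the next hit is found.
window_first = [row for row in next_hit_times(session) if row[1] == target]
seconds_on_page = (window_first[0][2] - window_first[0][0]) / 1000
```

If this is what is happening, computing next_hit_time in an inner query over all hits and filtering on pagePath in an outer query would be one thing to try.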
