How to replicate a Google Analytics segment in Google BigQuery? - google-analytics

Has anyone replicated Google Analytics segments, especially advanced segments with sequence conditions, in Google BigQuery?
For instance: users whose clientId is 1234555.6666, 1234665.6644 or 12345855.6256, who viewed page ABC and then, later on, page YYYVV.
I could not find this among the existing questions on Stack Overflow.
Example of an advanced segment to create in BigQuery (screenshots omitted).
Here's an example of my query:
#standardSQL
select
MAX (hits.hitNumber) as max_hitnumber,
fullvisitorid AS fullvisitorid,
visitid AS visitid,
clientId as clientId,
date as date,
hits.hour AS hit_hour,
hits.minute AS hit_minute,
hits.isExit AS hit_isExit,
hits.type AS hit_type,
(SELECT MAX(IF(index=4, value, NULL)) FROM UNNEST(hits.customDimensions)) AS type_de_la_page,
(SELECT MAX(IF(index=5, value, NULL)) FROM UNNEST(hits.customDimensions)) AS titre_de_la_page,
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS univers_affichage
FROM `dataset.ga_sessions_*`, UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN 'XX' AND 'YY'
AND
clientId IN ('1216107517.1566907413','2133916944.1568016396','191276059.1563523657','2099333168.1568989130','40560158.1568714387','1148175150.1568740892','2006464416.1569333485')
AND
((SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = "prod")
AND
hits.type="PAGE"
AND
(SELECT MAX(IF(index=4, value, NULL)) FROM UNNEST(hits.customDimensions)) = "home"
GROUP BY
fullvisitorid,
visitid,
clientId,
date,
hit_hour,
hit_minute,
hit_type,
type_de_la_page,
titre_de_la_page,
univers_affichage,
hit_isExit
But I don't know how to add regex conditions on my Google Analytics custom dimensions to this query, nor how to express the "is followed by" step of my Google Analytics segment in BigQuery.
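One way to think about the "is followed by" condition is per session: find the earliest hit that matches the first step, then check whether any later hit (a higher hitNumber) matches the second step. The sketch below shows that logic in plain Python on made-up hits rather than in BigQuery; the hit tuples, the page patterns and the function name `sessions_matching_sequence` are all hypothetical. In BigQuery the same idea could presumably be written with `REGEXP_CONTAINS` and a comparison of hit numbers within each fullVisitorId/visitId pair.

```python
import re
from collections import defaultdict

# Toy hits: (client_id, visit_id, hit_number, page). All values are made up.
HITS = [
    ("111.1", 1, 1, "/home"),
    ("111.1", 1, 2, "/ABC"),
    ("111.1", 1, 3, "/YYYVV"),
    ("222.2", 7, 1, "/YYYVV"),   # second step happens BEFORE the first step
    ("222.2", 7, 2, "/ABC"),
    ("333.3", 4, 1, "/ABC"),
    ("333.3", 4, 2, "/YYYVV"),
]


def sessions_matching_sequence(hits, client_ids, first_pattern, second_pattern):
    """Return sessions in which a hit matching first_pattern is followed,
    at any later hit number, by a hit matching second_pattern."""
    first_re = re.compile(first_pattern)
    second_re = re.compile(second_pattern)

    # Group the hits of the selected clients by session.
    by_session = defaultdict(list)
    for client_id, visit_id, hit_number, page in hits:
        if client_id in client_ids:
            by_session[(client_id, visit_id)].append((hit_number, page))

    matched = []
    for session, session_hits in by_session.items():
        first_hits = [n for n, page in session_hits if first_re.search(page)]
        if not first_hits:
            continue
        earliest_first = min(first_hits)
        # "Followed by" here means any later hit, not only the next one.
        if any(n > earliest_first and second_re.search(page)
               for n, page in session_hits):
            matched.append(session)
    return sorted(matched)


result = sessions_matching_sequence(
    HITS, {"111.1", "222.2"}, r"^/ABC$", r"^/YYYVV$"
)
print(result)
```

Only the session of client 111.1 is returned: client 222.2 saw the two pages in the wrong order, and client 333.3 is not in the client list. An "is immediately followed by" step would instead compare each hit with the next one only.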

Related

BigQuery problem - I can't extract quantity and products added to cart for product lists (Google Analytics - UA)

Good evening,
I am trying to create a BigQuery query that includes the following dimensions: Date, ProductListName, ProductSKU and ProductListPosition, and the following metrics: Product List Views, Product List Clicks, Quantity and Number of units added to cart.
However, Quantity and Units added to cart are not working as expected. Both always return the same result (0). I have already checked the correct figures in Google Analytics, so I know what the query should return if it were correct.
The query I wrote is below.
Could anyone please help me with that?
Thanks in advance
SELECT
PARSE_DATE("%Y%m%d",date) AS Fecha,
product.productListName AS Lista_Producto,
product.productSKU AS SKU,
product.productListPosition AS Posicion_En_Lista,
SUM(IF(product.isImpression = true,1,0)) AS Vistas_Producto,
SUM(IF(product.isClick = true,1,0)) AS Clics_Producto,
SUM(IF(hits.eCommerceAction.action_type = "3",1,0)) AS AddToCart,
SUM(IF(hits.eCommerceAction.action_type = "6",1,0)) AS Cantidad_Comprada
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
,UNNEST(hits) hits
,UNNEST(hits.product) product
WHERE _TABLE_SUFFIX BETWEEN "20170730" AND "20170731"
--AND product.productSKU = "GGOEYFKQ020699" AND product.productListName = "Category" AND product.productListPosition = 1
AND product.productListName != "(not set)"
GROUP BY Fecha, SKU, Lista_Producto, Posicion_En_Lista
ORDER BY Fecha DESC;
Try using the expression below for add_to_cart:
CASE
WHEN LEAD(productListName) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber) = "(not set)" THEN LEAD(product_add_to_cart) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber)+product_add_to_cart
ELSE
product_add_to_cart
END

GA in bigquery does not show the correct result

I'm trying to get GA data by product in BigQuery with the query below, but somehow it returns no results.
Does anyone know why?
SELECT
product.productSKU,
COUNT(DISTINCT fullVisitorId) AS unique_user,
COUNT(fullVisitorId) AS page_view,
COUNT(DISTINCT visitId) AS unique_session
FROM
`.ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.product) AS product
WHERE
hits.type = 'PAGE'
GROUP BY
product.productSKU

Counting google analytics unique events in BigQuery

I have managed to calculate total events by ISO week in BigQuery, but not unique events for a given Google Analytics event. When checked against GA, total_events matches the GA interface exactly, but unique_events is off. Do you know how I can solve this?
The query:
SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(fullVisitorId) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
Depending on the scope, you need to COUNT(DISTINCT ...) different things, but these conditions always apply:
- unique events refer to the combination of category, action and label
- make sure eventAction is not NULL
- make sure eventLabel is not NULL
- eventCategory is allowed to be NULL
I'm using COALESCE() to avoid NULLs, because CONCAT() returns NULL if any of its arguments is NULL.
Example Session Scope
SELECT
SUM( (SELECT COUNT(h.eventInfo.eventCategory) FROM t.hits h) ) events,
SUM( (SELECT COUNT(DISTINCT
CONCAT( h.eventInfo.eventCategory,
COALESCE(h.eventinfo.eventaction,''),
COALESCE(h.eventinfo.eventlabel, ''))
)
FROM
t.hits h ) ) uniqueEvents
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t
Example Hit Scope
SELECT
h.eventInfo.eventCategory,
COUNT(1) events,
-- we need to take sessions into account, so we add fullvisitorid and visitstarttime
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string),
COALESCE(h.eventinfo.eventaction,''),
COALESCE(h.eventinfo.eventlabel, ''))) uniqueEvents
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t,
t.hits h
WHERE
h.type='EVENT'
GROUP BY
1
ORDER BY
2 DESC
hth!
The definition of unique events in Google Analytics is:
A count of the number of times an event with the category/action/label
value was seen at least once within a session.
In other words, the number of sessions in which a specific event (defined by category, action AND label) was sent. In your query, you count the number of unique visitors that had the event, while you need to count the number of sessions and keep in mind that events with different labels should be counted as different unique events (although we are only interested in category and action).
A possible way to fix your code is:
SELECT
INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId,'-',string(visitId),'-',date,'-',ifnull(hits.eventInfo.eventLabel,'null'))) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
The results of this query should match with the data in the GA interface.
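To make the definition above concrete, here is a small sketch in plain Python on invented hits (not BigQuery): a unique event is counted once per session for each category/action/label combination. The tuple layout and the helper name `count_events` are assumptions for illustration only.

```python
# Toy event hits: (fullVisitorId, visitId, category, action, label).
# All values are invented for illustration.
EVENT_HITS = [
    ("u1", 100, "BIG_Transaction", "buy", "red"),
    ("u1", 100, "BIG_Transaction", "buy", "red"),   # same session, same event
    ("u1", 100, "BIG_Transaction", "buy", "blue"),  # same session, new label
    ("u1", 200, "BIG_Transaction", "buy", "red"),   # new session, same user
    ("u2", 100, "BIG_Transaction", "buy", None),    # other user, no label
]


def count_events(hits):
    """Return (total_events, unique_events, unique_visitors)."""
    total_events = len(hits)
    # One unique event per distinct session + category + action + label.
    unique_events = len({
        (visitor, visit, category, action, label)
        for visitor, visit, category, action, label in hits
    })
    # Counting visitors only is what the original query did.
    unique_visitors = len({visitor for visitor, *_ in hits})
    return total_events, unique_events, unique_visitors


total, unique, visitors = count_events(EVENT_HITS)
print(total, unique, visitors)
```

On this toy data the three counts differ (5 total events, 4 unique events, 2 unique visitors), which is the kind of gap described in the question. Note that u1 and u2 share visitId 100, so the visitor ID has to be part of the session key.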
I believe the issue is that you are only counting the number of unique visitors who have completed the specified action, while GA defines unique events as "The number of times during a date range that a session contained the specific dimension".
Therefore, I would just change your code to the below:
SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId, STRING(visitId))) AS unique_events
FROM
TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
This should give you the distinct count of sessions that had the given events.
We did something similar to what @Martin was suggesting, with some cool CTEs, and we were able to get a 100% match between BigQuery and what was coming out of Google Analytics.
Check out the code snippet below, which returns a per-day sum of sessions and unique Add to Cart events:
#standardSQL
WITH AN_ATC AS
(
SELECT
-- full date w/ hyphens (ie 2021-01-07)
CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
-- COUNT OF SESSIONS
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS Sessions,
-- COUNT OF UNIQUE EVENTS PER SESSION
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string),
COALESCE(hits.eventinfo.eventaction,''),
COALESCE(hits.eventinfo.eventlabel, ''))) AS EVENTS
FROM `an-big-query.PROJECT_ID.ga_sessions_*` ,
UNNEST(hits) as hits
WHERE
-- start date
_table_suffix BETWEEN '20190101'
-- yesterday
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(),INTERVAL 1 DAY))
AND hits.eventInfo.eventAction = 'add to cart'
GROUP BY
date
)
SELECT
DATE,
SESSIONS,
EVENTS
FROM AN_ATC
ORDER BY date DESC
Where,
SESSIONS = Google Analytics ga:Sessions
and
EVENTS = Google Analytics ga:uniqueEvents
BOTH with eventAction=@add to cart
Hope that helps everyone that was searching/googling!

Joining to landing pages query doubles the sessions per source

I'm trying to query the sum of visits per source from a BigQuery table of Google Analytics data, but I need to filter some sessions out at landing page level. Hence I'm pre-querying visitIDs by landing page and re-joining to the session data like so:
#StandardSQL
WITH landingpages AS (
SELECT
visitID,
h.page.pagePath AS LandingPage
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) AS h
WHERE
hitNumber = 1
AND
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
# filters to be added here
)
SELECT
sessions.trafficSource.source,
SUM(sessions.totals.visits) AS visits
FROM `project.dataset.ga_sessions_*` AS sessions
JOIN
landingpages
ON
landingpages.visitID = sessions.visitID
WHERE
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
GROUP BY
trafficSource.source
ORDER BY
visits DESC
This roughly doubles the number of sessions per source compared with what GA reports.
Can anyone point out what I've done wrong? (I suspect it is blindingly obvious.)
I've examined the output of the first query and can't find anything wrong with it, aside from a very small proportion of duplicated visitIDs. I've also tried various types of JOIN, all to no avail.
When querying GA data in BigQuery, it's imperative to keep in mind that a unique visit is identified by the combination of fullVisitorId and visitId. Only a join on both columns returns a meaningful data set.
Here's what I should have written:
#StandardSQL
WITH landingpages AS (
SELECT
fullVisitorId,
visitID,
h.page.pagePath AS LandingPage
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) AS h
WHERE
hitNumber = 1
AND
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
),
session_data AS (
SELECT
date AS ga_date, trafficSource.source AS source, fullVisitorId, visitID, SUM(totals.visits) AS visits
FROM
`project.dataset.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
AND
totals.visits > 0
GROUP BY ga_date, source, fullVisitorId, visitID
)
SELECT
ga_date, source, SUM(visits) AS Sessions
FROM
landingpages
JOIN
session_data
ON
landingpages.VisitID = session_data.VisitID
AND
landingpages.fullVisitorId = session_data.fullVisitorId
GROUP BY
ga_date, source
ORDER BY
Sessions DESC
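The effect of the join key can be reproduced on a tiny scale. The sketch below uses Python's built-in sqlite3 module with two invented tables (not the real GA export schema) to illustrate how matching on visitId alone can multiply rows when two visitors share a visitId, while matching on both fullVisitorId and visitId does not. Table contents and column layout are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (fullVisitorId TEXT, visitId INTEGER, source TEXT, visits INTEGER);
    CREATE TABLE landingpages (fullVisitorId TEXT, visitId INTEGER, landingPage TEXT);

    -- Two different visitors happen to share visitId 1000.
    INSERT INTO sessions VALUES
        ('A', 1000, 'google', 1),
        ('B', 1000, 'google', 1),
        ('C', 2000, 'direct', 1);
    INSERT INTO landingpages VALUES
        ('A', 1000, '/home'),
        ('B', 1000, '/pricing'),
        ('C', 2000, '/home');
""")

JOIN_ON_VISIT_ID_ONLY = """
    SELECT s.source, SUM(s.visits)
    FROM sessions AS s
    JOIN landingpages AS lp ON lp.visitId = s.visitId
    GROUP BY s.source ORDER BY s.source
"""

JOIN_ON_BOTH_KEYS = """
    SELECT s.source, SUM(s.visits)
    FROM sessions AS s
    JOIN landingpages AS lp
      ON lp.visitId = s.visitId AND lp.fullVisitorId = s.fullVisitorId
    GROUP BY s.source ORDER BY s.source
"""

visit_id_only = conn.execute(JOIN_ON_VISIT_ID_ONLY).fetchall()
both_keys = conn.execute(JOIN_ON_BOTH_KEYS).fetchall()
print(visit_id_only)  # 'google' is inflated: each session matches two landing rows
print(both_keys)      # one landing row per session
```

With the single-column join, the two 'google' sessions each match both landing page rows, so the source is credited with 4 visits instead of 2.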

How to get the Google Analytics definition of unique page views in Bigquery

https://support.google.com/analytics/answer/1257084?hl=en-GB#pageviews_vs_unique_views
I'm trying to calculate the sum of unique pageviews per day, as shown in the Google Analytics interface.
How do I get the equivalent using BigQuery?
There are two ways to interpret this:
1) One, as the linked documentation says, is to combine the fullVisitorId with the session ID (visitId) and count the distinct combinations.
SELECT
EXACT_COUNT_DISTINCT(combinedVisitorId)
FROM (
SELECT
CONCAT(fullVisitorId,string(VisitId)) AS combinedVisitorId
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE' )
2) The other is just counting distinct fullVisitorIds
SELECT
EXACT_COUNT_DISTINCT(fullVisitorId)
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE'
If someone wants to try this out on a sample public dataset, there is a tutorial on how to add the sample dataset.
The other queries didn't match the Unique Pageviews metric in my Google Analytics account, but the following did:
SELECT COUNT(1) as unique_pageviews
FROM (
SELECT
hits.page.pagePath,
hits.page.pageTitle,
fullVisitorId,
visitNumber,
COUNT(1) as hits
FROM [my_table]
WHERE hits.type='PAGE'
GROUP BY
hits.page.pagePath,
hits.page.pageTitle,
fullVisitorId,
visitNumber
)
For uniquePageviews, you are better off using something like this:
SELECT
date,
SUM(uniquePageviews) AS uniquePageviews
FROM (
SELECT
date,
CONCAT(fullVisitorId,string(VisitId)) AS combinedVisitorId,
EXACT_COUNT_DISTINCT(hits.page.pagePath) AS uniquePageviews
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE'
GROUP BY 1,2)
GROUP EACH BY 1;
As of 2022, EXACT_COUNT_DISTINCT() is only available in legacy SQL; in standard SQL, use COUNT(DISTINCT ...) instead.
Also, for me the combination of fullVisitorId + visitNumber + visitStartTime + hits.page.pagePath was always more precise than the solutions above:
SELECT
SUM(Unique_PageViews)
FROM
(SELECT
COUNT(DISTINCT(CONCAT(fullvisitorid,"-",CAST(visitNumber AS string),"-",CAST(visitStartTime AS string),"-",hits.page.pagePath))) as Unique_PageViews
FROM
`mhsd-bigquery-project.8330566.ga_sessions_*`,
unnest(hits) as hits
WHERE
_table_suffix BETWEEN '20220307'
AND '20220313'
AND hits.type = 'PAGE')
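The counting rule behind the query above can be illustrated on a few invented hits in plain Python (not BigQuery): a page counts once per session, however many times it is viewed in that session. The tuple layout and the function name `pageview_counts` are assumptions for this sketch; an earlier answer additionally groups by page title, which is not modeled here.

```python
# Toy pageview hits: (fullVisitorId, visitNumber, visitStartTime, pagePath).
# All values are made up.
PAGE_HITS = [
    ("u1", 1, 1646640000, "/home"),
    ("u1", 1, 1646640000, "/home"),     # reload in the same session
    ("u1", 1, 1646640000, "/product"),
    ("u1", 2, 1646726400, "/home"),     # same user, later session
    ("u2", 1, 1646640000, "/home"),
]


def pageview_counts(hits):
    """Return (pageviews, unique_pageviews) for a list of page hits."""
    pageviews = len(hits)
    # Distinct combinations of session key and page path.
    unique_pageviews = len({
        (visitor, visit_number, start_time, path)
        for visitor, visit_number, start_time, path in hits
    })
    return pageviews, unique_pageviews


pageviews, unique_pageviews = pageview_counts(PAGE_HITS)
print(pageviews, unique_pageviews)
```

Here the reload of /home in u1's first session adds a pageview but not a unique pageview, giving 5 pageviews and 4 unique pageviews.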

Resources