GA in bigquery does not show the correct result - google-analytics

I'm trying to get GA data by product in BigQuery with the query below, but somehow it returns no result.
Does anyone know why?
SELECT
product.productSKU,
COUNT(DISTINCT fullVisitorId) AS unique_user,
COUNT(fullVisitorId) AS page_view,
COUNT(DISTINCT visitId) AS unique_session
FROM
`.ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.product) AS product
WHERE
hits.type = 'PAGE'
GROUP BY
product.productSKU

Related

How to replicate google analytics segment in google BigQuery?

Has anyone replicated a Google Analytics segment, especially an advanced segment with sequence conditions, in Google BigQuery?
For instance: users whose clientId is either 1234555.6666, 1234665.6644 or 12345855.6256, who viewed page ABC and then, later on, page YYYVV.
I did not find this among the existing questions on Stack Overflow.
Example of the advanced segment to create in BigQuery: (screenshots of the segment definition omitted)
Here's an example of my query
#standard sql
select
MAX (hits.hitNumber) as max_hitnumber,
fullvisitorid AS fullvisitorid,
visitid AS visitid,
clientId as clientId,
date as date,
hits.hour AS hit_hour,
hits.minute AS hit_minute,
hits.isExit AS hit_isExit,
hits.type AS hit_type,
(SELECT MAX(IF(index=4, value, NULL)) FROM UNNEST(hits.customDimensions)) AS type_de_la_page,
(SELECT MAX(IF(index=5, value, NULL)) FROM UNNEST(hits.customDimensions)) AS titre_de_la_page,
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS univers_affichage
FROM `dataset.ga_sessions_*`, UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN 'XX' AND 'YY'
AND
clientId IN ('1216107517.1566907413','2133916944.1568016396','191276059.1563523657','2099333168.1568989130','40560158.1568714387','1148175150.1568740892','2006464416.1569333485')
AND
((SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = "prod")
AND
hits.type="PAGE"
AND
((SELECT MAX(IF(index=4, value, NULL)) FROM UNNEST(hits.customDimensions)) = "home")
GROUP BY
fullvisitorid,
visitid,
clientId,
date,
hit_hour,
hit_minute,
hit_type,
type_de_la_page,
titre_de_la_page,
univers_affichage,
hit_isExit
However, I don't know how to add regex conditions on my Google Analytics custom dimensions to this query, nor how to express in BigQuery the "is followed by" step of my Google Analytics segment.
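To make the "is followed by" condition concrete, here is a minimal sketch in plain Python rather than BigQuery. The hit records, the page-type values and the function name clients_with_sequence are all made up for illustration; in the GA export they would correspond to clientId, hits.hitNumber and a custom dimension. In BigQuery the pattern tests would presumably be written with REGEXP_CONTAINS, and the ordering check with a self-join or a window function over the hit number.

```python
import re

# Hypothetical, simplified hit records: (client_id, hit_number, page_type).
hits = [
    ("111.1", 1, "home"),
    ("111.1", 2, "product-list"),
    ("222.2", 1, "product-detail"),
    ("222.2", 2, "home"),
    ("333.3", 1, "home"),
]


def clients_with_sequence(hits, first_pattern, then_pattern):
    """Return the client ids that have a hit matching first_pattern which is
    later followed (not necessarily immediately) by a hit matching then_pattern."""
    first_re = re.compile(first_pattern)
    then_re = re.compile(then_pattern)
    seen_first = set()  # clients that already had a hit matching first_pattern
    matched = set()
    # Sorting the tuples orders hits by client id, then by hit number.
    for client_id, _hit_number, page_type in sorted(hits):
        # Check the "then" step before recording the "first" step, so a single
        # hit can never satisfy both steps of the sequence.
        if client_id in seen_first and then_re.search(page_type):
            matched.add(client_id)
        if first_re.search(page_type):
            seen_first.add(client_id)
    return matched


print(clients_with_sequence(hits, r"^home$", r"^product-"))
```

Only client 111.1 qualifies here: 222.2 saw the two page types in the opposite order, and 333.3 never reached a product page.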

How to query Direct returning visitor in BigQuery

I am trying to figure out, using BigQuery, how many users returned as Direct users after visiting the website as Organic.
This is what I did so far. In order to get the number of users who came back as Direct after visiting as Organic, I used
organic_user.visitNumber < direct_user.visitNumber
in WHERE clause.
SELECT
organic_user.date,
COUNT (DISTINCT direct_user.fullVisitorId) AS return_direct_user
FROM
(
SELECT
date,
fullVisitorId,
visitNumber
FROM
`ga_sessions_*`,
UNNEST(hits) as hits
WHERE
DATE BETWEEN '20190814'
AND '20190911'
AND channelGrouping = 'Organic Search'
) AS organic_user
INNER JOIN (
SELECT
date,
fullVisitorId,
visitNumber
FROM
`ga_sessions_*`,
UNNEST(hits) as hits
WHERE
DATE BETWEEN '20190814'
AND '20190911'
AND channelGrouping = 'Direct'
) AS direct_user ON organic_user.fullVisitorId = direct_user.fullVisitorId
WHERE
organic_user.visitNumber < direct_user.visitNumber
GROUP BY
date
ORDER BY
date ASC
Could anyone verify this query is correct?
If not, could you provide a solution for this?
With all the clarifications you provided in the comments, I was able to come up with some adaptations of your original query:
SELECT
direct_user.date,
COUNT (DISTINCT direct_user.fullVisitorId) AS return_direct_user
FROM (
SELECT
date,
fullVisitorId,
visitNumber
FROM
`bigquery-public-data`.google_analytics_sample.`ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
DATE BETWEEN '20161214'
AND '20180911'
AND channelGrouping = 'Organic Search' ) AS organic_user
INNER JOIN (
SELECT
date,
fullVisitorId,
visitNumber
FROM
`bigquery-public-data`.google_analytics_sample.`ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
DATE BETWEEN '20161214'
AND '20180911'
AND channelGrouping = 'Direct' ) AS direct_user
ON
organic_user.fullVisitorId = direct_user.fullVisitorId
AND organic_user.visitNumber < direct_user.visitNumber
GROUP BY
direct_user.date
ORDER BY
direct_user.date ASC
Here are some considerations about the changes I made:
1) It is important to specify which subquery's date is used in the GROUP BY, because both subqueries expose a date column and an unqualified date is ambiguous after the join. Since we are counting 'Direct' visits per day, it makes sense to group by the date on which those visits happened (direct_user.date).
2) I moved the organic_user.visitNumber < direct_user.visitNumber condition into the JOIN clause. For INNER JOINs this makes no technical difference, but semantically I think it belongs there.
I hope this information is helpful to you.
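As a sanity check on the join logic, the same rule can be modelled in a few lines of plain Python. This is only a sketch: the session rows and the function name returning_direct_users are invented for illustration, and the code is not run against BigQuery.

```python
from collections import defaultdict

# Hypothetical session rows: (date, fullVisitorId, visitNumber, channelGrouping).
sessions = [
    ("20190814", "A", 1, "Organic Search"),
    ("20190815", "A", 2, "Direct"),
    ("20190816", "A", 3, "Direct"),
    ("20190815", "B", 1, "Direct"),          # Direct before Organic: not counted
    ("20190816", "B", 2, "Organic Search"),
    ("20190814", "C", 1, "Organic Search"),
    ("20190816", "C", 3, "Direct"),
]


def returning_direct_users(sessions):
    """Count, per date of the Direct visit, the distinct visitors who had an
    Organic Search visit with a lower visitNumber."""
    # Lowest Organic visitNumber per visitor; an earlier Organic visit exists
    # for a Direct visit exactly when this minimum is below its visitNumber.
    first_organic = {}
    for _date, visitor, visit_number, channel in sessions:
        if channel == "Organic Search":
            previous = first_organic.get(visitor, visit_number)
            first_organic[visitor] = min(previous, visit_number)

    users_by_date = defaultdict(set)
    for date, visitor, visit_number, channel in sessions:
        if channel != "Direct" or visitor not in first_organic:
            continue
        if first_organic[visitor] < visit_number:
            users_by_date[date].add(visitor)
    return {date: len(users) for date, users in sorted(users_by_date.items())}


print(returning_direct_users(sessions))
```

Visitor B is excluded because the Direct visit came first, which is the case the visitNumber comparison is there to filter out.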

Query for pvs of a page daily - Bigquery - google analytics

I have connected Analytics with BigQuery. What query should I write to get daily pageviews for each page since the beginning of the year? The output should look like this:
page      pv  date
/mysite1  5   01-01-2017
/mysite   2   01-01-2017
and so on. Could you help with this?
These queries match the results in my GA account. I welcome any feedback on a better approach!
Standard SQL:
SELECT
date,
hits.page.pagePath AS pagePath,
COUNT(*) AS pageviews
FROM
`ga_sessions_*`, --update
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN '20170101' --start date
AND '20170101' --end date
AND hits.type = 'PAGE'
GROUP BY
date,
pagePath
ORDER BY
date ASC,
pageviews DESC
Legacy SQL:
SELECT
date,
hits.page.pagePath AS pagePath,
COUNT(*) AS pageviews
FROM
TABLE_DATE_RANGE([ga_sessions_], -- update
TIMESTAMP('2017-01-01'), -- start date
TIMESTAMP('2017-01-01') -- end date
)
WHERE
hits.type = 'PAGE'
GROUP BY
date,
pagePath
ORDER BY
date ASC,
pageviews DESC

Joining to landing pages query doubles the sessions per source

I'm trying to query the sum of visits per source from a BigQuery table of Google Analytics data, but I need to filter some sessions out at landing-page level. Hence I'm pre-querying visitIDs by landing page and re-joining to the session data like so:
#StandardSQL
WITH landingpages AS (
SELECT
visitID,
h.page.pagePath AS LandingPage
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) AS h
WHERE
hitNumber = 1
AND
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
# filters to be added here
)
SELECT
sessions.trafficSource.source,
SUM(sessions.totals.visits) AS visits
FROM `project.dataset.ga_sessions_*` AS sessions
JOIN
landingpages
ON
landingpages.visitID = sessions.visitID
WHERE
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
GROUP BY
trafficSource.source
ORDER BY
visits DESC
This roughly doubles the number of sessions for each source compared with what GA reports.
Can anyone point out what I've done wrong? (I suspect it is blindingly obvious.)
I've tried examining the output of the first query and can't find anything wrong with it, aside from a very small proportion of duplicated visitIDs. I've also tried various types of JOIN, all to no avail.
When querying GA data in BigQuery, it's imperative to keep in mind that a unique visit is identified by the combination of fullVisitorId and visitID. Only a join on both columns returns a meaningful data set.
Here's what I should have written:
#StandardSQL
WITH landingpages AS (
SELECT
fullVisitorId,
visitID,
h.page.pagePath AS LandingPage
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) AS h
WHERE
hitNumber = 1
AND
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
),
session_data AS (
SELECT
date AS ga_date, trafficSource.source AS source, fullVisitorId, visitID, SUM(totals.visits) AS visits
FROM
`project.dataset.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
AND
totals.visits > 0
GROUP BY ga_date, source, fullVisitorId, visitID
)
SELECT
ga_date, source, SUM(visits) AS Sessions
FROM
landingpages
JOIN
session_data
ON
landingpages.VisitID = session_data.VisitID
AND
landingpages.fullVisitorId = session_data.fullVisitorId
GROUP BY
ga_date, source
ORDER BY
Sessions DESC
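The inflation from joining on visitID alone can be reproduced with a tiny in-memory model. This is a sketch in plain Python with made-up rows, assuming two different visitors happen to share the same visitId; the function name visits_by_source is hypothetical.

```python
# Column positions in the hypothetical rows below.
VISITOR, VISIT, SOURCE, VISITS = 0, 1, 2, 3

# Hypothetical session rows: (fullVisitorId, visitId, source, visits).
sessions = [
    ("u1", 1000, "google", 1),
    ("u2", 1000, "bing", 1),    # different visitor, same visitId as u1
    ("u3", 2000, "google", 1),
]

# Hypothetical landing-page rows: (fullVisitorId, visitId, landingPage).
landing_pages = [
    ("u1", 1000, "/home"),
    ("u2", 1000, "/pricing"),
    ("u3", 2000, "/home"),
]


def visits_by_source(sessions, landing_pages, keys):
    """Inner-join the two row sets on the given key columns and sum visits
    per source, the way the SQL JOIN plus GROUP BY would."""
    totals = {}
    for session in sessions:
        for landing in landing_pages:
            if all(session[k] == landing[k] for k in keys):
                source = session[SOURCE]
                totals[source] = totals.get(source, 0) + session[VISITS]
    return totals


# Joining on visitId only: sessions sharing a visitId are matched twice.
print(visits_by_source(sessions, landing_pages, keys=(VISIT,)))
# Joining on fullVisitorId and visitId: each session matches exactly once.
print(visits_by_source(sessions, landing_pages, keys=(VISITOR, VISIT)))
```

With the single-column key, u1 and u2 each match both landing-page rows for visitId 1000, so their visits are counted twice.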

How to get the Google Analytics definition of unique page views in Bigquery

https://support.google.com/analytics/answer/1257084?hl=en-GB#pageviews_vs_unique_views
I'm trying to calculate the sum of unique page views per day, as shown in the Google Analytics interface.
How do I get the equivalent using BigQuery?
There are two ways to calculate this:
1) One is, as the linked documentation says, to combine the full visitor ID (fullVisitorId) with the session ID (visitId) and count the distinct combinations.
SELECT
EXACT_COUNT_DISTINCT(combinedVisitorId)
FROM (
SELECT
CONCAT(fullVisitorId,string(VisitId)) AS combinedVisitorId
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE' )
2) The other is just counting distinct fullVisitorIds
SELECT
EXACT_COUNT_DISTINCT(fullVisitorId)
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE'
If someone wants to try this out on a sample public dataset, there is a tutorial on how to add the sample dataset.
The other queries didn't match the Unique Pageviews metric in my Google Analytics account, but the following did:
SELECT COUNT(1) as unique_pageviews
FROM (
SELECT
hits.page.pagePath,
hits.page.pageTitle,
fullVisitorId,
visitNumber,
COUNT(1) as hits
FROM [my_table]
WHERE hits.type='PAGE'
GROUP BY
hits.page.pagePath,
hits.page.pageTitle,
fullVisitorId,
visitNumber
)
For uniquePageviews, you are better off using something like this:
SELECT
date,
SUM(uniquePageviews) AS uniquePageviews
FROM (
SELECT
date,
CONCAT(fullVisitorId,string(VisitId)) AS combinedVisitorId,
EXACT_COUNT_DISTINCT(hits.page.pagePath) AS uniquePageviews
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE'
GROUP BY 1,2)
GROUP EACH BY 1;
As of 2022, EXACT_COUNT_DISTINCT() seems to be deprecated: it is a legacy SQL function, and standard SQL uses COUNT(DISTINCT ...) instead.
Also, for me, the combination of fullVisitorId + visitNumber + visitStartTime + hits.page.pagePath was always more precise than the solutions above:
SELECT
SUM(Unique_PageViews)
FROM
(SELECT
COUNT(DISTINCT(CONCAT(fullvisitorid,"-",CAST(visitNumber AS string),"-",CAST(visitStartTime AS string),"-",hits.page.pagePath))) as Unique_PageViews
FROM
`mhsd-bigquery-project.8330566.ga_sessions_*`,
unnest(hits) as hits
WHERE
_table_suffix BETWEEN '20220307'
AND '20220313'
AND hits.type = 'PAGE')
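The idea shared by the last few queries, counting each page at most once per session, can be illustrated with a small plain-Python model. This is a sketch with invented rows, assuming a session is identified by fullVisitorId plus visitId and a page by its path only; it ignores pageTitle, which one of the answers above also groups by.

```python
# Hypothetical PAGE hits: (fullVisitorId, visitId, pagePath).
page_hits = [
    ("u1", 1, "/home"),
    ("u1", 1, "/home"),     # same page again in the same session
    ("u1", 1, "/pricing"),
    ("u1", 2, "/home"),     # new session, so /home counts again
    ("u2", 1, "/home"),
]


def pageviews(page_hits):
    """Every PAGE hit counts once."""
    return len(page_hits)


def unique_pageviews(page_hits):
    """Each (visitor, session, page) combination counts once."""
    return len(set(page_hits))


print(pageviews(page_hits), unique_pageviews(page_hits))
```

The repeated /home hit in u1's first session is the only one dropped, so the two metrics differ by exactly one here.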
