BigQuery GA360 export - product query

I'm working through the documentation with BigQuery, and I'm stuck on a particular query.
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [XXXXXXX.ga_sessions_20171101]
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM [XXXXXXXX.ga_sessions_20171101]
WHERE hits.item.productName CONTAINS 'Product A'
AND totals.transactions>=1
GROUP BY fullVisitorId )
AND hits.item.productName IS NOT NULL
AND hits.item.productName !='Product A'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
This one works fine for an individual day in the dataset; however, if I want to use TABLE_DATE_RANGE, it errors.
To use the above query across multiple dates, how do I adjust this?
Thanks

With legacy SQL, the only way is to use an INNER JOIN on fullVisitorId.
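The visitor-level logic can be sketched in plain Python. First collect the buyers of Product A across the whole date range. Then keep only those visitors' rows, which is what the join on fullVisitorId does, and count the other products. This is a minimal sketch, assuming each daily table is modeled as a list of (fullVisitorId, productName, totals.transactions) tuples. The rows and the helper name `other_purchased_products` are hypothetical.

```python
from collections import Counter

# Hypothetical flattened rows from two daily tables:
# (fullVisitorId, productName, totals.transactions of the session)
day_20171101 = [
    ("v1", "Product A", 1),
    ("v1", "Product B", 1),
    ("v2", "Product C", 0),
]
day_20171102 = [
    ("v2", "Product A", 1),
    ("v2", "Product B", 1),
    ("v3", "Product D", 1),
]


def other_purchased_products(tables, target="Product A"):
    """Union the daily tables (like TABLE_DATE_RANGE), find visitors whose
    transacting session contains `target`, then count those visitors' other
    products (like the join on fullVisitorId)."""
    rows = [row for table in tables for row in table]
    # Inner query: CONTAINS 'Product A' AND totals.transactions >= 1
    buyers = {
        visitor for visitor, product, tx in rows
        if product is not None and target in product and tx >= 1
    }
    # Outer query: only the buyers' rows, excluding the target product itself
    counts = Counter(
        product for visitor, product, _ in rows
        if visitor in buyers and product is not None and product != target
    )
    return counts.most_common()  # ORDER BY quantity DESC
```

Product C is only counted when both days are combined. Visitor v2 bought Product A on the second day, so the visitor filter has to see the whole range before the join is applied.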

Related

Combine user level and page in Big Query

I've started exploring BigQuery, and I'm wondering: is it possible, in BigQuery or GA, to combine the number of unique users with the pages they have seen?
That is, I want to see that there are Y unique visitors who viewed one or more pages and that, of these, Z% also viewed W pages.
I used the query below to get the Y unique visitors who viewed certain pages, but I can't see the % who viewed W pages.
#standardSQL
SELECT
hits.page.pagePath AS other_seen_pages,
COUNT(hits.page.pagePath) AS number_other_seen_pages
FROM `project.dataset.session`,UNNEST(hits) AS hits
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM `project.dataset.session`,UNNEST(hits) AS hits
WHERE hits.page.pagePath LIKE '%x_page%'
GROUP BY fullVisitorId )
AND hits.page.pagePath IS NOT NULL
AND hits.page.pagePath NOT LIKE '%x_page%'
GROUP BY other_seen_pages
ORDER BY number_other_seen_pages DESC;
I understand that you would like a query that, on top of the other pages that the same visitors visited, shows the number of visitors (from the same subset of visitors) who visited each of them, and their percentage of the total number of users.
Here is some code that worked for me with the bigquery-public-data.google_analytics_sample.ga_sessions_20170801 Google Analytics public table and the '/google+redesign/electronics' pagePath:
It:
creates a table with the total number of distinct users in the table
creates a table like yours, with an additional field for the number of distinct users who visited both your page and the page in that row
selects the desired fields from these two tables and computes the %
WITH
t_total_users as (select count(DISTINCT fullVisitorId) as total_users from `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`),
t_other_pages as (SELECT
hits.page.pagePath AS other_seen_pages,
COUNT(hits.page.pagePath) AS number_other_seen_pages,
COUNT(DISTINCT fullvisitorID) as visitors_per_page
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`, UNNEST(hits) AS hits
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,UNNEST(hits) AS hits
WHERE hits.page.pagePath LIKE '/google+redesign/electronics'
GROUP BY fullVisitorId )
AND hits.page.pagePath IS NOT NULL
AND hits.page.pagePath NOT LIKE '/google+redesign/electronics'
GROUP BY other_seen_pages)
SELECT
t_other_pages.other_seen_pages,
t_other_pages.number_other_seen_pages,
t_other_pages.visitors_per_page,
t_total_users.total_users,
(t_other_pages.visitors_per_page/t_total_users.total_users)*100 as percentage_visitants
FROM t_total_users, t_other_pages
ORDER BY t_other_pages.number_other_seen_pages DESC
If I have misunderstood something about the goal of the query, please let me know!
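The three steps of that query can be modeled in a few lines of Python to check the arithmetic. This is a minimal sketch: the hits are hypothetical (fullVisitorId, pagePath) pairs standing in for UNNEST(hits), and the helper name `other_page_stats` is made up for illustration. Because the LIKE pattern in the query has no wildcards, it behaves like an equality test, so the sketch uses `==`.

```python
# Hypothetical (fullVisitorId, pagePath) pairs standing in for UNNEST(hits).
hits = [
    ("v1", "/google+redesign/electronics"),
    ("v1", "/home"),
    ("v1", "/home"),
    ("v2", "/google+redesign/electronics"),
    ("v2", "/cart"),
    ("v3", "/home"),
    ("v4", "/cart"),
]


def other_page_stats(hits, target):
    """Return rows like the final SELECT: (page, views, visitors, total_users, %)."""
    total_users = len({v for v, _ in hits})                  # t_total_users
    seen_target = {v for v, page in hits if page == target}  # inner subquery
    views = {}     # number_other_seen_pages
    visitors = {}  # visitors_per_page (distinct fullVisitorId)
    for v, page in hits:
        if v in seen_target and page is not None and page != target:
            views[page] = views.get(page, 0) + 1
            visitors.setdefault(page, set()).add(v)
    rows = [
        (page, views[page], len(visitors[page]), total_users,
         len(visitors[page]) / total_users * 100)
        for page in views
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

The percentage here is relative to all users in the table, as in the query. If you want Z% of the Y visitors who saw the target page instead, divide by `len(seen_target)`, or, in SQL, by a count of distinct visitors from the inner subquery.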

Does BigQuery include intraday tables when I query over all dates up to current date?

Google Analytics 360 data in BigQuery has two intraday tables for the past two days, and permanent partitioned tables for the dates before that. When I run a query on the ga_sessions_ tables for the past 30 days, does this automatically include the two days' data in the ga_sessions_intraday_ tables or do I have to include them specifically?
Edit: here is a query that illustrates this:
SELECT date, visitId, totals.transactions
FROM `dataset.ga_sessions_2018*`
WHERE
_TABLE_SUFFIX BETWEEN "0401"
AND FORMAT_DATE('%m%d', CURRENT_DATE())
ORDER BY date DESC
The result is that the most recent date is two days ago (i.e., the intraday tables are not included). That answers my question, I guess; thanks anyway.
You can query across whatever tables you want; just write a filter that matches the right suffixes. For example,
SELECT date, visitId, totals.transactions, _TABLE_SUFFIX AS suffix
FROM `dataset.ga_sessions_*` WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN "20180401" AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
ORDER BY date DESC
I put the suffix in the select list so you can tell which table is matched.
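The difference between the two wildcards can be simulated in Python. `ga_sessions_2018*` only matches tables whose names continue with `2018`, so `ga_sessions_intraday_...` never matches. `ga_sessions_*` matches both, and extracting the first run of digits from the suffix lets one BETWEEN cover both kinds of table. This is a minimal sketch: the suffix list and the fixed "today" are hypothetical, and comparing YYYYMMDD strings lexicographically works because they are fixed-width.

```python
import re
from datetime import date

# Hypothetical values of _TABLE_SUFFIX for `dataset.ga_sessions_*`.
suffixes = ["20180331", "20180401", "20180408",
            "intraday_20180409", "intraday_20180410"]


def prefix_match(suffixes, prefix):
    """Tables reachable through `ga_sessions_<prefix>*`."""
    return [s for s in suffixes if s.startswith(prefix)]


def regex_between(suffixes, start, end):
    """Mimic REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+') BETWEEN start AND end.
    No match means NULL in SQL, and BETWEEN on NULL is not true."""
    matched = []
    for s in suffixes:
        m = re.search(r"[0-9]+", s)
        if m and start <= m.group(0) <= end:
            matched.append(s)
    return matched


# Stands in for FORMAT_DATE('%Y%m%d', CURRENT_DATE()) on a hypothetical day.
today = date(2018, 4, 10).strftime("%Y%m%d")
print(prefix_match(suffixes, "2018"))
print(regex_between(suffixes, "20180401", today))
```

The first call returns only the three daily tables, while the second also picks up both intraday tables.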

Big Query and Google Analytics UI do not match when ecommerce action filter applied

We are validating a query in BigQuery and cannot get the results to match the Google Analytics UI. A similar question can be found here, but in our case the mismatch only occurs when we apply a specific filter on ecommerce_action.action_type.
Here is the query:
SELECT COUNT(distinct fullVisitorId+cast(visitid as string)) AS sessions
FROM (
SELECT
device.browserVersion,
geoNetwork.networkLocation,
geoNetwork.networkDomain,
geoNetwork.city,
geoNetwork.country,
geoNetwork.continent,
geoNetwork.region,
device.browserSize,
visitNumber,
trafficSource.source,
trafficSource.medium,
fullvisitorId,
visitId,
device.screenResolution,
device.flashVersion,
device.operatingSystem,
device.browser,
totals.pageviews,
channelGrouping,
totals.transactionRevenue,
totals.timeOnSite,
totals.newVisits,
totals.visits,
date,
hits.eCommerceAction.action_type
FROM
(select *
from TABLE_DATE_RANGE([zzzzzzzzz.ga_sessions_],
<range>))
WHERE
hits.eCommerceAction.action_type = '2' and <stuff to remove bots>
) t
From the UI using the built in shopping behavior report, we get 3.836M unique sessions with a product detail view, compared with 3.684M unique sessions in Big Query using the query above.
A few questions:
1) We are under the impression that the shopping behavior report's "Sessions with Product View" breakdown is based on the ecommerce_action.action_type filter. Is that true?
2) Is there a pre-aggregated .totals table that the UI may be pulling from?
It sounds like the issue is that COUNT(DISTINCT ...) is approximate when using legacy SQL, as noted in the migration guide, so the counts are not accurate. Either use standard SQL instead (preferred) or use EXACT_COUNT_DISTINCT with legacy SQL.
You're including product list views in your query.
As described in https://support.google.com/analytics/answer/3437719, you need to make sure that no product has isImpression = TRUE, because that would mean it is a product list view.
This query sums all sessions that contain any action_type='2' hit for which all isImpression values are null or false:
SELECT
SUM(totals.visits) AS sessions
FROM
`project.123456789.ga_sessions_20180101` AS t
WHERE
(
SELECT
LOGICAL_OR(h.ecommerceaction.action_type='2')
FROM
t.hits AS h
WHERE
(SELECT LOGICAL_AND(isimpression IS NULL OR isimpression = FALSE) FROM h.product))
For legacy SQL, you can adapt the example in the documentation.
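The per-session condition in that query can be mirrored in Python to make its NULL handling explicit. This is a minimal sketch with hypothetical sessions, where each hit carries its action_type and the isImpression flag of each product (None stands in for NULL). One subtlety worth knowing: in BigQuery, LOGICAL_AND over zero rows returns NULL, so a hit with an empty product array fails the inner WHERE and is ignored.

```python
# Hypothetical sessions: lists of hits with action_type and per-product
# isImpression flags (None stands in for NULL).
sessions = [
    [{"action_type": "2", "is_impression": [None, False]}],  # detail view
    [{"action_type": "2", "is_impression": [True]}],          # list impression
    [{"action_type": "1", "is_impression": [True]},
     {"action_type": "2", "is_impression": [False]}],         # detail view later
    [{"action_type": "2", "is_impression": []}],              # no products
]


def has_product_detail_view(hits):
    """LOGICAL_OR(action_type = '2') over hits whose products are all
    non-impressions; an empty product array yields NULL, so it is skipped."""
    return any(
        hit["action_type"] == "2"
        and len(hit["is_impression"]) > 0
        and all(flag is None or flag is False for flag in hit["is_impression"])
        for hit in hits
    )


print([has_product_detail_view(s) for s in sessions])  # [True, False, True, False]
```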
In addition to the fact that COUNT(DISTINCT ...) is approximate in legacy SQL, there may be sessions that contain only non-interactive hits. The Google Analytics UI does not count these as sessions, but your query counts them with both COUNT(DISTINCT ...) and EXACT_COUNT_DISTINCT(...), because it counts visit IDs.
Using SUM(totals.visits), you should get the same result as in the UI, because SUM ignores NULL values of totals.visits, which correspond to sessions containing only non-interactive hits.
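The gap between the two counting methods is easy to reproduce. This is a minimal sketch with hypothetical sessions given as (fullVisitorId, visitId, totals.visits) tuples, where totals.visits is None (NULL) for a session containing only non-interactive hits. The sketch uses a (fullVisitorId, visitId) tuple as the session key, which plays the same role as the concatenated string in the question's query.

```python
# Hypothetical sessions: (fullVisitorId, visitId, totals.visits).
sessions = [
    ("v1", 1001, 1),
    ("v1", 1002, 1),
    ("v2", 1001, None),  # only non-interactive hits: totals.visits is NULL
    ("v3", 1003, 1),
]


def count_distinct_sessions(rows):
    """Like (EXACT_)COUNT(DISTINCT fullVisitorId + visitId): every session
    key counts, whether or not the session had an interaction."""
    return len({(visitor, visit_id) for visitor, visit_id, _ in rows})


def sum_visits(rows):
    """Like SUM(totals.visits): NULL values are ignored."""
    return sum(visits for _, _, visits in rows if visits is not None)


print(count_distinct_sessions(sessions), sum_visits(sessions))  # 4 3
```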

Results of joined queries in BQ don't match data in Google Analytics

Background
In BigQuery, I'm trying to find the number of visitors that both visit one of two pages and purchase a specific product.
When I run each of the sub-queries, the numbers match exactly what I see in Google Analytics.
However, when I join them, the number is different than what I see in GA. I've had someone bring the results of the two sub-queries into Excel and do the equivalent, and their results equal what I'm seeing in BQ.
Details
Here's the query:
SELECT
ProductSessions.date AS date,
SUM(ProductTransactions.totalTransactions) transactions,
COUNT(ProductSessions.visitId) visited_product_sessions
FROM (
SELECT
visitId, date
FROM
`103554833.ga_sessions_20170219`
WHERE
EXISTS(
SELECT 1 FROM UNNEST(hits) h
WHERE REGEXP_CONTAINS(h.page.pagePath, r"^www.domain.com/(product|product2).html.*"))
GROUP BY visitID, date)
AS ProductSessions
LEFT JOIN (
SELECT
totals.transactions as totalTransactions,
visitId,
date
FROM
`103554833.ga_sessions_20170219`
WHERE
totals.transactions IS NOT NULL
AND EXISTS(
SELECT 1
FROM
UNNEST(hits) h,
UNNEST(h.product) prod
WHERE REGEXP_CONTAINS(prod.v2ProductName, r"^Product®$"))
GROUP BY
visitId, totals.transactions,
date) AS ProductTransactions
ON
ProductTransactions.visitId = ProductSessions.visitId
WHERE ProductTransactions.visitId is not null
GROUP BY
date
ORDER BY
date ASC
I'm expecting ProductTransactions.totalTransactions to replicate the number of transactions in Google Analytics when filtered with an advanced segment of both:
Sessions include Page matching RegEx: www.domain.com/(product|product2).html.*
Sessions include Product matches exactly: Product®
However, the results in BQ are about 20% higher than in GA.
Why the difference?
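The thread has no answer here. One thing worth checking, which is an assumption and not confirmed in the thread, is the join key. The query joins ProductSessions and ProductTransactions on visitId alone. In the GA export schema, visitId is only unique per user, and Google's schema documentation recommends combining it with fullVisitorId to get a unique session ID. If two visitors share a visitId value, a join on visitId alone can attach one visitor's transactions to another visitor's session. The Python sketch below uses hypothetical rows to show how that inflates the sum. In SQL, the corresponding change would be to select fullVisitorId in both subqueries and join on both columns.

```python
# Hypothetical sessions that viewed the product pages: (fullVisitorId, visitId).
# v1 and v2 happen to share a visitId, which is allowed because visitId is only
# unique per user.
product_sessions = [("v1", 1487500000), ("v2", 1487500000), ("v3", 1487500100)]
# Hypothetical transacting sessions: (fullVisitorId, visitId, totals.transactions)
product_transactions = [("v1", 1487500000, 1), ("v3", 1487500100, 2)]


def join_sum(on_full_key):
    """Inner-join the two lists and sum transactions. With on_full_key=False
    only visitId is compared, as in the question's ON clause."""
    total = 0
    for s_visitor, s_visit_id in product_sessions:
        for t_visitor, t_visit_id, transactions in product_transactions:
            matches = s_visit_id == t_visit_id
            if on_full_key:
                matches = matches and s_visitor == t_visitor
            if matches:
                total += transactions
    return total


print(join_sum(on_full_key=False), join_sum(on_full_key=True))  # 4 3
```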

Bigquery union/join error

I am getting an error when trying to pull from my Google Analytics BigQuery export tables. I want to look at a month's worth of data with some filters (including one that narrows it down to a list of specific fullvisitorids of interest). However, when I run the following query, I get this error:
Error: (L2:1): JOIN (including semi-join) and UNION ALL (comma) may not be combined in a single SELECT statement. Either move the UNION ALL to an inner query or the JOIN to an outer query.
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type --and other columns
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where fullvisitorid in (select * from [mydata.visitorid_lookup]) --table includes a list of fullvisitorids I am interested in
and device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%'
So I change my query to this:
select * from (
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%')
where fullvisitorid in (select * from [mydata.visitorid_lookup_test])
This then gives me an error saying the response is too large to return. The result would be significantly smaller if the WHERE condition on fullvisitorid were executed within the subquery, but that doesn't seem possible. So I feel like I'm between a rock and a hard place on this. Is there another way I'm missing? Thanks!
The "result is too large" error applies to the final result of the query, which means the result is too large even after the semi-join in the WHERE clause is applied. It should work, though, if you enable the "Allow Large Results" setting. In legacy SQL, that setting also requires a destination table.
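The IN (...) semi-join is just another row filter. Applying it inside or outside the subquery selects the same final rows, so moving it outward changes where the condition is written, not the size of the result. The Python sketch below illustrates that equivalence. The rows are hypothetical (fullVisitorId, browser, pagePath) tuples, and `lookup` is a hypothetical set standing in for the visitor-ID lookup table.

```python
# Hypothetical flattened hits: (fullVisitorId, device.browser, hits.page.pagePath)
rows = [
    ("a", "Chrome", "/sample/1"),
    ("a", "Internet Explorer", "/sample/2"),
    ("b", "Firefox", "/sample/refer"),
    ("c", "Chrome", "/Sample/3"),
    ("d", "Safari", "/other"),
]
lookup = {"a", "c"}  # stands in for the fullvisitorid lookup table


def page_filter(row):
    """The non-IN conditions: browser != IE, lower(path) NOT LIKE '%refer%',
    lower(path) LIKE '%sample%'."""
    _, browser, path = row
    path = path.lower()
    return browser != "Internet Explorer" and "refer" not in path and "sample" in path


def in_lookup(row):
    """The semi-join: fullvisitorid IN (SELECT ... FROM lookup table)."""
    return row[0] in lookup


inner_first = [r for r in rows if page_filter(r) and in_lookup(r)]
subquery = [r for r in rows if page_filter(r)]
outer_first = [r for r in subquery if in_lookup(r)]
print(inner_first == outer_first)  # True
```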
