BigQuery union/join error - google-analytics

I am getting an error when trying to pull from my Google Analytics BigQuery export tables. I want to look at a month's worth of data with some filters (including one that narrows it down to a list of specific fullvisitorids of interest). However, when I run the following query, I get this error:
Error: (L2:1): JOIN (including semi-join) and UNION ALL (comma) may not be combined in a single SELECT statement. Either move the UNION ALL to an inner query or the JOIN to an outer query.
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type --and other columns
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where fullvisitorid in (select * from [mydata.visitorid_lookup]) --table includes a list of fullvisitorids I am interested in
and device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%'
So I changed my query to this:
select * from (
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%')
where fullvisitorid in (select * from [mydata.visitorid_lookup_test])
This then gives me an error saying the response is too large to return. The result would be cut down significantly if the WHERE clause on fullvisitorid were executed within the subquery, but of course that doesn't seem possible. So I feel like I'm between a rock and a hard place on this. Is there another way that I am missing? Thanks!

The "response too large" error applies to the final result of the query, which means that the result is still too large even after the semi-join in the WHERE clause is applied. The query should work, though, if you enable the "Allow Large Results" setting, which in legacy SQL also requires specifying a destination table.

Related

Getting Error while using WITHIN Clause in BigQuery

I am just trying to get some custom dimensions with a code snippet from the BigQuery Cookbook:
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100
When I try to execute it, I get the following error:
Syntax error: Expected end of input but got keyword WITHIN at [6:8]
I have no idea how to solve this.
This query is meant to be run in BigQuery Legacy SQL.
Add #legacySQL as the first line, as shown below, and try again. See also "Switching SQL dialects" for more details.
#legacySQL
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

Error in querying Google Analytics data in BigQuery: 'Correlated aliases referenced in the from clause must refer to arrays that are vali..'

In BigQuery, I have created the following query against a partitioned table whose original source is Google Analytics data. The goal is to get the number of sessions, product revenue, and shipping costs. Note that in the current setup I can't use the 'aggregated' fields like totals.visits.
SELECT c_country AS country, date As Date, COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(Visitid AS STRING), CAST(visitStartTime AS STRING))) AS Sessions,
(SELECT SUM(product.productRevenue)/1000000 FROM t.hits as hits, hits.product AS product) AS Product_Revenue, (SELECT SUM(hits.transaction.transactionShipping)/1000000 FROM t.hits AS hits) AS Shipping_Costs
FROM `xx.yy.zz` as t
WHERE c_date BETWEEN "2019-11-06" AND "2019-11-06"
GROUP BY c_country, date
Now, the following error message appears:
"Correlated aliases referenced in the from clause must refer to arrays
that are valid to access from the outer query, but t refers to an
array that is not valid to access after GROUP BY or DISTINCT in the
outer query at [2:50]"
Does anyone know how to adjust the query so that the query executes without issues?
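One way this error is commonly resolved is to wrap each correlated subquery in an aggregate function, so that the array t.hits is read per row before the GROUP BY collapses the rows. The following is an untested sketch under that assumption. It keeps the table and column names from the question and only changes how the two subqueries are written (explicit UNNEST, with an outer SUM around each):

```sql
#standardSQL
SELECT
  c_country AS country,
  date AS Date,
  COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING), CAST(visitStartTime AS STRING))) AS Sessions,
  -- The inner SELECT sums revenue within one session row;
  -- the outer SUM then aggregates those per-row totals for each group.
  SUM((SELECT SUM(product.productRevenue)
       FROM UNNEST(t.hits) AS hits, UNNEST(hits.product) AS product)) / 1000000 AS Product_Revenue,
  SUM((SELECT SUM(hits.transaction.transactionShipping)
       FROM UNNEST(t.hits) AS hits)) / 1000000 AS Shipping_Costs
FROM `xx.yy.zz` AS t
WHERE c_date BETWEEN "2019-11-06" AND "2019-11-06"
GROUP BY c_country, date
```

The idea is that a bare subquery over t.hits in the SELECT list has no single row to refer to once the rows are grouped, whereas an aggregate argument is evaluated for every input row.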

Querying "in" eventAction array in big query

I apologize if this has been asked before, but I can't seem to find a specific doc describing how to do this. We are importing our GA data into BigQuery. I simply need to see which visitors on our site have been viewing two or more pages and completing at least one of a few actions. I am fairly new to BQ, and the docs I have read talk of using UNNEST. Unfortunately, this is where I run into an issue when I run this query:
SELECT visitId, totals.pageviews FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009` WHERE totals.pageviews > 2 AND
'modal-click' IN UNNEST(hits.eventInfo.eventAction)
order by totals.pageviews DESC LIMIT 100000
I get the error below. Shouldn't this work? I have been reading this doc, but I feel like my use case is simpler than most of those shown:
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#scanning-arrays
Cannot access field eventInfo on a value with type ARRAY<STRUCT<...>> at [2:30]
Below is for BigQuery Standard SQL
#standardSQL
SELECT visitId, totals.pageviews
FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009`
WHERE totals.pageviews > 2
AND (SELECT COUNTIF(eventInfo.eventAction = 'modal-click') FROM UNNEST(hits)) > 0
ORDER BY totals.pageviews DESC
LIMIT 100000
OR
#standardSQL
SELECT visitId, totals.pageviews
FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009`
WHERE totals.pageviews > 2
AND EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'modal-click')
ORDER BY totals.pageviews DESC
LIMIT 100000

BigQuery GA360 export - product query

I'm working through the documentation with BigQuery, and I'm stuck on a particular query.
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [XXXXXXX.ga_sessions_20171101]
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM [XXXXXXXX.ga_sessions_20171101]
WHERE hits.item.productName CONTAINS 'Product A'
AND totals.transactions>=1
GROUP BY fullVisitorId )
AND hits.item.productName IS NOT NULL
AND hits.item.productName !='Product A'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
This one works fine for an individual day in the dataset; however, if I want to use TABLE_DATE_RANGE, it errors.
To use the above query across multiple dates, how do I adjust this?
Thanks
With legacy SQL, the only way is to use an INNER JOIN on fullVisitorId.
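A rough, untested sketch of what that could look like is below. Each TABLE_DATE_RANGE (which acts as a union of the daily tables) sits in its own inner query, and the join happens in the outer query. The date range of November 2017 and the aliases a, b, and productName are assumptions for illustration; the dataset placeholder is kept from the question.

```sql
#legacySQL
SELECT
  a.productName AS other_purchased_products,
  COUNT(a.productName) AS quantity
FROM (
  -- All other purchased products, per visitor, across the date range
  SELECT fullVisitorId, hits.item.productName AS productName
  FROM TABLE_DATE_RANGE([XXXXXXX.ga_sessions_],
                        TIMESTAMP('2017-11-01'), TIMESTAMP('2017-11-30'))
  WHERE hits.item.productName IS NOT NULL
    AND hits.item.productName != 'Product A'
) AS a
INNER JOIN (
  -- Visitors who purchased Product A somewhere in the same date range
  SELECT fullVisitorId
  FROM TABLE_DATE_RANGE([XXXXXXX.ga_sessions_],
                        TIMESTAMP('2017-11-01'), TIMESTAMP('2017-11-30'))
  WHERE hits.item.productName CONTAINS 'Product A'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId
) AS b
ON a.fullVisitorId = b.fullVisitorId
GROUP BY other_purchased_products
ORDER BY quantity DESC
```

Because the second subquery groups by fullVisitorId, each visitor appears at most once on that side of the join, so the product counts from the first subquery should not be multiplied.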

Results of joined queries in BQ don't match data in Google Analytics

Background
In BigQuery, I'm trying to find the number of visitors that both visit one of two pages and purchase a specific product.
When I run each of the sub-queries, the numbers match exactly what I see in Google Analytics.
However, when I join them, the number is different than what I see in GA. I've had someone bring the results of the two sub-queries into Excel and do the equivalent, and their results equal what I'm seeing in BQ.
Details
Here's the query:
SELECT
ProductSessions.date AS date,
SUM(ProductTransactions.totalTransactions) transactions,
COUNT(ProductSessions.visitId) visited_product_sessions
FROM (
SELECT
visitId, date
FROM
`103554833.ga_sessions_20170219`
WHERE
EXISTS(
SELECT 1 FROM UNNEST(hits) h
WHERE REGEXP_CONTAINS(h.page.pagePath, r"^www.domain.com/(product|product2).html.*"))
GROUP BY visitID, date)
AS ProductSessions
LEFT JOIN (
SELECT
totals.transactions as totalTransactions,
visitId,
date
FROM
`103554833.ga_sessions_20170219`
WHERE
totals.transactions IS NOT NULL
AND EXISTS(
SELECT 1
FROM
UNNEST(hits) h,
UNNEST(h.product) prod
WHERE REGEXP_CONTAINS(prod.v2ProductName, r"^Product®$"))
GROUP BY
visitId, totals.transactions,
date) AS ProductTransactions
ON
ProductTransactions.visitId = ProductSessions.visitId
WHERE ProductTransactions.visitId is not null
GROUP BY
date
ORDER BY
date ASC
I'm expecting ProductTransactions.totalTransactions to replicate the number of transactions in Google Analytics when filtered with an advanced segment of both:
Sessions include Page matching RegEx: www.domain.com/(product|product2).html.*
Sessions include Product matches exactly: Product®
However, results in BQ are about 20% higher than in GA.
Why the difference?
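One possible contributor, offered as a guess rather than a confirmed diagnosis: in the GA export schema, visitId is only unique per user, so a join on visitId alone can match sessions from different visitors and inflate the totals. Since both conditions apply to the same session row, the join may not be needed at all. The untested sketch below applies both EXISTS filters from the question in a single pass over the table:

```sql
#standardSQL
SELECT
  date,
  SUM(totals.transactions) AS transactions,
  COUNT(*) AS visited_product_sessions  -- one row per session in the export
FROM `103554833.ga_sessions_20170219`
WHERE totals.transactions IS NOT NULL
  -- Session includes one of the two product pages
  AND EXISTS(
    SELECT 1 FROM UNNEST(hits) h
    WHERE REGEXP_CONTAINS(h.page.pagePath, r"^www.domain.com/(product|product2).html.*"))
  -- Session includes the specific product
  AND EXISTS(
    SELECT 1 FROM UNNEST(hits) h, UNNEST(h.product) prod
    WHERE REGEXP_CONTAINS(prod.v2ProductName, r"^Product®$"))
GROUP BY date
ORDER BY date ASC
```

If the join is kept instead, matching on both fullVisitorId and visitId would be the corresponding change to try.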
