Getting Error while using WITHIN Clause in BigQuery - google-analytics

I am trying to get some custom dimensions with a code snippet from the BigQuery Cookbook:
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100
When I try to execute it, I get the following error:
Syntax error: Expected end of input but got keyword WITHIN at [6:8]
I have no idea how to solve this.

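This query is meant to be run in BigQuery Legacy SQL. WITHIN is a Legacy SQL construct, and the error message shows that the query was parsed as Standard SQL.
Add #legacySQL as the first line, as shown below, and try again. See also Switching SQL dialects for more details.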
#legacySQL
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

Related

Error in querying Google Analytics data in BigQuery: 'Correlated aliases referenced in the from clause must refer to arrays that are vali..'

In BigQuery, I have written the following query against a partitioned table whose original source is Google Analytics data. The goal is to get the number of sessions, product revenue and shipping costs. Note that in the current setup I can't use the pre-aggregated fields such as totals.visits.
SELECT c_country AS country, date As Date, COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(Visitid AS STRING), CAST(visitStartTime AS STRING))) AS Sessions,
(SELECT SUM(product.productRevenue)/1000000 FROM t.hits as hits, hits.product AS product) AS Product_Revenue, (SELECT SUM(hits.transaction.transactionShipping)/1000000 FROM t.hits AS hits) AS Shipping_Costs
FROM `xx.yy.zz` as t
WHERE c_date BETWEEN "2019-11-06" AND "2019-11-06"
GROUP BY c_country, date
Now, the following error message appears:
"Correlated aliases referenced in the from clause must refer to arrays
that are valid to access from the outer query, but t refers to an
array that is not valid to access after GROUP BY or DISTINCT in the
outer query at [2:50]"
Does anyone know how to adjust the query so that the query executes without issues?
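One possible way around this, as a sketch rather than a verified fix: the scalar subqueries read t.hits in the same SELECT that groups by country and date, and once rows are grouped the array can no longer be addressed per row. Computing the per-session values in an inner query and aggregating them in an outer query avoids that. The aliases session_id, product_revenue and shipping_costs are made up for illustration; the table and column names are taken from the question as-is.

```sql
#standardSQL
SELECT
  country,
  Date,
  COUNT(DISTINCT session_id) AS Sessions,
  SUM(product_revenue) AS Product_Revenue,
  SUM(shipping_costs) AS Shipping_Costs
FROM (
  -- Inner query: one row per session, no GROUP BY, so t.hits is still accessible.
  SELECT
    c_country AS country,
    date AS Date,
    CONCAT(CAST(fullVisitorId AS STRING),
           CAST(visitId AS STRING),
           CAST(visitStartTime AS STRING)) AS session_id,  -- hypothetical alias
    (SELECT SUM(product.productRevenue) / 1000000
     FROM UNNEST(t.hits) AS hits, UNNEST(hits.product) AS product) AS product_revenue,
    (SELECT SUM(hits.transaction.transactionShipping) / 1000000
     FROM UNNEST(t.hits) AS hits) AS shipping_costs
  FROM `xx.yy.zz` AS t
  WHERE c_date BETWEEN "2019-11-06" AND "2019-11-06"
)
-- Outer query: aggregate the per-session values.
GROUP BY country, Date
```

The revenue and shipping logic is unchanged from the question; only the point at which the aggregation happens has moved.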

Querying "in" eventAction array in big query

I apologize if this has been asked before, but I can't seem to find a specific doc describing how to do this. We are importing our GA data into BigQuery. I simply need to see which visitors on our site have viewed two or more pages and completed at least one of a few actions. I am fairly new to BQ, and the docs I have read talk about using UNNEST. Unfortunately, that is where I run into trouble when I run this query:
SELECT visitId, totals.pageviews FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009` WHERE totals.pageviews > 2 AND
'modal-click' IN UNNEST(hits.eventInfo.eventAction)
order by totals.pageviews DESC LIMIT 100000
I get the error shown below. Shouldn't this work? I have been reading this doc, but I feel like my use case is simpler than most of the ones shown:
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#scanning-arrays
Cannot access field eventInfo on a value with type ARRAY<STRUCT<...>> at [2:30]
Below is for BigQuery Standard SQL. hits is an ARRAY of STRUCTs, so it has to be unnested before eventInfo can be accessed:
#standardSQL
SELECT visitId, totals.pageviews
FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009`
WHERE totals.pageviews > 2
AND (SELECT COUNTIF(eventInfo.eventAction = 'modal-click') FROM UNNEST(hits)) > 0
ORDER BY totals.pageviews DESC
LIMIT 100000
OR
#standardSQL
SELECT visitId, totals.pageviews
FROM `analytics-acquisition-funnel.119485123.ga_sessions_20181009`
WHERE totals.pageviews > 2
AND EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'modal-click')
ORDER BY totals.pageviews DESC
LIMIT 100000

Big Query and Google Analytics UI do not match when ecommerce action filter applied

We are validating a query in BigQuery and cannot get the results to match the Google Analytics UI. A similar question can be found here, but in our case the mismatch only occurs when we apply a specific filter on ecommerce_action.action_type.
Here is the query:
SELECT COUNT(distinct fullVisitorId+cast(visitid as string)) AS sessions
FROM (
SELECT
device.browserVersion,
geoNetwork.networkLocation,
geoNetwork.networkDomain,
geoNetwork.city,
geoNetwork.country,
geoNetwork.continent,
geoNetwork.region,
device.browserSize,
visitNumber,
trafficSource.source,
trafficSource.medium,
fullvisitorId,
visitId,
device.screenResolution,
device.flashVersion,
device.operatingSystem,
device.browser,
totals.pageviews,
channelGrouping,
totals.transactionRevenue,
totals.timeOnSite,
totals.newVisits,
totals.visits,
date,
hits.eCommerceAction.action_type
FROM
(select *
from TABLE_DATE_RANGE([zzzzzzzzz.ga_sessions_],
<range>) ) t
WHERE
hits.eCommerceAction.action_type = '2' and <stuff to remove bots>
)
From the UI using the built in shopping behavior report, we get 3.836M unique sessions with a product detail view, compared with 3.684M unique sessions in Big Query using the query above.
A few questions:
1) We are under the impression the shopping behavior report "Sessions with Product View" breakdown is based off of the ecommerce_action.actiontype filter. Is that true?
2) Is there a .totals pre-aggregated table that the UI may be pulling from?
It sounds like the issue is that COUNT(DISTINCT ...) is approximate when using legacy SQL, as noted in the migration guide, so the counts are not accurate. Either use standard SQL instead (preferred) or use EXACT_COUNT_DISTINCT with legacy SQL.
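As a rough sketch of the standard SQL route, assuming a single daily export table: the table name below is a placeholder, the bot filter from the question is left out, and the EXISTS subquery is just one way to express "session has at least one hit with action_type '2'".

```sql
#standardSQL
-- COUNT(DISTINCT ...) is exact in standard SQL.
SELECT
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS sessions
FROM
  `project.123456789.ga_sessions_20180101` AS t  -- placeholder table name
WHERE
  EXISTS (
    SELECT 1
    FROM UNNEST(t.hits) AS h
    WHERE h.eCommerceAction.action_type = '2'
  )
```

If the query has to stay in legacy SQL, the smaller change is to swap COUNT(DISTINCT ...) for EXACT_COUNT_DISTINCT(...) in the original statement.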
You're including product list views in your query.
As described in https://support.google.com/analytics/answer/3437719, you need to make sure that no product has isImpression = TRUE, because that would mean it is a product list view.
This query sums all sessions that contain any hit with action_type='2' for which every product's isImpression is NULL or FALSE:
SELECT
SUM(totals.visits) AS sessions
FROM
`project.123456789.ga_sessions_20180101` AS t
WHERE
(
SELECT
LOGICAL_OR(h.ecommerceaction.action_type='2')
FROM
t.hits AS h
WHERE
(SELECT LOGICAL_AND(isimpression IS NULL OR isimpression = FALSE) FROM h.product))
For legacySQL you can adapt the example in the documentation.
In addition to the fact that COUNT(DISTINCT ...) is approximate when using legacy SQL, there could be sessions that contain only non-interactive hits. These are not counted as sessions in the Google Analytics UI, but they are counted by both COUNT(DISTINCT ...) and EXACT_COUNT_DISTINCT(...), because your query counts visit IDs.
Using SUM(totals.visits) you should get the same result as in the UI because SUM does not take into account NULL values of totals.visits (corresponding to sessions in which there are only non-interactive hits).

SQLite: Error while executing query: near "WITH"

I need to change my Oracle query to SQLite.
It is some kind of calendar.
Oracle query, which works fine:
SELECT TRUNC(sysdate,'DD') - level AS d
FROM dual
CONNECT BY level <= 180
SQLite query, which I have written:
WITH RECURSIVE
dates(day_date) AS (
SELECT date('now','-180 day')
UNION ALL
SELECT day_date+1
FROM dates WHERE day_date < date('now')
)
select * from dates;
It throws an error, when I am executing it.
Error while executing query: near "WITH": syntax error
What is wrong with my code? I used this page to check syntax: https://www.sqlite.org/lang_with.html
Common table expressions are not available before SQLite version 3.8.3.
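Once the query runs on SQLite 3.8.3 or later, there is a second thing to watch for: day_date+1 does numeric addition on a text value, so it does not advance the date by a day. A minimal sketch that instead mirrors the Oracle query (TRUNC(sysdate) minus level, for level 1 to 180) could look like this. The lvl column is a helper introduced here for illustration, and date('now') is evaluated in UTC unless a 'localtime' modifier is added.

```sql
WITH RECURSIVE dates(day_date, lvl) AS (
  -- Anchor row: yesterday, matching level = 1 in the Oracle query.
  SELECT date('now', '-1 day'), 1
  UNION ALL
  -- Step back one day at a time using a date modifier, not arithmetic.
  SELECT date(day_date, '-1 day'), lvl + 1
  FROM dates
  WHERE lvl < 180
)
SELECT day_date FROM dates;
```

This returns 180 rows, from yesterday back to 180 days ago.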

Bigquery union/join error

I am getting an error when trying to pull from my google analytics bigquery export tables... I want to look at a month's worth of data with some filters (including one that narrows it down to a list of specific fullvisitorids of interest). However, when I run the following query, I get this error:
Error: (L2:1): JOIN (including semi-join) and UNION ALL (comma) may not be combined in a single SELECT statement. Either move the UNION ALL to an inner query or the JOIN to an outer query.
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type --and other columns
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where fullvisitorid in (select * from [mydata.visitorid_lookup]) --table includes a list of fullvisitorids I am interested in
and device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%'
So I change my query to this:
select * from (
select date, fullvisitorid, visitid, visitstarttime, visitnumber, hits.hitNumber, hits.page.pagePath, hits.page.pageTitle, hits.type
FROM (TABLE_DATE_RANGE([mydata.ga_sessions_],TIMESTAMP('2015-02-01'),TIMESTAMP('2015-02-28')))
where device.browser!='Internet Explorer'
and lower(hits.page.pagePath) not like '%refer%'
and lower(hits.page.pagePath) like '%sample%')
where fullvisitorid in (select * from [mydata.visitorid_lookup_test])
Which then gives me an error saying response is too large to return. It would be cut down significantly if the where statement for fullvisitorid was being executed within the subquery, but of course that doesn't seem possible. So I feel like I'm between a rock and a hard place on this... Is there another way I am missing? Thanks!
The "response too large" error applies to the final result of the query, which means the result is still too large even after the semijoin in WHERE has been applied. It should work, though, if you enable the "Allow Large Results" setting, which in legacy SQL also requires specifying a destination table.
