BigQuery: two hit-level custom dimensions - google-analytics

I can't seem to write a query that gives me all sessions in which customdimensionX has value X and customdimensionY has value Y within the same hit. The query I currently have returns no results.
Can anybody help me with this? :)
Thanks!
SELECT sum(totals.visits)
from TABLE_DATE_RANGE([xxxx.ga_sessions_], TIMESTAMP('2016-3-1'),TIMESTAMP('2016-3-1'))
WHERE
(hits.customDimensions.index=x AND hits.customDimensions.value='x')
AND (hits.customDimensions.index=y AND hits.customDimensions.value='y')

It's a bit strange to answer my own question, but it might be useful for someone else :) I got the right number in the following way:
SELECT EXACT_COUNT_DISTINCT(uniqueVisitId) as sessions
FROM(
SELECT
CONCAT(fullvisitorid,"_",string(visitId)) AS uniqueVisitId,
MAX(IF(hits.customDimensions.index=x,hits.customDimensions.value,NULL)) WITHIN hits AS x,
MAX(IF(hits.customDimensions.index=y,hits.customDimensions.value,NULL)) WITHIN hits AS y,
hits.hitNumber
FROM TABLE_DATE_RANGE([xxxxxx.ga_sessions_], TIMESTAMP('2016-3-1'),TIMESTAMP('2016-3-1'))
having
(x contains 'x' and y contains 'y')
)
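To make the difference concrete, here is a small Python model (not BigQuery code) of the two checks. The sample sessions, the helper names and the dimension indexes 1 and 2 are made up for illustration, and it uses exact equality where the query above uses contains. It shows why a single customDimensions row can never satisfy both conditions, while a per-hit lookup can.

```python
# Hypothetical in-memory stand-in for two GA sessions; field names mirror the export schema.
sessions = [
    {"id": "A_1", "hits": [
        {"hitNumber": 1, "customDimensions": [{"index": 1, "value": "x"},
                                              {"index": 2, "value": "y"}]},
    ]},
    {"id": "B_1", "hits": [
        {"hitNumber": 1, "customDimensions": [{"index": 1, "value": "x"}]},
        {"hitNumber": 2, "customDimensions": [{"index": 2, "value": "y"}]},
    ]},
]


def row_level_match(session, ix, vx, iy, vy):
    # Mirrors the original WHERE clause: one customDimensions row
    # would have to carry both indexes at once, which cannot happen.
    return any(
        cd["index"] == ix and cd["value"] == vx
        and cd["index"] == iy and cd["value"] == vy
        for hit in session["hits"]
        for cd in hit["customDimensions"]
    )


def dimension_value(hit, index):
    # Rough analogue of MAX(IF(index = n, value, NULL)) WITHIN hits.
    values = [cd["value"] for cd in hit["customDimensions"] if cd["index"] == index]
    return max(values) if values else None


def same_hit_match(session, ix, vx, iy, vy):
    # True when at least one hit carries both dimension values.
    return any(
        dimension_value(hit, ix) == vx and dimension_value(hit, iy) == vy
        for hit in session["hits"]
    )


row_level = [s["id"] for s in sessions if row_level_match(s, 1, "x", 2, "y")]
same_hit = [s["id"] for s in sessions if same_hit_match(s, 1, "x", 2, "y")]
print(row_level)
print(same_hit)
```

Session B_1 has both values, but on different hits, so only A_1 counts as a same-hit match.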

Try the options below (I haven't had a chance to test them, but they should be close to what you need, if not exact):
Option 1:
SELECT SUM(totals.visits)
FROM TABLE_DATE_RANGE([66080915.ga_sessions_], TIMESTAMP('2016-3-1'),TIMESTAMP('2016-3-1'))
OMIT RECORD IF
SUM((hits.customDimensions.index=x AND hits.customDimensions.value='x')
OR (hits.customDimensions.index=y AND hits.customDimensions.value='y')
) != 2
Option 2:
SELECT SUM(totals.visits) FROM (
SELECT totals.visits,
SUM((hits.customDimensions.index=x AND hits.customDimensions.value='x')
OR (hits.customDimensions.index=y AND hits.customDimensions.value='y')
) WITHIN RECORD AS check,
FROM TABLE_DATE_RANGE([66080915.ga_sessions_], TIMESTAMP('2016-3-1'),TIMESTAMP('2016-3-1'))
HAVING check = 2
)
ADDED
If customDimensions were grouped by specific hits, like hits.hit.customVariables, you would be able to identify both conditions within the same hit by using
WITHIN hits.hit or OMIT hits.hit IF
instead of, respectively,
WITHIN RECORD or OMIT RECORD IF
But I've checked the BigQuery Export schema and that does not seem to be the case.
I don't see a way to distinguish dimensions per specific hit.
Custom dimensions are presented by level: user/session level, product level and hit level.
Only product-level custom dimensions can be identified/queried per product.
Hope this helps

Related

Teradata SQL selecting successive batch of rows

I have 300000 entries in my db and am trying to access entries 50000-100000 (so 50000 in total).
My query is as follows:
query = 'SELECT TOP 50000* FROM database ORDER BY col_name QUALIFY ROW_NUMBER() BETWEEN 50000 and 100000'
I only found the BETWEEN keyword in one source, however, and I suspect I am not using it correctly, since it says it can't be used on a non-ordered database. I assume the QUALIFY then gets evaluated before the ORDER BY.
So I tried something along the lines of
query_second_try = 'SELECT TOP 50000* FROM database QUALIFY ROW_NUMBER() OVER (ORDER BY col_name)'
to see if this fixes the problem (without taking into account the specific rows I want to select). This is also not the case.
I have tried using qualify with rank, but this doesn't seem to be exactly what I need either, I think the BETWEEN statement would be a better fit.
Can someone push me in the right direction here?
I am essentially trying to do the equivalent of 'ORDER BY col_name OFFSET BY 50000' in teradata.
Any help would be appreciated.
A few problems here.
row_number requires an order by inside its over clause, and that ordering needs to be granular enough to be deterministic. You can also play around with rank, dense_rank, and row_number, depending on what you want to do with ties.
You're also mixing top N and qualify.
Try this:
select
*
from
<table>
qualify row_number() over (order by <column(s)>) between X and Y
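If you want to try the row_number idea without a Teradata system, here is a minimal sketch in Python with sqlite3. It is an assumption-laden stand-in: SQLite has no QUALIFY clause, so the window function sits in a subquery, the table name demo and its 20 rows are made up, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col_name INTEGER)")
# Insert in reverse order to show that the batch depends on the window ORDER BY,
# not on insertion order.
conn.executemany("INSERT INTO demo VALUES (?)", [(n,) for n in range(20, 0, -1)])

# SQLite has no QUALIFY, so the row number is computed in a subquery
# and filtered in the outer WHERE.
batch_sql = """
    SELECT col_name
    FROM (
        SELECT col_name, ROW_NUMBER() OVER (ORDER BY col_name) AS rn
        FROM demo
    )
    WHERE rn BETWEEN ? AND ?
    ORDER BY rn
"""
batch = [row[0] for row in conn.execute(batch_sql, (6, 10))]
print(batch)
```

Keep in mind that BETWEEN is inclusive on both ends, so between 50000 and 100000 returns 50001 rows; between 50001 and 100000 gives exactly 50000.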

BigQuery: unexpected result after filtering table

Given the following query (very simplified):
SELECT hits.page.pagepath AS Page
FROM
`[projectid].[datasetid].ga_sessions_*` t, t.hits as hits
WHERE
_TABLE_SUFFIX BETWEEN '20190123' AND '20190123'
AND (SELECT COUNT(*)>0 FROM t.hits WHERE REGEXP_CONTAINS(hits.page.pagepath,r'dames'))
I expected that this query only returns pages which contain 'dames', but this is actually not the case. With this filter in the WHERE section..
(SELECT COUNT(*)>0 FROM t.hits WHERE REGEXP_CONTAINS(hits.page.pagepath,r'dames'))
..the data is flattened on hit level and filtered to only the pages containing 'dames'. The main query is also flattened on hit level. So I would expect a TRUE or FALSE per hit, with only the TRUEs remaining in the final dataset, namely only pages that contain 'dames'.
I know queries that do return the expected output, but my main question (purely to understand why this query is not working) is actually more: why does this query not work as expected?
Thanks in advance!
You must understand that cross-joining an unnested array with its parent row does not exactly flatten the source table. It repeats the parent row for every element of the array: in this case all session information gets repeated for every hit, including the hits array itself!
That means for every hit you could look up stuff in the whole session, because all hits are available on every row: they got repeated too.
You are accessing this repeated hits array in your WHERE clause.
Instead of writing a sub-select on this repeated array, you want to use the newly available cross-joined fields from that array, i.e. AND REGEXP_CONTAINS(hits.page.pagepath,r'dames')
It might be a bit confusing in your case, because your alias for the flattened hits is hits as well - you might want to consider renaming it to something different like h so your NOT working query looks like this
SELECT h.page.pagepath AS Page
FROM
`[projectid].[datasetid].ga_sessions_*` t, t.hits as h
WHERE
_TABLE_SUFFIX BETWEEN '20190123' AND '20190123'
AND (SELECT COUNT(*)>0 FROM t.hits h2 WHERE REGEXP_CONTAINS(h2.page.pagepath,r'dames'))
You are checking for every page whether the whole session contained a page fulfilling your condition.
The WORKING example would be
SELECT h.page.pagepath AS Page
FROM
`[projectid].[datasetid].ga_sessions_*` t, t.hits as h
WHERE
_TABLE_SUFFIX BETWEEN '20190123' AND '20190123'
AND REGEXP_CONTAINS(h.page.pagepath,r'dames')
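The same behaviour can be modelled in a few lines of Python (a rough sketch, not BigQuery code; the two sample sessions and their paths are invented). Each cross-joined row still carries the whole hits array, so the sub-select keeps every hit of any session that has a matching page.

```python
import re

# Hypothetical sessions with a repeated hits field.
sessions = [
    {"id": "s1", "hits": [{"pagepath": "/dames/jassen"}, {"pagepath": "/heren/broeken"}]},
    {"id": "s2", "hits": [{"pagepath": "/heren/schoenen"}]},
]


def matches(hit):
    return re.search(r"dames", hit["pagepath"]) is not None


# Analogue of `FROM table t, t.hits AS h`: one row per hit,
# and each row still has access to the session's complete hits array.
rows = [(session, h) for session in sessions for h in session["hits"]]

# Sub-select over t.hits (alias h2): true for every row of a session
# in which any hit matches.
session_level = [
    h["pagepath"] for session, h in rows
    if any(matches(h2) for h2 in session["hits"])
]

# Filter on the cross-joined hit h itself.
hit_level = [h["pagepath"] for session, h in rows if matches(h)]

print(session_level)
print(hit_level)
```

The first list still contains /heren/broeken because it belongs to a session that also visited a dames page; the second list contains only the dames page.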

How to stop a row from show on Null count in query Access 2010

Here is the problem. I have a table and the table is 100% completed. There are no null values in the table. The table is broken down to:
Division > Region > RAPM > Status > Disposition
I want to count how many times "Training" is in [Disposition] for a [Region] using a query.
The problem is that when a [Region] has 0 "Training" entries in [Disposition], the count comes back as Null, so the entire row is not shown.
How do I get the count to come back as "0" so I can keep the [Division], [Region], and [RAPM] in the results for reporting even if the count for Training is 0?
I have tried NZ(), but this does not work because there is technically no Null cell to convert.
Here is the statement:
SELECT tblAlignment.Division, tblAlignment.Region, tblAlignment.RAPM, Count(tblCase.Disposition) AS CountofTraining
FROM tblCase INNER JOIN tblAlignment ON (tblCase.Region = tblAlignment.Region) AND (tblCase.Store = tblAlignment.[Store Number])
Where (((tblCase.Status)="Closed") AND ((tblCase.Disposition)="Training"))
Group BY tblAlignment.Division, tblAlignment.Region, tblAlignment.RAPM
HAVING (((tblAlignment.Division)=[Forms]![frmDashboardNative]![NavigationSubform].[Form]![NavigationSubform].[Form]![Combo16]))
This is not the complete answer, but my reputation is not high enough to comment so this is the only way I can respond. To get a fuller answer you would need to provide more detailed table structure including primary keys and relationships between the tables. Making guesses from what you have provided I can make a couple of suggestions, but your post raises a few questions:
You say you want to count related entries based on the same Region, but your join links on Region and Store. Is there a relationship between Store and Region? Can a single store only ever appear in one Region? If so, then I think this information should possibly be in a separate table.
I think the HAVING condition should actually be in the WHERE clause.
I think you may have duplicated the ![NavigationSubform].[Form]
Anyway, a slightly more generic example of one way of achieving what you're after would be:
SELECT a.Region, Nz(b.RecordCount, 0) AS FinalCount
FROM TableA a
LEFT JOIN (SELECT Region, Count(*) AS RecordCount
FROM TableB
WHERE Status = "Closed" AND Disposition = "Training"
GROUP BY Region) AS b ON a.Region = b.Region
WHERE a.Division = [Combo16]
Hope this is of some help.
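For anyone who wants to see the zero-filling in action outside Access, here is a minimal sketch using Python's sqlite3. The sample rows are made up, COALESCE stands in for Access's Nz, and the form reference [Combo16] is replaced by a plain query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Division TEXT, Region TEXT);
    CREATE TABLE TableB (Region TEXT, Status TEXT, Disposition TEXT);
    INSERT INTO TableA VALUES ('East', 'R1'), ('East', 'R2'), ('West', 'R3');
    INSERT INTO TableB VALUES
        ('R1', 'Closed', 'Training'),
        ('R1', 'Closed', 'Training'),
        ('R1', 'Open',   'Training'),
        ('R2', 'Closed', 'Coaching'),
        ('R3', 'Closed', 'Training');
""")

# The counts are computed in a subquery first; the LEFT JOIN then keeps every
# region, and COALESCE turns the missing count for R2 into 0.
zero_fill_sql = """
    SELECT a.Region, COALESCE(b.RecordCount, 0) AS FinalCount
    FROM TableA AS a
    LEFT JOIN (
        SELECT Region, COUNT(*) AS RecordCount
        FROM TableB
        WHERE Status = 'Closed' AND Disposition = 'Training'
        GROUP BY Region
    ) AS b ON a.Region = b.Region
    WHERE a.Division = ?
    ORDER BY a.Region
"""
counts = conn.execute(zero_fill_sql, ("East",)).fetchall()
print(counts)
```

Region R2 has no closed Training cases, yet it still appears with a count of 0, which is what the INNER JOIN with the filter in the WHERE clause could not give you.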

BigQuery error: Cannot query the cross product of repeated fields

I am running the following query on Google BigQuery web interface, for data provided by Google Analytics:
SELECT *
FROM [dataset.table]
WHERE
  hits.page.pagePath CONTAINS "my-fun-path"
I would like to save the results into a new table, however I am obtaining the following error message when using Flatten Results = False:
Error: Cannot query the cross product of repeated fields
customDimensions.value and hits.page.pagePath.
This answer implies that this should be possible: Is there a way to select nested records into a table?
Is there a workaround for the issue found?
Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching from WHERE to OMIT IF. It will give different results, but perhaps those results are acceptable.
The following removes an entire hit record if no page inside it meets the criteria. Note two things here:
It uses OMIT hits IF instead of the more commonly used OMIT RECORD IF.
The condition is inverted, because OMIT IF is the opposite of WHERE.
The query is:
SELECT *
FROM [dataset.table]
OMIT hits IF EVERY(NOT hits.page.pagePath CONTAINS "my-fun-path")
Update: see the related thread, I am afraid this is no longer possible.
It would be possible to use NEST function and grouping by a field, but that's a long shot.
Using flatten call on the query:
SELECT *
FROM flatten([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910],customDimensions)
WHERE
  hits.page.pagePath CONTAINS "m"
Thus in the web ui:
setting a destination table
allowing large results
and NO flatten results
does the job correctly and the produced table matches the original schema.
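I know this is an old question, but it can now be achieved by using the standard SQL dialect instead of legacy SQL. Note that hits is an array, so it has to be unnested before page can be read, and the legacy CONTAINS operator becomes LIKE:
#standardSQL
SELECT t.*
FROM `dataset.table` t, UNNEST(t.hits) AS hit
WHERE
  hit.page.pagePath LIKE '%my-fun-path%'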

Sessions by hits.page.pagePath in GA bigquery tables

I am new to bigquery, so sorry if this is a noob question! I am interested in breaking out sessions by page path or title. I understand one session can contain multiple paths/titles so the sum would be greater than total sessions. Essentially, I want to create a 'session id' and do a count distinct of sessionids where path like a or b.
It might actually be helpful to start at the very beginning and manually calculate total sessions. I tried to concatenate visit id and full visitor id to create a unique visit id, but apparently that is quite different from sessions. Can someone help enlighten me? Thanks!
I am working with our GA site data. Schema is the standard in GA exports.
DATA SAMPLE
Let's use an example out of the sample BigQuery (London Helmet) data:
There are 63 sessions in this day:
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
How many of those sessions are where hits.page.pagePath like /vests% or /helmets%? How many were vests only vs helmets only? Thanks!
Here is an example of how to calculate whether there were only helmets, or only vests or both helmets and vests or neither:
SELECT
visitID,
has_helmets AND has_vests AS both_helmets_and_vests,
has_helmets AND NOT has_vests AS helmets_only,
NOT has_helmets AND has_vests AS vests_only,
NOT has_helmets AND NOT has_vests AS neither_helmets_nor_vests
FROM (
SELECT
visitId,
SOME(hits.page.pagePath like '/helmets%') WITHIN RECORD AS has_helmets,
SOME(hits.page.pagePath like '/vests%') WITHIN RECORD AS has_vests,
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)
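The query above gives one row of flags per session; to get the "how many" numbers you still need to count the flags. Here is a rough Python model of that last step (not BigQuery code; the visit ids and page paths are invented, and any() plays the role of SOME ... WITHIN RECORD).

```python
from collections import Counter

# Hypothetical sessions: visitId -> page paths of its hits.
sessions = {
    101: ["/helmets/road", "/vests/yellow"],
    102: ["/helmets/kids"],
    103: ["/vests/orange", "/cart"],
    104: ["/home"],
    105: ["/helmets/city", "/helmets/road"],
}


def classify(paths):
    # any() over the hits of one session mirrors SOME(...) WITHIN RECORD.
    has_helmets = any(p.startswith("/helmets") for p in paths)
    has_vests = any(p.startswith("/vests") for p in paths)
    if has_helmets and has_vests:
        return "both_helmets_and_vests"
    if has_helmets:
        return "helmets_only"
    if has_vests:
        return "vests_only"
    return "neither_helmets_nor_vests"


# Each session lands in exactly one bucket, so the counts add up to the session total.
summary = Counter(classify(paths) for paths in sessions.values())
print(summary)
```

Because every session falls into exactly one of the four buckets, the bucket counts sum to the total number of sessions.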
Way 1, easier but you need to repeat on each field
Obviously you can do something like this :
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] WHERE hits.page.pagePath like '/helmets%'
And then have multiple queries for your own substrings (one with '/vests%', one with '/helmets%', etc).
Way 2, works fine, but not with repeated fields
If you want ONE query that'll just group by on the first part of the string, you can do something like that :
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] ) group by a
When I do this, it returns the 63 sessions, with a total count of 63 :).
Way 3, using a FLATTEN on the table to get each hit individually
Since the "hits" field is repeatable, you would need a FLATTEN in your query :
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by a
The reason why you need to FLATTEN here is that the "hits" field is repeatable. If you don't flatten, it won't look into ALL the "hits" in your response. Adding "FLATTEN" will make you work off a sub-table where each hit is in its own row, so you can query on all of them.
If you want it by session as well as by path (the result is grouped by both), do something like:
Select b, a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a, visitID as b, FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by b, a
