Query custom dimensions at both the session AND the hit level - google-analytics

The custom dimension with index 6 holds a UUID at both the session and the hit level.
At the session level I can use the following Standard SQL query to retrieve the UUID:
CREATE TEMP FUNCTION customDimensionByIndex(indx INT64, arr ARRAY<STRUCT<index INT64, value STRING>>) AS (
(SELECT x.value FROM UNNEST(arr) x WHERE indx=x.index)
);
SELECT
customDimensionByIndex(6, customDimensions) AS session_uuid -- Customer UUID
FROM `94860076.ga_sessions_20170822`
limit 10
Similarly, at the hit level I can use:
CREATE TEMP FUNCTION customDimensionByIndex(indx INT64, arr ARRAY<STRUCT<index INT64, value STRING>>) AS (
(SELECT x.value FROM UNNEST(arr) x WHERE indx=x.index)
);
SELECT
customDimensionByIndex(6, hits.customDimensions) AS hit_uuid -- Customer UUID
FROM `94860076.ga_sessions_20170822`, unnest(hits) as hits
limit 10
However, I can't get both to work in the same query. For example, I want a result set where each row corresponds to a session and the columns are session_uuid and array_of_hit_uuids. How can this be achieved?

The query below is for BigQuery Standard SQL:
#standardSQL
CREATE TEMP FUNCTION customDimensionByIndex(indx INT64, arr ARRAY<STRUCT<index INT64, value STRING>>) AS (
(SELECT x.value FROM UNNEST(arr) x WHERE indx=x.index)
);
SELECT *
FROM (
SELECT
customDimensionByIndex(6, customDimensions) AS session_uuid,
ARRAY(
SELECT val FROM (
SELECT customDimensionByIndex(6, hits.customDimensions) AS val
FROM UNNEST(hits) AS hits
)
WHERE NOT val IS NULL
) AS hit_uuid
FROM `94860076.ga_sessions_20170822`
)
WHERE session_uuid IS NOT NULL
LIMIT 10
You can test it against a public dataset (this version reads index 2 at the session level and index 1 at the hit level):
#standardSQL
CREATE TEMP FUNCTION customDimensionByIndex(indx INT64, arr ARRAY<STRUCT<index INT64, value STRING>>) AS (
(SELECT x.value FROM UNNEST(arr) x WHERE indx=x.index)
);
SELECT *
FROM (
SELECT
customDimensionByIndex(2, customDimensions) AS session_uuid,
ARRAY(
SELECT val FROM (
SELECT customDimensionByIndex(1, hits.customDimensions) AS val
FROM UNNEST(hits) AS hits
)
WHERE NOT val IS NULL
) AS hit_uuid
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
)
WHERE session_uuid IS NOT NULL
LIMIT 10
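To see what the query computes for each session, here is a minimal Python sketch over a hypothetical in-memory stand-in for one ga_sessions row (the dict layout and sample values are assumptions, not the export format itself). custom_dimension_by_index mirrors the UDF, including the fact that a BigQuery scalar subquery fails when it returns more than one row, and the list comprehension mirrors the ARRAY(...) subquery with its NULL filter. That filter matters because BigQuery does not allow arrays with NULL elements in a final query result.

```python
from typing import Optional


def custom_dimension_by_index(index: int, dims: list) -> Optional[str]:
    """Mirror of the customDimensionByIndex UDF for one array of {index, value} entries."""
    matches = [d["value"] for d in dims if d["index"] == index]
    if len(matches) > 1:
        # BigQuery raises an error when a scalar subquery yields more than one row.
        raise ValueError(f"more than one custom dimension with index {index}")
    return matches[0] if matches else None  # no match -> NULL


def session_row(session: dict, index: int = 6) -> tuple:
    """Return (session_uuid, hit_uuids) for one session, like the SELECT list above."""
    session_uuid = custom_dimension_by_index(index, session["customDimensions"])
    hit_values = (custom_dimension_by_index(index, hit["customDimensions"])
                  for hit in session["hits"])
    # WHERE NOT val IS NULL: drop hits that do not carry the dimension.
    hit_uuids = [v for v in hit_values if v is not None]
    return session_uuid, hit_uuids


# Hypothetical session with three hits, one of which lacks dimension 6.
session = {
    "customDimensions": [{"index": 6, "value": "uuid-A"}],
    "hits": [
        {"customDimensions": [{"index": 6, "value": "uuid-A"}]},
        {"customDimensions": []},
        {"customDimensions": [{"index": 2, "value": "x"}, {"index": 6, "value": "uuid-B"}]},
    ],
}
print(session_row(session))  # ('uuid-A', ['uuid-A', 'uuid-B'])
```

The outer WHERE session_uuid IS NOT NULL in the SQL would then discard sessions for which the first element of that tuple is None.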

Related

Unnesting hits and a session-scoped custom dimension in a BigQuery filter

I am trying to filter a funnel to users who have certain custom dimension values. Unfortunately, the custom dimension in question is session-scoped, not hit-scoped, so I cannot use hits.customDimensions in this query. What is the best way to achieve this?
Here is my progress so far:
#standardSQL
SELECT
SUM((SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page' LIMIT 1)) One_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/two - Page' LIMIT 1)) Two_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/three - Page' LIMIT 1)) Three_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/four - Page' LIMIT 1)) Four_Page
FROM `xxxxxxx.ga_sessions_*`,
UNNEST(hits) AS h,
UNNEST(customDimensions) AS cusDim
WHERE
_TABLE_SUFFIX BETWEEN '20190320' AND '20190323'
AND h.hitNumber = 1
AND cusDim.index = 6
AND cusDim.value IN ('60','70')
Segmentation with Custom Dimensions
You can filter sessions on conditions in their custom dimensions: write a subquery that counts the matching entries and require the count to be > 0. Example using the sample data:
SELECT
fullvisitorid,
visitstarttime,
customdimensions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
-- there should be at least one case with index=4 and value='EMEA' ... you can use your index and desired value
-- unnest() turns customdimensions into table format, so we can apply SQL to this array
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
limit 100
Comment out the WHERE clause to see all the data.
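The subquery in the WHERE clause is evaluated once per session against that session's own customDimensions array, so each session is either kept or dropped as a whole, whereas a cross join with UNNEST(customDimensions) produces one row per array entry. A minimal Python sketch of the same per-session predicate, on hypothetical sample sessions:

```python
def has_custom_dimension(session: dict, index: int, value: str) -> bool:
    """Python counterpart of
    (SELECT COUNT(1) > 0 FROM UNNEST(customDimensions) WHERE index = ... AND value = ...)."""
    count = sum(1 for d in session["customDimensions"]
                if d["index"] == index and d["value"] == value)
    return count > 0


# Hypothetical sessions; only the first carries index 4 = 'EMEA'.
sessions = [
    {"fullVisitorId": "a", "customDimensions": [{"index": 4, "value": "EMEA"}]},
    {"fullVisitorId": "b", "customDimensions": [{"index": 4, "value": "APAC"}]},
    {"fullVisitorId": "c", "customDimensions": []},
]
emea_visitors = [s["fullVisitorId"] for s in sessions if has_custom_dimension(s, 4, "EMEA")]
print(emea_visitors)  # ['a']
```

In BigQuery, EXISTS(SELECT 1 FROM UNNEST(customDimensions) WHERE index=4 AND value='EMEA') expresses the same test.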
Funnel
First you might want to get an overview of what is going on in your hits array:
SELECT
fullvisitorid,
visitstarttime,
-- get an overview over relevant hits data
-- select as struct feeds hits fields into a new array created by array()-function
ARRAY(select as struct hitnumber, page from unnest(hits) where type='PAGE') hits
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
limit 100
Once you have made sure the data makes sense, you can create a funnel array containing the hit numbers of the relevant steps:
SELECT
fullvisitorid,
visitstarttime,
-- create array with relevant info
-- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
ARRAY(
select as struct * from
(select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
(select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
(select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
) AS funnel
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
limit 100
Put this into a WITH clause for clarity and run your analysis by summarizing the corresponding cases:
WITH f AS (
SELECT
fullvisitorid,
visitstarttime,
totals.visits,
-- create array with relevant info
-- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
ARRAY(
select as struct * from
(select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
(select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
(select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
) AS funnel
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
)
SELECT
COUNT(DISTINCT fullvisitorid) as users,
SUM(visits) as allSessions,
SUM( IF(array_length(funnel)>0,visits,0) ) sessionsWithFunnelPages,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null ) ,visits,0) ) sessionsWithStep1,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 ) ,visits,0) ) sessionsFunnelToStep2,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 and step2<step3 and step1<step3) ,visits,0) ) sessionsFunnelToStep3
FROM f
Please test before using.
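One way to test your expectations is to mirror the per-session logic in Python before running it on real data. The sketch below builds the same step1 × step2 × step3 combinations that the LEFT JOIN ... ON TRUE chain produces and applies the ordering tests used for sessionsFunnelToStep2 and sessionsFunnelToStep3. The hit dicts and sample paths are hypothetical, and str.startswith stands in for LIKE '/google+redesign/%'.

```python
from itertools import product


def hit_numbers(hits, matches):
    """Hit numbers of PAGE hits whose pagePath satisfies `matches`."""
    return [h["hitNumber"] for h in hits if h["type"] == "PAGE" and matches(h["pagePath"])]


def build_funnel(hits):
    """Emulate (step1) LEFT JOIN (step2) ON TRUE LEFT JOIN (step3) ON TRUE."""
    step1 = hit_numbers(hits, lambda p: p == "/home")
    # A LEFT JOIN with no matching rows keeps the left row, with NULL (None) on the right.
    step2 = hit_numbers(hits, lambda p: p.startswith("/google+redesign/")) or [None]
    step3 = hit_numbers(hits, lambda p: p == "/basket.html") or [None]
    # If step1 is empty, product() yields nothing, just as the left-most subquery does.
    return list(product(step1, step2, step3))


def less(a, b):
    """SQL '<': a comparison involving NULL is never true."""
    return a is not None and b is not None and a < b


def funnel_flags(funnel):
    """(reached step 1, reached step 2 after 1, reached step 3 after 2) for one session."""
    reached1 = any(s1 is not None for s1, _, _ in funnel)
    reached2 = any(less(s1, s2) for s1, s2, _ in funnel)
    reached3 = any(less(s1, s2) and less(s2, s3) for s1, s2, s3 in funnel)
    return reached1, reached2, reached3


# Hypothetical session: home -> product page -> home -> basket.
hits = [
    {"hitNumber": 1, "type": "PAGE", "pagePath": "/home"},
    {"hitNumber": 2, "type": "PAGE", "pagePath": "/google+redesign/apparel"},
    {"hitNumber": 3, "type": "PAGE", "pagePath": "/home"},
    {"hitNumber": 4, "type": "PAGE", "pagePath": "/basket.html"},
]
print(build_funnel(hits))                # [(1, 2, 4), (3, 2, 4)]
print(funnel_flags(build_funnel(hits)))  # (True, True, True)
```

Summing visits over sessions for each flag then gives the same kind of counts as the final SELECT.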

How to INSERT values from a CTE result

I have a 'SchoolYearStartEnd' table:
CREATE TABLE SchoolYearStartEnd (
id INT PRIMARY KEY UNIQUE,
StartDate DATE,
EndDate DATE
);
and a second table, 'SchoolYearsTeachingDays':
CREATE TABLE SchoolYearsTeachingDays (
aDate DATE PRIMARY KEY UNIQUE
);
which I want to fill out with dates from a CTE like this:
WITH RECURSIVE dates(x) AS (
SELECT (SELECT StartDate FROM SchoolYearStartEnd)
UNION ALL
SELECT DATE(x, '+1 DAYS') FROM dates WHERE x < (SELECT EndDate FROM SchoolYearStartEnd)
)
SELECT * FROM dates WHERE CAST(STRFTIME('%w',x) AS INTEGER) > 0
;
I tried this code:
INSERT INTO SchoolYearsTeachingDays (aDate) VALUES (
WITH RECURSIVE dates(x) AS (
SELECT (SELECT StartDate FROM SchoolYearStartEnd)
UNION ALL
SELECT DATE(x, '+1 DAYS') FROM dates WHERE x < (SELECT EndDate FROM SchoolYearStartEnd)
)
SELECT * FROM dates WHERE CAST(STRFTIME('%w',x) AS INTEGER) > 0 -- To exclude Sundays.
;
);
but without success. I get these errors:
Error: near "RECURSIVE": syntax error
Error: near ")": syntax error
So what am I missing here?
Best, Pal
When you are inserting from a SELECT query, you must not use VALUES:
INSERT INTO SchoolYearsTeachingDays (aDate)
WITH RECURSIVE dates(x) AS (...)
SELECT * FROM dates ...;
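Here is a runnable sketch with Python's built-in sqlite3 module, using the tables from the question and a hypothetical two-week school year (Monday 2024-09-02 to Sunday 2024-09-15). The whole INSERT ... WITH RECURSIVE ... SELECT runs as a single statement, with no VALUES clause:

```python
import sqlite3


def fill_teaching_days(conn):
    """Insert every non-Sunday date of the school year into SchoolYearsTeachingDays."""
    conn.execute("""
        INSERT INTO SchoolYearsTeachingDays (aDate)
        WITH RECURSIVE dates(x) AS (
            SELECT (SELECT StartDate FROM SchoolYearStartEnd)
            UNION ALL
            SELECT DATE(x, '+1 day') FROM dates
            WHERE x < (SELECT EndDate FROM SchoolYearStartEnd)
        )
        SELECT x FROM dates
        WHERE CAST(STRFTIME('%w', x) AS INTEGER) > 0  -- %w is 0 for Sunday
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SchoolYearStartEnd (id INT PRIMARY KEY UNIQUE, StartDate DATE, EndDate DATE);
    CREATE TABLE SchoolYearsTeachingDays (aDate DATE PRIMARY KEY UNIQUE);
""")
# Hypothetical school year: Monday 2024-09-02 to Sunday 2024-09-15.
conn.execute("INSERT INTO SchoolYearStartEnd VALUES (1, '2024-09-02', '2024-09-15')")
fill_teaching_days(conn)

rows = [r[0] for r in conn.execute("SELECT aDate FROM SchoolYearsTeachingDays ORDER BY aDate")]
print(len(rows), rows[0], rows[-1])  # 12 2024-09-02 2024-09-14
```

The 14 generated dates lose the two Sundays (2024-09-08 and 2024-09-15), leaving 12 teaching days.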

How to select multiple custom Firebase event parameters in BigQuery?

I exported Firebase events to BigQuery and now I'm trying to select two parameters from a certain event. Here is the query for selecting one parameter:
select event_dim.params.value.int_value as level_id
from [com_company_appname_ANDROID.app_events_20161210]
where event_dim.name = "level_replays_until_first_victory" and event_dim.params.key = "level_id"
Both parameters are int values: the first is named level_id and the second count. I would like to show level_id in the first column and count in the second.
The query below works with BigQuery Standard SQL:
SELECT
(SELECT params.value.int_value FROM x.params
WHERE params.key = 'level_id') AS level_id,
(SELECT params.value.int_value FROM x.params
WHERE params.key = 'count') AS count
FROM `com_company_appname_ANDROID.app_events_20161210`, UNNEST(event_dim) AS x
WHERE x.name = 'level_replays_until_first_victory'
See also Migrating from legacy SQL if you are stuck with Legacy SQL.
I love the previous solution! Here is an alternative solution I came up with for the same problem. I'd welcome comments on which solution is more efficient/cheaper and why.
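Each scalar subquery in the SELECT list looks up one key in the current event's params, so every matching event yields exactly one row, with NULL for a missing parameter. The same lookup in Python, on hypothetical event dicts shaped like event_dim entries (int_param is a made-up helper name, playing the role of the paramValueByKey UDF shown further down):

```python
from typing import Optional


def int_param(params: list, key: str) -> Optional[int]:
    """Like (SELECT params.value.int_value FROM x.params WHERE params.key = key)."""
    values = [p["value"]["int_value"] for p in params if p["key"] == key]
    if len(values) > 1:
        # A BigQuery scalar subquery errors out if it returns more than one row.
        raise ValueError(f"parameter {key!r} appears more than once")
    return values[0] if values else None


def level_rows(events: list, name: str = "level_replays_until_first_victory") -> list:
    """One (level_id, count) row per matching event; a missing parameter becomes None."""
    return [(int_param(e["params"], "level_id"), int_param(e["params"], "count"))
            for e in events if e["name"] == name]


# Hypothetical events: the second one lacks the 'count' parameter.
events = [
    {"name": "level_replays_until_first_victory",
     "params": [{"key": "level_id", "value": {"int_value": 3}},
                {"key": "count", "value": {"int_value": 5}}]},
    {"name": "level_replays_until_first_victory",
     "params": [{"key": "level_id", "value": {"int_value": 4}}]},
    {"name": "other_event", "params": []},
]
print(level_rows(events))  # [(3, 5), (4, None)]
```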
SELECT event_param1.value.int_value AS level_id,
event_param2.value.int_value AS count
FROM `com_company_appname_ANDROID.app_events_20161210`,
UNNEST(event_dim) event,
UNNEST(event.params) as event_param1,
UNNEST(event.params) as event_param2
WHERE event.name = 'level_replays_until_first_victory'
AND event_param1.key = 'level_id'
AND event_param2.key = 'count'
Another solution I find quite handy is to use user-defined functions (UDFs) to analyze user properties and event parameters:
#standardSQL
#UDF for event parameters
CREATE TEMP FUNCTION paramValueByKey(k STRING, params ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64 >>>) AS (
(SELECT x.value FROM UNNEST(params) x WHERE x.key=k)
);
#UDF for user properties
CREATE TEMP FUNCTION propertyValueByKey(k STRING, properties ARRAY<STRUCT<key STRING, value STRUCT<value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64>, set_timestamp_usec INT64, index INT64 > >>) AS (
(SELECT x.value.value FROM UNNEST(properties) x WHERE x.key=k)
);
#Query the sample dataset, unnesting the events and turning 'api_version', 'round' and 'type_of_game' into columns
SELECT
user_dim.user_id,
event.name,
propertyValueByKey('api_version', user_dim.user_properties).string_value AS api_version,
paramValueByKey('round', event.params).int_value as round,
paramValueByKey('type_of_game', event.params).string_value as type_of_game
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`,
UNNEST(event_dim) as event
WHERE event.name = 'round_completed'
LIMIT 10;
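An update to the second solution for the newer export schema, where each row is a single event with top-level event_name and event_params columns: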
SELECT
event_param1.value.int_value AS level_id,
event_param2.value.int_value AS count
FROM
`com_company_appname_ANDROID.app_events_20161210`,
UNNEST(event_params) as event_param1,
UNNEST(event_params) as event_param2
WHERE event_name = 'level_replays_until_first_victory'
AND
event_param1.key = 'level_id'
AND
event_param2.key = 'count'
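Apart from cost, the join-based queries (this one and the earlier alternative) differ from the correlated-subquery version in which rows they return. The comma before each UNNEST is an inner cross join, so an event missing either key produces no row at all, and a key that occurs twice multiplies the rows; the subquery version returns NULL for a missing key instead (and fails on a duplicated one). A small Python emulation on hypothetical events illustrates this:

```python
def level_rows_join(events: list, name: str = "level_replays_until_first_victory") -> list:
    """Emulate FROM ..., UNNEST(params) AS p1, UNNEST(params) AS p2 with the key filters."""
    rows = []
    for e in events:
        if e["name"] != name:
            continue
        # Comma joins are inner cross joins: every p1/p2 combination is tried.
        for p1 in e["params"]:
            for p2 in e["params"]:
                if p1["key"] == "level_id" and p2["key"] == "count":
                    rows.append((p1["value"]["int_value"], p2["value"]["int_value"]))
    return rows


# Hypothetical events: the second one lacks the 'count' parameter.
events = [
    {"name": "level_replays_until_first_victory",
     "params": [{"key": "level_id", "value": {"int_value": 3}},
                {"key": "count", "value": {"int_value": 5}}]},
    {"name": "level_replays_until_first_victory",
     "params": [{"key": "level_id", "value": {"int_value": 4}}]},
]
print(level_rows_join(events))  # [(3, 5)]  (the event without 'count' disappears)
```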

SQLite FTS4 with preferred language

I have an SQLite table that was generated by using the FTS4 module. Each entry is listed at least twice with different languages, but still sharing a unique ID (int column, not indexed).
Here is what I want to do:
I want to look up a term in a preferred language and union the result with a lookup of the same term in another language.
For the second lookup though, I want to ignore all entries (identified by their ID) that I already found during the first lookup. So basically I want to do this:
WITH term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1)
SELECT *
FROM term_search1
UNION
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 2
AND id NOT IN (SELECT id FROM term_search1)
The problem is that the term_search1 query would be executed twice. Is there a way to materialize the results? Any solution that limits it to 2 queries (instead of 3) would be great.
I also tried using recursive queries, something like:
WITH RECURSIVE term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1
UNION ALL
SELECT m.*
FROM myFts m LEFT OUTER JOIN term_search1 t ON (m.id = t.id)
WHERE myFts MATCH 'term'
AND m.languageId = 2
AND t.id IS NULL
)
SELECT * FROM term_search1
This didn't work either. Apparently it just executed two lookups for languageId = 2 (is this a bug?).
Thanks in advance :)
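You can use a temporary table to reduce the number of full-text queries against myFts to 2. Store each match's rowid along with its id, so the final lookup returns only the row in the chosen language rather than every language for that id:
CREATE TEMP TABLE results (id INTEGER PRIMARY KEY, rid INTEGER);
INSERT INTO results
SELECT id, rowid FROM myFts
WHERE myFts MATCH 'term' AND languageId = 1;
INSERT INTO results
SELECT id, rowid FROM myFts
WHERE myFts MATCH 'term' AND languageId = 2
AND id NOT IN (SELECT id FROM results);
-- One row per id, in the language that was found first.
SELECT * FROM myFts
WHERE rowid IN (SELECT rid FROM results);
DROP TABLE results;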
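A runnable sketch of the temporary-table approach with Python's built-in sqlite3 module, using the sample rows from the other answer below. Two assumptions keep it portable: myFts is an ordinary table rather than an FTS4 table, and content LIKE '%term%' stands in for MATCH 'term'.

```python
import sqlite3


def search_preferred(conn, pattern, preferred=1, fallback=2):
    """Return (id, languageId, content) rows, one per id, preferring `preferred`."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE results (id INTEGER PRIMARY KEY, rid INTEGER)")
    # Search 1: matches in the preferred language.
    cur.execute("INSERT INTO results SELECT id, rowid FROM myFts "
                "WHERE content LIKE ? AND languageId = ?", (pattern, preferred))
    # Search 2: fallback language, skipping ids already found by search 1.
    cur.execute("INSERT INTO results SELECT id, rowid FROM myFts "
                "WHERE content LIKE ? AND languageId = ? "
                "AND id NOT IN (SELECT id FROM results)", (pattern, fallback))
    # Fetch exactly the chosen rows by rowid.
    rows = cur.execute("SELECT id, languageId, content FROM myFts "
                       "WHERE rowid IN (SELECT rid FROM results) ORDER BY id").fetchall()
    cur.execute("DROP TABLE results")
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myFts (id INTEGER, languageId INTEGER, content TEXT);
INSERT INTO myFts VALUES
  (10, 1, 'term 10 lang 1'), (10, 2, 'term 10 lang 2'),
  (11, 1, 'term 11 lang 1'), (12, 2, 'term 12 lang 2'),
  (13, 1, 'not_erm 13 lang 1'), (13, 2, 'term 13 lang 2');
""")
found = search_preferred(conn, "%term%")
for row in found:
    print(row)
```

For id 10, which matches in both languages, only the languageId = 1 row comes back; ids 12 and 13 fall back to languageId = 2.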
If it's possible to change the schema, keep only text data in the FTS table. By default every column of an FTS table is full-text indexed, so a search for a number can also match the id or languageId values, which is not what you want. Create a separate meta table holding the non-text data (like id and languageId) and filter rows by joining it against the rowid of myFts. This way you need to query the FTS table only once: store the FTS results in a temporary table, then use the meta table to filter and order them.
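The schema change described above can be sketched the same way, again with Python's sqlite3, a plain docs table standing in for a text-only FTS table, and LIKE standing in for MATCH (the table names docs, meta and found are hypothetical). The text search runs once, and the preferred-language choice happens entirely in the meta table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'docs' stands in for an FTS table holding only text; 'meta' holds the non-text columns.
CREATE TABLE docs (content TEXT);
CREATE TABLE meta (docid INTEGER PRIMARY KEY, id INTEGER, languageId INTEGER);
CREATE TEMP TABLE found (docid INTEGER PRIMARY KEY);
""")
sample = [(10, 1, "term 10 lang 1"), (10, 2, "term 10 lang 2"),
          (11, 1, "term 11 lang 1"), (12, 2, "term 12 lang 2"),
          (13, 1, "not_erm 13 lang 1"), (13, 2, "term 13 lang 2")]
for id_, lang, text in sample:
    docid = conn.execute("INSERT INTO docs (content) VALUES (?)", (text,)).lastrowid
    conn.execute("INSERT INTO meta VALUES (?, ?, ?)", (docid, id_, lang))

# The only text search: remember which documents matched.
conn.execute("INSERT INTO found SELECT rowid FROM docs WHERE content LIKE ?", ("%term%",))

# Keep language 1, or language 2 when the same id has no language-1 match.
rows = conn.execute("""
SELECT m.id, m.languageId, d.content
FROM found f
JOIN meta m ON m.docid = f.docid
JOIN docs d ON d.rowid = f.docid
WHERE m.languageId = 1
   OR (m.languageId = 2 AND NOT EXISTS (
         SELECT 1 FROM found f2 JOIN meta m2 ON m2.docid = f2.docid
         WHERE m2.id = m.id AND m2.languageId = 1))
ORDER BY m.id
""").fetchall()
print(rows)
```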
This is the best I can think of:
SELECT *
FROM myFts t1
JOIN (SELECT COUNT(*) AS cnt, id
FROM myFts t2
WHERE t2.languageId in (1, 2)
AND t2.myFts MATCH 'term'
GROUP BY t2.id) t3
ON t1.id = t3.id
WHERE t1.myFts MATCH 'term'
AND t1.languageId in (1, 2)
AND (t1.languageId = 1 or t3.cnt = 1)
I am not sure if the second MATCH clause is necessary.
The idea is to first count the acceptable rows, then choose the best one.
Edit: I have no idea why it does not work with your table. Here is how I tested it (SQLite version 3.8.10.2):
CREATE VIRTUAL TABLE myFts USING fts4(
id integer,
languageId integer,
content TEXT
);
insert into myFts(id, languageId, content) values (10, 1, 'term 10 lang 1');
insert into myFts(id, languageId, content) values (10, 2, 'term 10 lang 2');
insert into myFts(id, languageId, content) values (11, 1, 'term 11 lang 1');
insert into myFts(id, languageId, content) values (12, 2, 'term 12 lang 2');
insert into myFts(id, languageId, content) values (13, 1, 'not_erm 13 lang 1');
insert into myFts(id, languageId, content) values (13, 2, 'term 13 lang 2');
Executing the query gives:
sqlite> SELECT *
...> FROM myFts t1
...> JOIN (SELECT COUNT(*) AS cnt, id
...> FROM myFts t2
...> WHERE t2.languageId in (1, 2)
...> AND t2.myFts MATCH 'term'
...> GROUP BY t2.id) t3
...> ON t1.id = t3.id
...> WHERE t1.myFts MATCH 'term'
...> AND t1.languageId in (1, 2)
...> AND (t1.languageId = 1 or t3.cnt = 1);
10|1|term 10 lang 1|2|10
11|1|term 11 lang 1|1|11
12|2|term 12 lang 2|1|12
13|2|term 13 lang 2|1|13
sqlite>

Passing a list of name/value pairs to a stored procedure

I have name/value pairs in a List<T> and need to find the best way to pass them to a stored procedure.
Id Name
1 abc
2 bbc
3 cnn
....
...
What is the best way to accomplish this?
One way to handle this in SQL Server 2005 (before table-valued parameters were available) was to pass a delimited list and use a Split function. If you are passing a two-column array, you would want to use two different delimiters:
Declare @Values varchar(max)
Set @Values = '1,abc|2,bbc|3,cnn';
With SplitItems As
(
Select S.Value As [Key]
, S2.Value
, Row_Number() Over ( Partition By S.Position Order By S2.Position ) As ElementNum
From dbo.Split(@Values,'|') As S
Outer Apply dbo.Split(S.Value, ',') As S2
)
Select [Key]
, Min( Case When S.ElementNum = 1 Then S.Value End ) As ListKey
, Min( Case When S.ElementNum = 2 Then S.Value End ) As ListValue
From SplitItems As S
Group By [Key]
The query relies on this dbo.Split function, which has to be created first:
Create Function [dbo].[Split]
(
@DelimitedList nvarchar(max)
, @Delimiter nvarchar(2) = ','
)
RETURNS TABLE
AS
RETURN
(
With CorrectedList As
(
Select Case When Left(@DelimitedList, Len(@Delimiter)) <> @Delimiter Then @Delimiter Else '' End
+ @DelimitedList
+ Case When Right(@DelimitedList, Len(@Delimiter)) <> @Delimiter Then @Delimiter Else '' End
As List
, Len(@Delimiter) As DelimiterLen
)
, Numbers As
(
Select Row_Number() Over ( Order By c1.object_id ) As Value
From sys.columns As c1
Cross Join sys.columns As c2
)
Select CharIndex(@Delimiter, CL.list, N.Value) + CL.DelimiterLen As Position
, Substring (
CL.List
, CharIndex(@Delimiter, CL.list, N.Value) + CL.DelimiterLen
, CharIndex(@Delimiter, CL.list, N.Value + 1)
- ( CharIndex(@Delimiter, CL.list, N.Value) + CL.DelimiterLen )
) As Value
From CorrectedList As CL
Cross Join Numbers As N
Where N.Value < Len(CL.List)
And Substring(CL.List, N.Value, CL.DelimiterLen) = @Delimiter
)
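On the client side, the list has to be serialized into that two-delimiter string before it is passed to the procedure. A minimal Python sketch of the encoding, and of the key/value pairing that the SplitItems query recovers (encode_pairs and decode_pairs are hypothetical names). The format has no escaping, so values must not contain '|' or ',':

```python
def encode_pairs(pairs, item_sep="|", pair_sep=","):
    """Build the '1,abc|2,bbc|3,cnn' string that the stored procedure splits."""
    parts = []
    for key, value in pairs:
        key, value = str(key), str(value)
        # No escaping exists, so a delimiter inside a key or value would corrupt the list.
        if any(sep in key or sep in value for sep in (item_sep, pair_sep)):
            raise ValueError(f"delimiter found in pair {(key, value)!r}")
        parts.append(f"{key}{pair_sep}{value}")
    return item_sep.join(parts)


def decode_pairs(text, item_sep="|", pair_sep=","):
    """Reverse of encode_pairs: the same (ListKey, ListValue) pairs the SQL produces."""
    return [tuple(item.split(pair_sep, 1)) for item in text.split(item_sep) if item]


encoded = encode_pairs([(1, "abc"), (2, "bbc"), (3, "cnn")])
print(encoded)                # 1,abc|2,bbc|3,cnn
print(decode_pairs(encoded))  # [('1', 'abc'), ('2', 'bbc'), ('3', 'cnn')]
```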
Another way to handle this without table-valued parameters is to pass Xml as an nvarchar(max):
Declare @Values nvarchar(max)
Set @Values = '<root><Item Key="1" Value="abc"/>
<Item Key="2" Value="bbc"/>
<Item Key="3" Value="cnn"/></root>'
Declare @docHandle int
exec sp_xml_preparedocument @docHandle output, @Values
Select *
From OpenXml(@docHandle, N'/root/Item', 1)
With( [Key] int, Value varchar(10) )
-- Release the memory held by the parsed document.
exec sp_xml_removedocument @docHandle
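Unlike the delimited list, XML escapes special characters, so values may safely contain commas or pipes. A minimal Python sketch of how a client could build the attribute-centric document that OPENXML reads with flag 1 (pairs_to_xml is a hypothetical helper), plus a round-trip check:

```python
import xml.etree.ElementTree as ET


def pairs_to_xml(pairs):
    """Build <root><Item Key=".." Value=".."/>...</root> for attribute-centric OPENXML."""
    root = ET.Element("root")
    for key, value in pairs:
        # ElementTree escapes &, <, > and quotes in attribute values.
        ET.SubElement(root, "Item", Key=str(key), Value=str(value))
    return ET.tostring(root, encoding="unicode")


xml_text = pairs_to_xml([(1, "abc"), (2, "bbc"), (3, "cnn")])
print(xml_text)

# Round trip: read the Key/Value attributes back, as the WITH clause maps them.
items = [(int(e.get("Key")), e.get("Value")) for e in ET.fromstring(xml_text).findall("Item")]
print(items)  # [(1, 'abc'), (2, 'bbc'), (3, 'cnn')]
```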
Take a look at Arrays and Lists in SQL Server 2008 to get some ideas
SQL Server 2008 also supports this multi-row VALUES syntax:
create table #bla (id int, somename varchar(50))
insert #bla values(1,'test1'),(2,'Test2')
select * from #bla
I ended up using foreach <insert>.
This can be done in three ways:
User-defined table type
JSON object parsing
XML parsing
I tried the first option and passed the list of pairs as a user-defined table type. This works for me; I am posting it here in case it helps someone else.
The first challenge for me was passing the list of key/value pairs, and the second was looping through the list and inserting each record into a table.
Step 1: Create a user-defined table type. I named it 'TypeMetadata'. Since it is a custom type, I gave it two nvarchar columns; you could instead make the first one an integer and the second an nvarchar.
-- Type: metadata ---
IF EXISTS(SELECT * FROM SYS.TYPES WHERE NAME = 'TypeMetadata')
DROP TYPE TypeMetadata
GO
CREATE TYPE TypeMetadata AS TABLE (
mkey nvarchar (50),
mvalue nvarchar (50)
);
GO
Step 2: Then I created a stored procedure named 'createfield':
-- Procedure: createfield --
CREATE PROCEDURE [dbo].[createfield]
@name nvarchar(50),
@text nvarchar(50),
@order int,
@type nvarchar(50),
@column_id int,
@tid int,
@metadataList TypeMetadata readonly
AS
BEGIN
--loop through metadata and insert records --
DECLARE @mkey nvarchar(max);
DECLARE @mvalue nvarchar(max);
DECLARE mCursor CURSOR LOCAL FAST_FORWARD
FOR
SELECT mkey, mvalue
FROM @metadataList;
OPEN mCursor;
FETCH NEXT FROM mCursor INTO @mkey, @mvalue; -- Initial fetch attempt
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO template_field_metadata (name, value, template_field_id, isProperty) values (@mkey, @mvalue, 1, 0)
PRINT 'A new metadata created with id : ' + cast(SCOPE_IDENTITY() as nvarchar);
FETCH NEXT FROM mCursor INTO @mkey, @mvalue; -- Attempt to fetch next row from cursor
END;
CLOSE mCursor;
DEALLOCATE mCursor;
END
GO
Step 3: Finally, I executed the stored procedure like this:
DECLARE @metadataToInsert TypeMetadata;
INSERT INTO @metadataToInsert VALUES ('value', 'callVariable2');
INSERT INTO @metadataToInsert VALUES ('maxlength', '30');
DECLARE @fid INT;
EXEC [dbo].[createfield] @name = 'prefagent', @text = 'Pref Agent', @order = 1, @type = 'prefagent', @column_id = 0, @tid = 49, @metadataList = @metadataToInsert;
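The cursor loop in Step 2 can likely be replaced by a single set-based statement, because a table-valued parameter can be queried like any other table, for example INSERT INTO template_field_metadata (name, value, template_field_id, isProperty) SELECT mkey, mvalue, 1, 0 FROM @metadataList; (the per-row PRINT of SCOPE_IDENTITY() would then go away; an OUTPUT clause can return the new ids instead). Below is a sketch of the same set-based idea in Python's sqlite3, where a temporary table stands in for the TVP and the target table's columns are assumed from the procedure's INSERT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical target table; 'id' plays the role of the IDENTITY column.
CREATE TABLE template_field_metadata (
    id INTEGER PRIMARY KEY,
    name TEXT, value TEXT, template_field_id INTEGER, isProperty INTEGER
);
-- Stand-in for the @metadataList table-valued parameter.
CREATE TEMP TABLE metadataList (mkey TEXT, mvalue TEXT);
""")
conn.executemany("INSERT INTO metadataList VALUES (?, ?)",
                 [("value", "callVariable2"), ("maxlength", "30")])

# One set-based statement replaces the cursor loop.
conn.execute("""
INSERT INTO template_field_metadata (name, value, template_field_id, isProperty)
SELECT mkey, mvalue, 1, 0 FROM metadataList
""")
rows = conn.execute("SELECT name, value, template_field_id, isProperty "
                    "FROM template_field_metadata ORDER BY id").fetchall()
print(rows)
```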
