How to create a channelpath in BigQuery based on events? - google-analytics

I want to create channel paths in BigQuery at the user level. Each path should end when a transaction occurs, and the next visit should start a new path. Currently I get one path per user, summing all of that user's transactions. See the code below; I've also included the current OUTPUT TABLE and the desired OUTPUT TABLE.
My idea is to create a new column that counts the transactions. This value would start at 0 and be incremented by 1 AFTER a transaction occurs. I would then combine this value with user_id and group the aggregated strings on that combination. But I don't know how to do this.
Thanks in advance!
Guido
#standardSQL
WITH yourTable AS (
SELECT 1 AS user_id,'1a' as visit_id, '2017-01-01 14:10:12' AS DATETIME,
'google cpc' AS channelgrouping, 0 AS transaction , 1 as visit UNION ALL
SELECT 1, '1b', '2017-01-01 20:10:12', 'email', 1, 1 UNION ALL
SELECT 1, '1c','2017-01-03 08:10:12', 'direct', 0, 1 UNION ALL
SELECT 1, '1d','2017-01-04 13:10:14', 'organic', 1, 1
)
SELECT
user_id,
STRING_AGG(channelgrouping, ' > ' ORDER BY DATETIME) AS channelgrouping_path,
SUM(transaction) AS transaction,
SUM(visit) AS visits
FROM yourTable
GROUP BY user_id
OUTPUT TABLE
user_id | channelgrouping_path                  | transactions | visits
1       | google cpc > email > direct > organic | 2            | 4
DESIRED OUTPUT TABLE
user_id | channelgrouping_path | transactions | visits
1       | google cpc > email   | 1            | 2
1       | direct > organic     | 1            | 2

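Try the query below. It gives each visit a group number equal to the number of transactions in that user's earlier visits. The visit containing a transaction therefore closes the current path, and the next visit opens a new one: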
#standardSQL
WITH yourTable AS (
SELECT 1 AS user_id,'1a' AS visit_id, '2017-01-01 14:10:12' AS DATETIME,
'google cpc' AS channelgrouping, 0 AS transaction , 1 AS visit UNION ALL
SELECT 1, '1b', '2017-01-01 20:10:12', 'email', 1, 1 UNION ALL
SELECT 1, '1c','2017-01-03 08:10:12', 'direct', 0, 1 UNION ALL
SELECT 1, '1d','2017-01-04 13:10:14', 'organic', 1, 1
)
SELECT
user_id,
STRING_AGG(channelgrouping, ' > ' ORDER BY DATETIME) AS channelgrouping_path,
SUM(transaction) AS transaction,
SUM(visit) AS visits
FROM (
SELECT
*,
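-- transactions in this user's earlier visits (NULL for the first visit)
SUM(transaction) OVER(PARTITION BY user_id ORDER BY datetime
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS grp
FROM yourTable
)
-- the first visit's NULL falls into group 0
GROUP BY user_id, IFNULL(grp, 0)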
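BigQuery can't be run locally, so here is a minimal sketch that reproduces the same running-count idea in SQLite through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions. To avoid SQLite keywords, the columns are renamed: `transaction` becomes `is_txn` and `DATETIME` becomes `visit_time`. STRING_AGG is replaced by joining the already-ordered rows in Python, because SQLite's group_concat does not guarantee element order in general.

```python
import sqlite3
from itertools import groupby

# Sample visits from the question (columns renamed, see lead-in).
VISITS = [
    (1, "1a", "2017-01-01 14:10:12", "google cpc", 0),
    (1, "1b", "2017-01-01 20:10:12", "email", 1),
    (1, "1c", "2017-01-03 08:10:12", "direct", 0),
    (1, "1d", "2017-01-04 13:10:14", "organic", 1),
]

# grp = transactions in this user's *earlier* visits; the empty frame on the
# first visit gives NULL, which IFNULL turns into 0.
GROUP_SQL = """
SELECT user_id, channel, is_txn,
       IFNULL(SUM(is_txn) OVER (
           PARTITION BY user_id ORDER BY visit_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
FROM visits
ORDER BY user_id, visit_time
"""


def channel_paths(rows):
    """Return (user_id, path, transactions, visits) for each path."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE visits (user_id INTEGER, visit_id TEXT,"
        " visit_time TEXT, channel TEXT, is_txn INTEGER)"
    )
    con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)
    ordered = con.execute(GROUP_SQL).fetchall()
    con.close()
    paths = []
    # Rows are sorted by user and time, and grp never decreases within a user,
    # so each (user_id, grp) pair is one consecutive run of rows.
    for (user_id, _grp), run in groupby(ordered, key=lambda r: (r[0], r[3])):
        run = list(run)
        paths.append((
            user_id,
            " > ".join(r[1] for r in run),
            sum(r[2] for r in run),
            len(run),
        ))
    return paths


for path in channel_paths(VISITS):
    print(path)
# (1, 'google cpc > email', 1, 2)
# (1, 'direct > organic', 1, 2)
```

Excluding the current row from the frame (`1 PRECEDING`) is what keeps the transaction visit inside the path it closes.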

Related

Select Rows by Consecutive Dates in SQLite

I have a table with data like below:
Log Table:
User Id | Login Date
1       | 2022-01-03
1       | 2022-01-04
1       | 2022-01-10
1       | 2022-01-11
1       | 2022-01-12
1       | 2022-01-23
1       | 2022-01-25
1       | 2022-01-26
1       | 2022-01-27
1       | 2022-01-28
What I'm trying to do is write a query that, given a parameter var_date, returns the rows of the latest run of consecutive login dates ending on the day before var_date.
If var_date is 2022-01-29, the result is:
User Id | Login Date
1       | 2022-01-25
1       | 2022-01-26
1       | 2022-01-27
1       | 2022-01-28
If var_date is 2022-01-30, then no result is returned, since 2022-01-29 is not in the table.
If var_date is 2022-01-24, then the query will return row with 2022-01-23 as login date.
How can I do this in SQLite?
Thank you.
This question is a variant of gaps and islands, with the islands being clusters of records per user with continuous dates. Here is one approach using analytic functions:
WITH cte AS (
SELECT *, CASE WHEN julianday(LoginDate) -
julianday(LAG(LoginDate) OVER (PARTITION BY UserID
ORDER BY LoginDate))
> 1 THEN 1 ELSE 0 END AS counter
FROM yourTable
),
cte2 AS (
SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < '2022-01-29' AND
grp = (SELECT t2.grp FROM cte2 t2
WHERE t2.UserID = t1.UserID AND t2.LoginDate = '2022-01-28');
The two CTEs assign a pseudo group number to each cluster of consecutive dates per user. The final query returns all records before the target date whose group number matches that of the day immediately before the target date. Hence, if a user has no record on the day before the target date, the query returns an empty set.
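To check the gaps-and-islands query with var_date passed as a real parameter, here is a sketch that runs it through Python's sqlite3 module (SQLite 3.25+ for window functions). It makes three assumptions: the table is called logins, the hard-coded '2022-01-29' becomes :var_date, and '2022-01-28' becomes date(:var_date, '-1 day').

```python
import sqlite3

LOGINS = [(1, d) for d in (
    "2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
    "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28",
)]

ISLAND_SQL = """
WITH cte AS (
  -- counter = 1 where the gap to the previous login exceeds one day
  SELECT *,
         CASE WHEN julianday(LoginDate)
                   - julianday(LAG(LoginDate) OVER (PARTITION BY UserID ORDER BY LoginDate)) > 1
              THEN 1 ELSE 0 END AS counter
  FROM logins
),
cte2 AS (
  -- running sum of counter = island (group) number
  SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
  FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < :var_date
  AND grp = (SELECT t2.grp FROM cte2 t2
             WHERE t2.UserID = t1.UserID
               AND t2.LoginDate = date(:var_date, '-1 day'))
ORDER BY LoginDate
"""


def latest_streak(var_date):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logins (UserID INTEGER, LoginDate TEXT)")
    con.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
    rows = con.execute(ISLAND_SQL, {"var_date": var_date}).fetchall()
    con.close()
    return rows


print(latest_streak("2022-01-29"))
# [(1, '2022-01-25'), (1, '2022-01-26'), (1, '2022-01-27'), (1, '2022-01-28')]
```

When there is no login on the day before var_date, the scalar subquery yields NULL, and `grp = NULL` matches nothing. That produces the empty result the question asks for.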
Use a recursive CTE:
WITH cte(UserId, LoginDate) AS (
SELECT :var_user_id, :var_date
UNION ALL
SELECT UserId, date(c.LoginDate, '-1 day')
FROM cte c
WHERE EXISTS (SELECT 1 FROM tablename t WHERE t.UserId = c.UserId AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT *
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte);
Change :var_user_id and :var_date to the values that you want for the user's id and the date.
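A similar sketch runs the recursive CTE against the same sample logins with Python's sqlite3 module. It makes three assumptions: tablename is renamed to logins, the optional RECURSIVE keyword is spelled out, and an ORDER BY is added because the row order of a recursive CTE is otherwise not guaranteed.

```python
import sqlite3

LOGINS = [(1, d) for d in (
    "2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
    "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28",
)]

# Start at var_date and step back one day at a time while a login exists.
WALK_BACK_SQL = """
WITH RECURSIVE cte(UserId, LoginDate) AS (
  SELECT :var_user_id, :var_date
  UNION ALL
  SELECT UserId, date(c.LoginDate, '-1 day')
  FROM cte c
  WHERE EXISTS (SELECT 1 FROM logins t
                WHERE t.UserId = c.UserId
                  AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT *
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte)  -- drop the seed row
ORDER BY LoginDate
"""


def walk_back(user_id, var_date):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logins (UserId INTEGER, LoginDate TEXT)")
    con.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
    rows = con.execute(
        WALK_BACK_SQL, {"var_user_id": user_id, "var_date": var_date}
    ).fetchall()
    con.close()
    return rows


print(walk_back(1, "2022-01-24"))
# [(1, '2022-01-23')]
```

The recursion stops as soon as the previous day is missing. Its cost is therefore proportional to the length of the streak rather than the size of the table.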

Selecting the n'th range/island of rows where columns have a common value?

I need to select a contiguous range of rows that share a common value in one column.
For example, starting from the last row, I want to select all rows where _user_id == 1 until a row with _user_id != 1 is reached.
In this case, that selects rows [4, 5, 6].
+------------------------+
| _id _user_id amount |
+------------------------+
| 1 1 777 |
| 2 2 1 |
| 3 2 11 |
| 4 1 10 |
| 5 1 100 |
| 6 1 101 |
+------------------------+
/*Create the table*/
CREATE TABLE IF NOT EXISTS t1 (
_id INTEGER PRIMARY KEY AUTOINCREMENT,
_user_id INTEGER,
amount INTEGER);
/* Add the data */
INSERT INTO t1 VALUES(1, 1, 777);
INSERT INTO t1 VALUES(2, 2, 1);
INSERT INTO t1 VALUES(3, 2, 11);
INSERT INTO t1 VALUES(4, 1, 10);
INSERT INTO t1 VALUES(5, 1, 100);
INSERT INTO t1 VALUES(6, 1, 101);
/* Check the data */
SELECT * FROM t1;
1|1|777
2|2|1
3|2|11
4|1|10
5|1|100
6|1|101
In my attempt I use a common table expression to group the rows by _user_id. This gives the _id of the last row for each distinct value (e.g. SELECT _id FROM t1 GROUP BY _user_id LIMIT 2; produces [6, 3]).
I then use those two values to select a range where LIMIT 1 OFFSET 1 is the lower end (3) and LIMIT 1 is the upper end (6)
WITH test AS (
SELECT _id FROM t1 GROUP BY _user_id LIMIT 2
) SELECT * FROM t1 WHERE _id BETWEEN 1+ (
SELECT * FROM test LIMIT 1 OFFSET 1
) and (
SELECT * FROM test LIMIT 1
);
Output:
4|1|10
5|1|100
6|1|101
This appears to work ok at selecting the last "island" but what I really need is a way to select the n'th island.
Is there a way to generate a query capable of producing outputs like these when provided a parameter n?:
island (n=1):
4|1|10
5|1|100
6|1|101
island (n=2):
2|2|1
3|2|11
island (n=3):
1|1|777
Thanks!
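SQL tables are unordered, so an island can only be defined relative to an explicit ordering column, here consecutive _id values. This recursive CTE starts at the highest _id, steps to the next lower _id each time, and increments island_number whenever _user_id changes: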
WITH RECURSIVE t1_with_islands(_id, _user_id, amount, island_number) AS (
SELECT _id,
_user_id,
amount,
1
FROM t1
WHERE _id = (SELECT max(_id)
FROM t1)
UNION ALL
SELECT t1._id,
t1._user_id,
t1.amount,
CASE WHEN t1._user_id = t1_with_islands._user_id
THEN island_number
ELSE island_number + 1
END
FROM t1
JOIN t1_with_islands ON t1._id = (SELECT max(_id)
FROM t1
WHERE _id < t1_with_islands._id)
)
SELECT *
FROM t1_with_islands
ORDER BY _id;
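The query above numbers every row's island. To answer the original question for a given n, filter on island_number. Below is a sketch via Python's sqlite3 with the question's data. The final `WHERE island_number = :n` is an addition for illustration and is not part of the answer.

```python
import sqlite3

ROWS = [(1, 1, 777), (2, 2, 1), (3, 2, 11), (4, 1, 10), (5, 1, 100), (6, 1, 101)]

NTH_ISLAND_SQL = """
WITH RECURSIVE t1_with_islands(_id, _user_id, amount, island_number) AS (
  -- seed: the last row is in island 1
  SELECT _id, _user_id, amount, 1
  FROM t1
  WHERE _id = (SELECT max(_id) FROM t1)
  UNION ALL
  -- step to the next lower _id; a new _user_id starts the next island
  SELECT t1._id, t1._user_id, t1.amount,
         CASE WHEN t1._user_id = t1_with_islands._user_id
              THEN island_number
              ELSE island_number + 1
         END
  FROM t1
  JOIN t1_with_islands ON t1._id = (SELECT max(_id) FROM t1
                                    WHERE _id < t1_with_islands._id)
)
SELECT _id, _user_id, amount
FROM t1_with_islands
WHERE island_number = :n
ORDER BY _id
"""


def nth_island(n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (_id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " _user_id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", ROWS)
    rows = con.execute(NTH_ISLAND_SQL, {"n": n}).fetchall()
    con.close()
    return rows


for n in (1, 2, 3):
    print(n, nth_island(n))
# 1 [(4, 1, 10), (5, 1, 100), (6, 1, 101)]
# 2 [(2, 2, 1), (3, 2, 11)]
# 3 [(1, 1, 777)]
```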

Using more than one field with IN ( ) for a sub-query

In Google BigQuery, I would like to do something like:
SELECT hits.item.productName
FROM [Dataset Name]
WHERE date, visitId, fullVisitorId IN (
SELECT date, visitId, fullVisitorId
FROM [Dataset Name]
WHERE hits.item.productName CONTAINS 'Product Item Name A'
AND totals.transactions>=1)
However, this does not seem to be supported. What alternatives do I have besides using a JOIN?
Do a JOIN instead.
The equivalent of:
SELECT COUNT(*), stn, a.wban, FIRST(name) name, FIRST(country) country
FROM [fh-bigquery:weather_gsod.gsod2014] a
WHERE stn, wban IN
(SELECT usaf, wban FROM [fh-bigquery:weather_gsod.stations] WHERE country='UK')
GROUP BY 2, 3
ORDER BY 1 DESC
Would be:
SELECT COUNT(*), stn, a.wban, FIRST(name) name, FIRST(country) country
FROM [fh-bigquery:weather_gsod.gsod2014] a
JOIN [fh-bigquery:weather_gsod.stations] b
ON a.stn=b.usaf AND a.wban=b.wban
WHERE country='UK'
GROUP BY 2, 3
ORDER BY 1 DESC
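Here is a hedged sketch of when the two forms agree. It uses SQLite via Python's sqlite3, not BigQuery legacy SQL, and the readings/stations tables and rows are hypothetical stand-ins for the gsod tables. SQLite supports the row-value form `(stn, wban) IN (SELECT ...)`, so the JOIN rewrite can be compared against it directly. The two return the same counts as long as each (usaf, wban) pair appears only once in the lookup table. If the lookup side can contain duplicate pairs, the JOIN multiplies the matching rows, so deduplicate that side first (for example with SELECT DISTINCT in a subquery).

```python
import sqlite3

IN_SQL = """
SELECT stn, wban, COUNT(*)
FROM readings
WHERE (stn, wban) IN (SELECT usaf, wban FROM stations WHERE country = 'UK')
GROUP BY stn, wban
ORDER BY stn, wban
"""

JOIN_SQL = """
SELECT a.stn, a.wban, COUNT(*)
FROM readings a
JOIN stations b ON a.stn = b.usaf AND a.wban = b.wban
WHERE b.country = 'UK'
GROUP BY a.stn, a.wban
ORDER BY a.stn, a.wban
"""


def make_db(stations):
    """Build hypothetical readings/stations tables in memory."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (stn TEXT, wban TEXT)")
    con.execute("CREATE TABLE stations (usaf TEXT, wban TEXT, country TEXT)")
    con.executemany("INSERT INTO readings VALUES (?, ?)",
                    [("A", "1"), ("A", "1"), ("B", "2"), ("A", "2")])
    con.executemany("INSERT INTO stations VALUES (?, ?, ?)", stations)
    return con


unique = make_db([("A", "1", "UK"), ("B", "2", "US")])
print(unique.execute(IN_SQL).fetchall())    # [('A', '1', 2)]
print(unique.execute(JOIN_SQL).fetchall())  # [('A', '1', 2)]

# A duplicated station row doubles the JOIN count but not the IN count.
dup = make_db([("A", "1", "UK"), ("A", "1", "UK"), ("B", "2", "US")])
print(dup.execute(IN_SQL).fetchall())       # [('A', '1', 2)]
print(dup.execute(JOIN_SQL).fetchall())     # [('A', '1', 4)]
```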

Increase row number every time a flag changes

I have gone through a similar post on Stack Overflow, but my case is different:
my table generates a flag at run time. How can I increment a group number (also generated at run time) every time the flag changes?
My Oracle query:
Select emp_id,
CASE WHEN MOD(rownum, 3) = 1 AND rownum > 1 THEN 'Y' ELSE 'N' END AS flag
from Transaction_table
Desired output:
emp_id Flag GRP_number
1 N 1
2 N 1
3 N 1
4 Y 2
5 N 2
6 N 2
7 Y 3
You cannot reference a column alias from another expression in the same select list; doing so raises an ORA-00904 "invalid identifier" error. Compute the flag in a subquery first.
Do it like this:
WITH DATA AS (
SELECT emp_id,
CASE
WHEN MOD(rownum, 3) = 1
AND rownum > 1
THEN 'Y'
ELSE 'N'
END AS flag
FROM Transaction_table
)
SELECT emp_id, flag,
-- 1 + running count of 'Y' flags: each 'Y' row opens a new group
1 + SUM(gap) OVER (ORDER BY emp_id) AS grp_number
FROM (
SELECT emp_id, flag,
CASE WHEN flag = 'Y'
THEN 1
ELSE 0
END gap
FROM DATA)
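The desired output starts a new group at every 'Y' row rather than at every change of flag, so the group number can be computed as 1 plus a running count of 'Y' flags. SQLite has no ROWNUM or MOD(). In the sketch below, which runs through Python's sqlite3, ROW_NUMBER() OVER (ORDER BY emp_id) stands in for ROWNUM and `%` for MOD (both assumptions). It checks this idea against the desired table.

```python
import sqlite3

GROUP_SQL = """
WITH numbered AS (
  SELECT emp_id, ROW_NUMBER() OVER (ORDER BY emp_id) AS rn
  FROM transaction_table
),
data AS (
  SELECT emp_id,
         CASE WHEN rn % 3 = 1 AND rn > 1 THEN 'Y' ELSE 'N' END AS flag
  FROM numbered
)
SELECT emp_id, flag,
       -- 1 + number of 'Y' rows up to and including this one
       1 + SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END)
             OVER (ORDER BY emp_id) AS grp_number
FROM data
ORDER BY emp_id
"""


def group_numbers(emp_ids):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transaction_table (emp_id INTEGER)")
    con.executemany("INSERT INTO transaction_table VALUES (?)",
                    [(e,) for e in emp_ids])
    rows = con.execute(GROUP_SQL).fetchall()
    con.close()
    return rows


for row in group_numbers(range(1, 8)):
    print(row)
```

If the requirement really were "new group on every change of flag" (N to Y and Y to N), the running sum would instead be taken over a CASE that compares flag with LAG(flag).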

Get the most recent record for each user where value is 'K', action id is null or its state is 1

I have the following tables in SQL Server:
user_id, value, date, action_id
----------------------------------
1 A 1/3/2012 null
1 K 1/4/2012 null
1 B 1/5/2012 null
2 X 1/3/2012 null
2 K 1/4/2012 1
3 K 1/3/2012 null
3 L 1/4/2012 2
3 K 1/5/2012 3
4 K 1/3/2012 null
action_id, state
----------------------------------
1 0
2 1
3 1
4 0
5 1
I need to return the most recent record for each user where the value is 'K', the action id is either null or its state is set to 1. Here's the result set I want:
user_id, value, date, action_id
----------------------------------
3 K 1/5/2012 3
4 K 1/3/2012 null
For user_id 1, the most recent value is B and its action id is null, so I consider it the most recent record, but its value is not K.
For user_id 2, the most recent value is K, but action id 1 has state 0, so I fall back to X, and X is not K.
user_id 3 and 4 are straightforward.
I'm interested in a LINQ to SQL query in ASP.NET, but for now T-SQL is fine too.
The SQL query would be :
Select Top 1 T1.* from Table1 T1
LEFT JOIN Table2 T2
ON T1.action_id = T2.action_id
Where T1.Value = 'K' AND (T1.action_id is null or T2.state = 1)
Order by T1.date desc
LINQ Query :
var result = context.Table1.Where(T1=> T1.Value == "K"
&& (T1.action_id == null ||
context.Table2
.Where(T2=>T2.State == 1)
.Select(T2 => T2.action_id).Contains(T1.action_id)))
.OrderByDescending(T => T.date)
.FirstOrDefault();
Good Luck !!
This query will return desired result set:
SELECT
*
FROM
(
SELECT
user_id
,value
,date
,action_id
,ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) RowNum
FROM
testtable
WHERE
value = 'K'
) testtable
WHERE
RowNum = 1
You can also try the following approach if the combination of user_id and date is unique.
Make sure to get the order of the predicates in the join right so that indexes can be used:
SELECT
testtable.*
FROM
(
SELECT
user_id
,MAX(date) LastDate
FROM
testtable
WHERE
value = 'K'
GROUP BY
user_id
) tblLastValue
INNER JOIN
testtable
ON
testtable.user_id = tblLastValue.user_id
AND
testtable.date = tblLastValue.LastDate
This would select the top entries for all users as described in your specification, as opposed to TOP 1 which just selects the most recent entry in the database. I'm assuming here that your tables are named users and actions:
WITH usersactions as
(SELECT
u.user_id,
u.value,
u.date,
u.action_id,
ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY u.date DESC, u.action_id DESC) as row
FROM users u
LEFT OUTER JOIN actions a ON u.action_id = a.action_id
WHERE
u.value = 'K' AND
(u.action_id IS NULL OR a.state = 1)
)
SELECT * FROM usersactions WHERE row = 1
Or if you don't want to use a CTE:
SELECT * FROM
(SELECT
u.user_id,
u.value,
u.date,
u.action_id,
ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY u.date DESC, u.action_id DESC) as row
FROM users u
LEFT OUTER JOIN actions a ON u.action_id = a.action_id
WHERE
u.value = 'K' AND
(u.action_id IS NULL OR a.state = 1)
) useractions
WHERE row = 1
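Per the question's fallback rule, the 'K' test has to be applied after picking each user's latest valid record (action_id null or state 1), not before. The queries above filter on value = 'K' first, so they can still return user 1's 'K' row from 1/4 even though B is that user's latest record. Below is a minimal rank-then-filter sketch, run through Python's sqlite3 with the question's data. The dates are rewritten as ISO strings so they sort correctly (an assumption). The CTE uses only ROW_NUMBER(), a LEFT JOIN and a WHERE clause, so it should carry over to T-SQL, though it is untested there.

```python
import sqlite3

# The question's data, with dates rewritten as ISO strings (see lead-in).
USERS = [
    (1, "A", "2012-01-03", None), (1, "K", "2012-01-04", None),
    (1, "B", "2012-01-05", None), (2, "X", "2012-01-03", None),
    (2, "K", "2012-01-04", 1), (3, "K", "2012-01-03", None),
    (3, "L", "2012-01-04", 2), (3, "K", "2012-01-05", 3),
    (4, "K", "2012-01-03", None),
]
ACTIONS = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1)]

LATEST_K_SQL = """
WITH valid AS (
  -- rank only the records that count: no action, or an action in state 1
  SELECT u.user_id, u.value, u.date, u.action_id,
         ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY u.date DESC) AS rn
  FROM users u
  LEFT JOIN actions a ON u.action_id = a.action_id
  WHERE u.action_id IS NULL OR a.state = 1
)
-- keep each user's latest valid record only if its value is 'K'
SELECT user_id, value, date, action_id
FROM valid
WHERE rn = 1 AND value = 'K'
ORDER BY user_id
"""


def latest_k(users, actions):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (user_id INTEGER, value TEXT,"
                " date TEXT, action_id INTEGER)")
    con.execute("CREATE TABLE actions (action_id INTEGER, state INTEGER)")
    con.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", users)
    con.executemany("INSERT INTO actions VALUES (?, ?)", actions)
    rows = con.execute(LATEST_K_SQL).fetchall()
    con.close()
    return rows


print(latest_k(USERS, ACTIONS))
# [(3, 'K', '2012-01-05', 3), (4, 'K', '2012-01-03', None)]
```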
