ga:itemQuantity in ga_sessions_YYYMMDD (Big Query) - google-analytics

I'm trying to replicate the GA Quantity metric (ga:itemQuantity) using standardSQL and querying the GA export to BigQuery date partitioned tables (ga_sessions_YYYYMMDD).
I have tried the following, but 'quantity' is always null:
#standardSQL
SELECT
sum(hit.item.itemQuantity) as quantity
FROM `precise-armor-133520.1500218.ga_sessions_20170801` t
CROSS JOIN
UNNEST(t.hits) AS hit
order by 1 ASC;
Other metrics work and match 100% with the GA UI so I am assuming it's not a data export problem. For example:
SELECT
sum( totals.totalTransactionRevenue ) as revenue, sum( totals.transactions ) as transactions
FROM `precise-armor-133520.1500218.ga_sessions_201708*` t
CROSS JOIN
UNNEST(t.hits) AS hit
group by `date`
order by `date` asc
These totals match Revenue and Transactions (metrics) in GA UI respectively.
What is the standardSQL query for the GA metric quantity (ga:itemQuantity)?

In order to match "Quantity" in GA's web UI by each date, use the following standard SQL:
SELECT
SUM(product.productQuantity)
,`date`
FROM
`precise-armor-133520.1500218.ga_sessions_*`
,UNNEST(hits) AS hits
,UNNEST(hits.product) AS product
WHERE hits.eCommerceAction.action_type = "6"
and _TABLE_SUFFIX between '20170801' and FORMAT_DATE("%Y%m%d", CURRENT_DATE)
group by 2
order by 2 asc

Does this work?
#standardSQL
SELECT
sku,
SUM(qtd) qtd
FROM(
SELECT
ARRAY(SELECT AS STRUCT productSKU sku, productQuantity qtd FROM UNNEST(hits), UNNEST(product) WHERE ecommerceAction.action_type = '6') data
FROM `precise-armor-133520.1500218.ga_sessions_20170801`
),
UNNEST(data)
GROUP BY sku
ORDER BY qtd DESC
LIMIT 1000
Not sure how you managed to unnest the product fields, maybe this solves your issue.

Related

Google Analytics BigQuery Export Event Count Issue

I'm trying to get a total events count for a particular event in BigQuery along with a custom dimension for versions of the site.
This query works perfect but does not include my custom dimension:
SELECT
hits.eventInfo.eventCategory AS eventCategory,
COUNT(*) AS total_events
FROM `ga_sessions_*`,
UNNEST(hits) AS hits
WHERE _TABLE_SUFFIX = '20200630'
AND totals.visits = 1
AND hits.type = 'EVENT'
AND hits.eventInfo.eventCategory = 'Cart CTA'
GROUP BY
hits.eventInfo.eventCategory
But when I add the UNNEST for customDimensions, I get a total that is twice the correct total.
SELECT
hits.eventInfo.eventCategory AS eventCategory,
COUNT(*) AS total_events
FROM `ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.customDimensions) AS cd
WHERE _TABLE_SUFFIX = '20200630'
AND totals.visits = 1
AND hits.type = 'EVENT'
AND hits.eventInfo.eventCategory = 'Cart CTA'
GROUP BY
hits.eventInfo.eventCategory
I think there is something wrong with customDimension unnest, but I don't know how to solve. I've tried using LEFT JOIN with the UNNEST but I get the same result.
I think there is something wrong with customDimension unnest
This is by design! When you do UNNEST for the row that has N records in that unnested column - you actually generate N rows in place of that one row. So, obviously COUNT(*) will be different ...
I don't know how to solve
... unless you filter by specific value of that unnested field

SQLite Nested Query for maximum

I'm trying to use DB Browser for SQLite to construct a nested query to determine the SECOND highest priced item purchased by the top 10 spenders. The query I have to pick out the top 10 spenders is:
SELECT user_id, max(item_total), SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders AS o
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10
This gives the user_id, most expensive item they purchased (not counting shipping or discounts) as well as the total amount they spent on the site.
I was trying to use a nested query to generate a list of the second most expensive items they purchased, but keep getting errors. I've tried
SELECT user_id, MAX(item_total) AS second_highest
FROM orders
WHERE item_total < (SELECT user_id, SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10)
group by user_id
I keep getting a row value misused error. Does anyone have pointers on this nested query or know of another way to find the second highest item purchased from within the group found in the first query?
Thanks!
(Note: The following assumes you're using Sqlite 3.25 or newer since it uses window functions).
This will return the second-largest item_total for each user_id without duplicates:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders)
SELECT user_id, item_total FROM ranked WHERE ranking = 2;
You can combine it with your original query with something like:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders),
totals AS
(SELECT user_id
, sum (item_total + shipping_cost - discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = 0
GROUP BY user_id)
SELECT t.user_id, r.item_total, t.total_spent
FROM totals AS t
JOIN ranked AS r ON t.user_id = r.user_id
WHERE r.ranking = 2
ORDER BY t.total_spent DESC, t.user_id
LIMIT 10;
Okay, after fixing your table definition to better reflect the values being stored in it and the stated problem, and fixing the data and adding to it so you can actually get results, plus an optional but useful index like so:
CREATE TABLE orders (order_id INTEGER PRIMARY KEY
, user_id INTEGER
, item_total REAL
, shipping_cost NUMERIC
, discounts_applied NUMERIC
, payment_reject INTEGER);
INSERT INTO orders(user_id, item_total, shipping_cost, discounts_applied
, payment_reject) VALUES (9852,60.69,10,0,FALSE),
(2784,123.91,15,0,FALSE), (1619,119.75,15,0,FALSE), (9725,151.92,15,0,FALSE),
(8892,153.27,15,0,FALSE), (7105,156.86,25,0,FALSE), (4345,136.09,15,0,FALSE),
(7779,134.93,15,0,FALSE), (3874,157.27,15,0,FALSE), (5102,108.3,10,0,FALSE),
(3098,59.97,10,0,FALSE), (6584,124.92,15,0,FALSE), (5136,111.06,10,0,FALSE),
(1869,113.44,20,0,FALSE), (3830,129.63,15,0,FALSE), (9852,70.69,10,0,FALSE),
(2784,134.91,15,0,FALSE), (1619,129.75,15,0,FALSE), (9725,161.92,15,0,FALSE),
(8892,163.27,15,0,FALSE), (7105,166.86,25,0,FALSE), (4345,146.09,15,0,FALSE),
(7779,144.93,15,0,FALSE), (3874,167.27,15,0,FALSE), (5102,118.3,10,0,FALSE),
(3098,69.97,10,0,FALSE), (6584,134.92,15,0,FALSE), (5136,121.06,10,0,FALSE),
(1869,123.44,20,0,FALSE), (3830,139.63,15,0,FALSE);
CREATE INDEX orders_idx_1 ON orders(user_id, item_total DESC);
the above query will give:
user_id item_total total_spent
---------- ---------- -----------
7105 156.86 373.72
3874 157.27 354.54
8892 153.27 346.54
9725 151.92 343.84
4345 136.09 312.18
7779 134.93 309.86
3830 129.63 299.26
6584 124.92 289.84
2784 123.91 288.82
1619 119.75 279.5
(If you get a syntax error from the query now, it's because you're using an old version of sqlite that doesn't support window functions.)

Average Analytics Function On GA360 Visit ID for LatencyTracking

Looking to get the average latencyTracking for a visitid out of our GA 360 export.
Setup the following query but getting the following error and I'm not sure why since all these are all aggregate functions: SELECT list expression references hits.latencyTracking.serverResponseTime which is neither grouped nor aggregated at [3:5]
select
TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
AVG(hits.latencyTracking.serverResponseTime) OVER (PARTITION BY visitid) as avgServerResponseTime,
AVG(hits.latencyTracking.serverConnectionTime) OVER (PARTITION BY visitid) as avgServerConnectionTime,
AVG(hits.latencyTracking.domInteractiveTime) OVER (PARTITION BY visitid) as avgdomInteractiveTime,
AVG(hits.latencyTracking.pageLoadTime) OVER (PARTITION BY visitid) as avgpageLoadTime
from `xxx.xxx.ga_sessions_2018*`,
UNNEST(hits) AS hits
where hits.latencyTracking.serverResponseTime is not null
group by visitStartTime
The way your query written - AVG() is not just Aggregate Function but rather Aggregate Analytic Function.
To make it work you can remove OVER() so AVG() will really become aggregate function here corresponding to GROUP BY
select
TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
AVG(hits.latencyTracking.serverResponseTime) as avgServerResponseTime,
AVG(hits.latencyTracking.serverConnectionTime) as avgServerConnectionTime,
AVG(hits.latencyTracking.domInteractiveTime) as avgdomInteractiveTime,
AVG(hits.latencyTracking.pageLoadTime) as avgpageLoadTime
from `xxx.xxx.ga_sessions_2018*`,
UNNEST(hits) AS hits
where hits.latencyTracking.serverResponseTime is not null
group by visitStartTime
Having windows and group by in conjunction can be confusing.
In your case it is not even necessary, neither is the flattening - you can write simple subqueries to get your numbers per session:
SELECT
TIMESTAMP_SECONDS(visitStartTime) AS visitStartTime,
(
SELECT AVG(latencyTracking.serverResponseTime)
FROM t.hits
WHERE latencyTracking.serverResponseTime IS NOT NULL) AS avgServerResponseTime,
(
SELECT AVG(latencyTracking.serverConnectionTime)
FROM t.hits
WHERE latencyTracking.serverConnectionTime IS NOT NULL) AS avgServerConnectionTime,
(
SELECT AVG(latencyTracking.domInteractiveTime)
FROM t.hits
WHERE latencyTracking.domInteractiveTime IS NOT NULL ) AS avgdomInteractiveTime,
(
SELECT AVG(latencyTracking.pageLoadTime)
FROM t.hits
WHERE latencyTracking.pageLoadTime IS NOT NULL ) AS avgpageLoadTime
FROM `xxx.xxx.ga_sessions_2018*`
It also doesn't involve grouping which makes it faster.

Unnesting the custom dimensions is duplicating/inflating transaction revenue in BigQuery

Unnesting hits.customdimension and hits.product.customdimension is inflating the transaction revenue
SELECT
sum(totals.totalTransactionRevenue)/1000000 as revenue,
(SELECT MAX(IF(index=10,value,NULL)) FROM UNNEST(product.customDimensions)) AS product_CD10,
(SELECT MAX(IF(index=1,value,NULL)) FROM UNNEST(hits.customDimensions)) AS CD1
FROM
`XXXXXXXXXXXXXXX.ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.product) as product
WHERE
_TABLE_SUFFIX BETWEEN "20180608"
AND "20180608"
group by product_CD10,CD1
Is there a way I could get a flat table in such a way that if I apply sum of revenue, its should give the correct result.
Move your UNNEST() to the top sub-queries - then the rows won't get duplicated:
SELECT row
, (SELECT MAX(letter) FROM UNNEST(row), UNNEST(qq)) max_letter
, (SELECT MAX(n) FROM UNNEST(row), UNNEST(qq), UNNEST(qb) n) max_number
FROM (
SELECT [
STRUCT(1 AS p,[STRUCT('a' AS letter, [4,5,6] AS qb)] AS qq)
, STRUCT(2,[STRUCT('b', [7,8,9])])
, STRUCT(3,[STRUCT('c', [10,11,12])])
] AS row
)
Haven't tested this tho:
SELECT
sum(totals.totalTransactionRevenue)/1000000 as revenue,
(SELECT MAX(IF(index=10,value,NULL)) FROM UNNEST(hits) AS hit, UNNEST(hit.products) product, UNNEST(product.customDimensions)) AS product_CD10,
(SELECT MAX(IF(index=1,value,NULL)) FROM UNNEST(hits) AS hit, UNNEST(hit.customDimensions)) AS CD1
FROM `XXXXXXXXXXXXXXX.ga_sessions_*`,
WHERE _TABLE_SUFFIX BETWEEN "20180608" AND "20180608"
group by product_CD10,CD1

Advice on SQL-Server Query

I'm looking to improve this query I wrote for a small web application in ASP.NET 4.0 using SQL-Server 2005. This application will allow the user to search by Product ID and have it return the following information:
Highest Purchase Price + Most Recent Date of purchase # this price
Lowest Purchase Price + Most Recent Date of purchase # this price
Most Recent Purchase Price + Date
Average Purchase Price (optional, i thought this might improve the usefulness of the app)
Here is the structure of the Products table (I'm only including relevant columns, this is a DB already in production and these are non-pk columns)
product_id (nvarchar(20))
price (decimal(19,2))
pDate (datetime)
Before I put down the query I have so far I just want to say that I can get this information easily through multiple queries, so if this is the best practice then disregard improving the query, but I was aiming to minimize the number of queries needed to get all needed information.
What I have so far: (Note: There are rows with price = 0 so I ignored those in the bottom select looking for the MIN price)
SELECT price, MAX(pDate)
FROM Products
WHERE product_id = #product_id AND
(price = (SELECT MAX(price)
FROM Products
WHERE product_id =#product_id) OR
price = (SELECT MIN(price)
FROM Products
WHERE product_id = #product_id AND price > 0))
GROUP BY price
Now this is returning 2 rows:
first = the lowest price + date
second row = high price + date
What I would like ideally is to have a query return 1 row with all the needed information stated above if possible, as it would simplify displaying the information in ASP for me. And like I said earlier, if multiple queries is the be approach then no need to re-write a complex query here.
Edit
Here is some sample data
Desired query results: (ignore the format as I typed this in excel)
Here is the query I will be using thanks to Ken Benson:
SELECT TOP 1 prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products WHERE price > 0 group by product_id) as mn ON lmd.product_id=mn.product_id AND lmd.price=mn.price
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id AND lxd.price=mx.price
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products WHERE price > 0 GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=#product_id
I think you can do a couple of joins back to the table ...
Select product_id, min.price, min.pDate, max.price, max.pDate
FROM products as p
LEFT JOIN (Select Min(price), pDate, product_id FROM products GROUP BY product_id)
as min on min.product_id=p.product_id
LEFT JOIN (Select max(price), pDate, product_id FROM products GROUP BY product_id)
as max on max.product_id=p.product_id
Where p.product_id = #product_id
This second bit of code should produce desired results....
SELECT prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products group by product_id) as mn ON lmd.product_id=mn.product_id
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=1
LIMIT 1
Yep - left out an 'and' condition:
SELECT TOP 1
prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products group by product_id) as mn ON lmd.product_id=mn.product_id **AND lmd.price=mn.price**
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id AND **lxd.price=mx.price**
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=#product_id
I would do this with a combination of ranking functions and conditional aggregations:
select product_id,
max(case when seqnum_hi = 1 then price end) as highPrice,
max(case when seqnum_hi = 1 then pdate end) as highPrice_date
max(case when seqnum_low = 1 then price end) as lowPrice,
max(case when seqnum_low = 1 then pdate end) as lowPrice_date,
max(case when seqnum_rec = 1 then price end) as recentPrice,
max(case when seqnum_rec = 1 then pdate end) as recentPrice_date,
avg(price) as avg_price
from (select p.*,
row_number() over (partition by product_id order by price asc) as seqnum_low,
row_number() over (partition by product_id order by price desc) as seqnum_hi,
row_number() over (partition by product_id order by pdate desc) as seqnum_rec
from price
where product_id = #product_id
group by product_id
The seguence numbers identify the rows with the particular attributes you care about (high price, low price, most recent). The conditional max then just selects information from those rows.
The following should get what you want. It's pretty long, but is readable so should be easily modified by anyone who needs to:
;WITH CTE_MaxPrice AS
(
SELECT product_id, MAX(P.price) AS MaxPrice
FROM Products P
GROUP BY product_id
HAVING product_id = #product_id
),
CTE_MinPrice AS
(
SELECT product_id, MIN(P.price) AS MinPrice
FROM Products P
GROUP BY product_id
HAVING product_id = #product_id
),
CTE_MaxPriceDate AS
(
SELECT P.product_id, MAX(P.pDate) AS MaxDate
FROM Products P
INNER JOIN CTE_MaxPrice MaxP ON P.product_id = MaxP.product_id
AND P.price = MaxP.MaxPrice
GROUP BY P.product_id
),
CTE_MinPriceDate AS
(
SELECT P.product_id, MAX(P.pDate) AS MinDate
FROM Products P
INNER JOIN CTE_MinPrice MinP ON P.product_id = MinP.product_id
AND P.price = MinP.MinPrice
GROUP BY P.product_id
)
SELECT MaxP.MaxPrice, MaxPD.MaxDate,
MinP.MinPrice, MinPD.MinDate,
RP.price AS RecentPrice, MAX(RP.pDate) AS RecentDate,
AVG(AP.price) AS AveragePrice
FROM Products P
INNER JOIN CTE_MaxPrice MaxP ON P.product_id = MaxP.product_id
INNER JOIN CTE_MinPrice MinP ON P.product_id = MinP.product_id
AND MinP.MinPrice > 0
INNER JOIN CTE_MaxPriceDate MaxPD ON P.product_id = MaxPD.product_id
INNER JOIN CTE_MinPriceDate MinPD ON P.product_id = MinPD.product_id
INNER JOIN Products RP ON P.product_id = RP.product_id
INNER JOIN Products AP ON P.product_id = AP.product_id
GROUP BY MaxP.MaxPrice, MaxPD.MaxDate,
MinP.MinPrice, MinPD.MinDate, RP.price
HAVING P.product_id = #product_id
Well since there have been three attempts to answer, and none have worked quite how you want, I'll tell you how I would do it - and this assumes you can use a stored procedure and also assumes that the product table is not so huge that multiple seperate queries would be a problem:
CREATE PROCEDURE myproc AS
DECLARE #Price1 money
DECLARE #Date1 smalldatetime
DECLARE #Price2 money
DECLARE #Date2 smalldatetime
DECLARE #Price3 money
DECLARE #Date3 smalldatetime
DECLARE #Price4 money
SELECT #Price1 = MAX(Price) FROM Products
SELECT #Date1 = MAX(pDate) FROM Products WHERE Price=#Price1
SELECT #Price2 = Min(Price) FROM Products WHERE Price >0
SELECT #Date2 = MAX(pDate) FROM Products WHERE Price=#Price2
SELECT #Date3 = Max(pDate) FROM Products
SELECT #Price3 = MAX(Price) FROM Products WHERE pDate=#Date3 --max in case there are more than one purchases with the same date.
SELECT #Price4 = AVG(Price) FROM Products WHERE Price>0
SELECT #Price1 As MaxPrice,
#Date1 As MaxPriceDate,
#Price2 As LowPrice,
#Date2 As LowPriceDate,
#Price4 As AveragePrice,
#Price3 As RecentPrice,
#Price3 As RecentPriceDate
GO
Forgive any typographical errors, I didn't test this, but if you can use stored procedures, this will work.
So this is not much different than doing your multiple queries from the client, but should perform better putting them all into a single SP. You could also cut the number of queries down a bit by using some of the code from your other answers, but I have left it this way for clarity.

Resources