I'm looking to improve this query I wrote for a small web application in ASP.NET 4.0 using SQL-Server 2005. This application will allow the user to search by Product ID and have it return the following information:
Highest Purchase Price + Most Recent Date of purchase @ this price
Lowest Purchase Price + Most Recent Date of purchase @ this price
Most Recent Purchase Price + Date
Average Purchase Price (optional, I thought this might improve the usefulness of the app)
Here is the structure of the Products table (I'm only including relevant columns, this is a DB already in production and these are non-pk columns)
product_id (nvarchar(20))
price (decimal(19,2))
pDate (datetime)
Before I put down the query I have so far I just want to say that I can get this information easily through multiple queries, so if this is the best practice then disregard improving the query, but I was aiming to minimize the number of queries needed to get all needed information.
What I have so far: (Note: There are rows with price = 0 so I ignored those in the bottom select looking for the MIN price)
SELECT price, MAX(pDate)
FROM Products
WHERE product_id = @product_id AND
(price = (SELECT MAX(price)
FROM Products
WHERE product_id = @product_id) OR
price = (SELECT MIN(price)
FROM Products
WHERE product_id = @product_id AND price > 0))
GROUP BY price
Now this is returning 2 rows:
first = the lowest price + date
second row = high price + date
What I would ideally like is a query that returns 1 row with all the information stated above, if possible, as it would simplify displaying the information in ASP for me. And like I said earlier, if multiple queries is the best approach then there is no need to rewrite a complex query here.
Edit
Here is some sample data
Desired query results: (ignore the format as I typed this in excel)
Here is the query I will be using thanks to Ken Benson:
SELECT TOP 1 prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products WHERE price > 0 group by product_id) as mn ON lmd.product_id=mn.product_id AND lmd.price=mn.price
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id AND lxd.price=mx.price
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products WHERE price > 0 GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=@product_id
I think you can do a couple of joins back to the table ...
Select product_id, min.price, min.pDate, max.price, max.pDate
FROM products as p
LEFT JOIN (Select Min(price), pDate, product_id FROM products GROUP BY product_id)
as min on min.product_id=p.product_id
LEFT JOIN (Select max(price), pDate, product_id FROM products GROUP BY product_id)
as max on max.product_id=p.product_id
Where p.product_id = @product_id
This second bit of code should produce desired results....
SELECT prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products group by product_id) as mn ON lmd.product_id=mn.product_id
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=1
LIMIT 1
Yep - left out an 'and' condition:
SELECT TOP 1
prod.product_id,
minp.price AS minprice, minp.pDate as minlastdate,
maxp.price AS maxprice, maxp.pDate as maxlastdate,
ag.price AS averageprice
FROM products AS prod
LEFT JOIN (SELECT lmd.product_id,max(lmd.pDate) as pDate,mn.price FROM products as lmd INNER JOIN
(SELECT product_id, min(price) AS price from products group by product_id) as mn ON lmd.product_id=mn.product_id AND lmd.price=mn.price
group by lmd.product_id,mn.price ) AS minp ON minp.product_id=prod.product_id
LEFT JOIN (SELECT lxd.product_id,max(lxd.pDate) as pDate,mx.price FROM products as lxd INNER JOIN
(SELECT product_id, max(price) AS price from products group by product_id) as mx ON lxd.product_id=mx.product_id AND lxd.price=mx.price
group by lxd.product_id,mx.price ) AS maxp ON maxp.product_id=prod.product_id
LEFT JOIN (SELECT product_id,avg(price) as price FROM products GROUP BY product_id) AS ag ON ag.product_id=prod.product_id
WHERE prod.product_id=@product_id
I would do this with a combination of ranking functions and conditional aggregations:
select product_id,
max(case when seqnum_hi = 1 then price end) as highPrice,
max(case when seqnum_hi = 1 then pdate end) as highPrice_date,
max(case when seqnum_low = 1 then price end) as lowPrice,
max(case when seqnum_low = 1 then pdate end) as lowPrice_date,
max(case when seqnum_rec = 1 then price end) as recentPrice,
max(case when seqnum_rec = 1 then pdate end) as recentPrice_date,
avg(price) as avg_price
from (select p.*,
-- pdate desc breaks ties so the most recent row at that price is ranked first
row_number() over (partition by product_id order by price asc, pdate desc) as seqnum_low,
row_number() over (partition by product_id order by price desc, pdate desc) as seqnum_hi,
row_number() over (partition by product_id order by pdate desc) as seqnum_rec
from Products p
where product_id = @product_id
) t
group by product_id
The sequence numbers identify the rows with the particular attributes you care about (high price, low price, most recent). The conditional max then just selects information from those rows.
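To see the ranking plus conditional aggregation idea run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows, the `?` placeholder (standing in for `@product_id`), the `pDate DESC` tie-breakers and the `price > 0` filter are my assumptions for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (product_id TEXT, price REAL, pDate TEXT)")
# Made-up sample rows, not the asker's data.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("A1", 10.0, "2012-01-05"),
        ("A1", 25.0, "2012-02-10"),
        ("A1", 25.0, "2012-03-15"),  # same high price, later date
        ("A1", 10.0, "2012-01-20"),  # same low price, later date
        ("A1", 18.0, "2012-04-01"),  # most recent purchase
        ("B2", 99.0, "2012-05-01"),  # other product, filtered out
    ],
)

sql = """
SELECT product_id,
       MAX(CASE WHEN seq_hi  = 1 THEN price END) AS high_price,
       MAX(CASE WHEN seq_hi  = 1 THEN pDate END) AS high_date,
       MAX(CASE WHEN seq_low = 1 THEN price END) AS low_price,
       MAX(CASE WHEN seq_low = 1 THEN pDate END) AS low_date,
       MAX(CASE WHEN seq_rec = 1 THEN price END) AS recent_price,
       MAX(CASE WHEN seq_rec = 1 THEN pDate END) AS recent_date,
       AVG(price) AS avg_price
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY product_id
                                ORDER BY price DESC, pDate DESC) AS seq_hi,
             ROW_NUMBER() OVER (PARTITION BY product_id
                                ORDER BY price ASC, pDate DESC) AS seq_low,
             ROW_NUMBER() OVER (PARTITION BY product_id
                                ORDER BY pDate DESC) AS seq_rec
      FROM Products p
      WHERE product_id = ? AND price > 0) ranked
GROUP BY product_id
"""
row = conn.execute(sql, ("A1",)).fetchone()
print(row)
```

Each `CASE` is non-NULL for exactly one row, so `MAX` simply picks that row's value; that is why one pass over the table is enough.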
The following should get what you want. It's pretty long, but is readable so should be easily modified by anyone who needs to:
;WITH CTE_MaxPrice AS
(
SELECT product_id, MAX(P.price) AS MaxPrice
FROM Products P
GROUP BY product_id
HAVING product_id = @product_id
),
CTE_MinPrice AS
(
SELECT product_id, MIN(P.price) AS MinPrice
FROM Products P
GROUP BY product_id
HAVING product_id = @product_id
),
CTE_MaxPriceDate AS
(
SELECT P.product_id, MAX(P.pDate) AS MaxDate
FROM Products P
INNER JOIN CTE_MaxPrice MaxP ON P.product_id = MaxP.product_id
AND P.price = MaxP.MaxPrice
GROUP BY P.product_id
),
CTE_MinPriceDate AS
(
SELECT P.product_id, MAX(P.pDate) AS MinDate
FROM Products P
INNER JOIN CTE_MinPrice MinP ON P.product_id = MinP.product_id
AND P.price = MinP.MinPrice
GROUP BY P.product_id
)
SELECT MaxP.MaxPrice, MaxPD.MaxDate,
MinP.MinPrice, MinPD.MinDate,
RP.price AS RecentPrice, MAX(RP.pDate) AS RecentDate,
AVG(AP.price) AS AveragePrice
FROM Products P
INNER JOIN CTE_MaxPrice MaxP ON P.product_id = MaxP.product_id
INNER JOIN CTE_MinPrice MinP ON P.product_id = MinP.product_id
AND MinP.MinPrice > 0
INNER JOIN CTE_MaxPriceDate MaxPD ON P.product_id = MaxPD.product_id
INNER JOIN CTE_MinPriceDate MinPD ON P.product_id = MinPD.product_id
INNER JOIN Products RP ON P.product_id = RP.product_id
INNER JOIN Products AP ON P.product_id = AP.product_id
GROUP BY MaxP.MaxPrice, MaxPD.MaxDate,
MinP.MinPrice, MinPD.MinDate, RP.price
HAVING P.product_id = @product_id
Well, since there have been three attempts to answer, and none have worked quite how you want, I'll tell you how I would do it. This assumes you can use a stored procedure, and also assumes that the product table is not so huge that multiple separate queries would be a problem:
CREATE PROCEDURE myproc @product_id nvarchar(20) AS
-- variable types match the Products columns (decimal(19,2) and datetime)
DECLARE @Price1 decimal(19,2)
DECLARE @Date1 datetime
DECLARE @Price2 decimal(19,2)
DECLARE @Date2 datetime
DECLARE @Price3 decimal(19,2)
DECLARE @Date3 datetime
DECLARE @Price4 decimal(19,2)
SELECT @Price1 = MAX(Price) FROM Products WHERE product_id=@product_id
SELECT @Date1 = MAX(pDate) FROM Products WHERE product_id=@product_id AND Price=@Price1
SELECT @Price2 = Min(Price) FROM Products WHERE product_id=@product_id AND Price >0
SELECT @Date2 = MAX(pDate) FROM Products WHERE product_id=@product_id AND Price=@Price2
SELECT @Date3 = Max(pDate) FROM Products WHERE product_id=@product_id
SELECT @Price3 = MAX(Price) FROM Products WHERE product_id=@product_id AND pDate=@Date3 --max in case there are more than one purchases with the same date.
SELECT @Price4 = AVG(Price) FROM Products WHERE product_id=@product_id AND Price>0
SELECT @Price1 As MaxPrice,
@Date1 As MaxPriceDate,
@Price2 As LowPrice,
@Date2 As LowPriceDate,
@Price4 As AveragePrice,
@Price3 As RecentPrice,
@Date3 As RecentPriceDate
GO
Forgive any typographical errors, I didn't test this, but if you can use stored procedures, this will work.
So this is not much different than doing your multiple queries from the client, but should perform better putting them all into a single SP. You could also cut the number of queries down a bit by using some of the code from your other answers, but I have left it this way for clarity.
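For comparison, the "several small queries" approach can be sketched from the client side as well. This is only an illustration using Python's built-in sqlite3 module rather than SQL Server; the helper name `price_summary` and the sample rows are hypothetical.

```python
import sqlite3


def price_summary(conn, product_id):
    """Run one small query per fact and collect the answers in a dict."""

    def one(sql, *params):
        # Every query here returns exactly one row with one column.
        return conn.execute(sql, params).fetchone()[0]

    high = one("SELECT MAX(price) FROM Products WHERE product_id = ?", product_id)
    low = one(
        "SELECT MIN(price) FROM Products WHERE product_id = ? AND price > 0",
        product_id,
    )
    recent_date = one(
        "SELECT MAX(pDate) FROM Products WHERE product_id = ?", product_id
    )
    by_price = "SELECT MAX(pDate) FROM Products WHERE product_id = ? AND price = ?"
    return {
        "high_price": high,
        "high_date": one(by_price, product_id, high),
        "low_price": low,
        "low_date": one(by_price, product_id, low),
        # MAX in case several purchases share the most recent date.
        "recent_price": one(
            "SELECT MAX(price) FROM Products WHERE product_id = ? AND pDate = ?",
            product_id,
            recent_date,
        ),
        "recent_date": recent_date,
        "avg_price": one(
            "SELECT AVG(price) FROM Products WHERE product_id = ? AND price > 0",
            product_id,
        ),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (product_id TEXT, price REAL, pDate TEXT)")
# Made-up sample rows, including a zero price that must be ignored.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("A1", 0.0, "2012-01-01"),
        ("A1", 10.0, "2012-01-05"),
        ("A1", 25.0, "2012-02-10"),
        ("A1", 10.0, "2012-03-01"),
        ("B2", 99.0, "2012-05-01"),
    ],
)
summary = price_summary(conn, "A1")
print(summary)
```

Seven round trips is the cost of this version; wrapping them in a stored procedure, as above, keeps the readability while making it a single call from the application.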
Is it actually possible to use @ (the at sign) with SQLite so that a calculated value can be used as a constant in another query?
I am using a variable (a total) that I calculated previously to get another variable (a proportion) over two time periods.
Total amount of sales
Proportion of sales between the first semester and second semester.
I copied the first query to get the constant and added the first query to the second.
The answer is no BUT:-
This could possibly be done in a single query.
Consider this simple demo with hopefully easy to understand all-in-one queries:-
First the sales table:-
i.e. 2 columns semester and amount
10 rows in total so 1000 is the total amount
6 rows are S1 (amount is 600) so 60%
4 rows are S2 (amount is 400) so 40%
Created and populated using:-
CREATE TABLE IF NOT EXISTS sales (semester TEXT, amount REAL);
INSERT INTO sales VALUES('S1',100),('S1',100),('S1',100),('S1',100),('S1',100),('S1',100),('S2',100),('S2',100),('S2',100),('S2',100);
So you could use an all-in-one query such as:-
SELECT
(SELECT sum(amount) FROM sales) AS total,
(SELECT sum(amount) FROM sales WHERE semester = 'S1') AS s1total,
((SELECT sum(amount) FROM sales WHERE semester = 'S1') / (SELECT sum(amount) FROM sales)) * 100 AS s1prop,
(SELECT sum(amount) FROM sales WHERE semester = 'S2') AS s2total,
((SELECT sum(amount) FROM sales WHERE semester = 'S2') / (SELECT sum(amount) FROM sales)) * 100 AS s2prop
;
This would result in
i.e. s1prop and s2prop the expected results (the other columns may be useful)
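If you want to check the all-in-one query yourself, here is a small sketch that runs it through Python's built-in sqlite3 module against the same ten demo rows (Python is only my choice of driver; any SQLite client would do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS sales (semester TEXT, amount REAL)")
# Six S1 rows and four S2 rows of 100 each, as in the demo table.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("S1", 100)] * 6 + [("S2", 100)] * 4,
)

sql = """
SELECT
  (SELECT sum(amount) FROM sales) AS total,
  (SELECT sum(amount) FROM sales WHERE semester = 'S1') AS s1total,
  ((SELECT sum(amount) FROM sales WHERE semester = 'S1')
     / (SELECT sum(amount) FROM sales)) * 100 AS s1prop,
  (SELECT sum(amount) FROM sales WHERE semester = 'S2') AS s2total,
  ((SELECT sum(amount) FROM sales WHERE semester = 'S2')
     / (SELECT sum(amount) FROM sales)) * 100 AS s2prop
"""
total, s1total, s1prop, s2total, s2prop = conn.execute(sql).fetchone()
print(total, s1total, s1prop, s2total, s2prop)
```

Because `amount` is declared REAL the division is floating point; with an INTEGER column the proportions would be truncated unless one operand is cast.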
An alternative, using a CTE (Common Table Expressions) that does the same could be:-
WITH cte_total(total,s1total,s2total) AS (SELECT
(SELECT sum(amount) FROM sales),
(SELECT sum(amount) FROM sales WHERE semester = 'S1'),
(SELECT sum(amount) FROM sales WHERE semester = 'S2')
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop, s2total, (s2total / total) * 100 AS s2prop FROM cte_total;
You can have multiple CTEs, and they can gather data from other tables or even take values passed in as parameters. They can be extremely useful, since the values they define can be referenced throughout the rest of the query.
e.g.
Here's an example where a second CTE is added (placed first) that mimics passing three dates. Instead of the hard-coded values, ? placeholders could be coded and the values passed via parameter binding.
As the sales table has no column for the date of the sale, a literal value has been coded in the WHERE clause; this would normally be the column holding the sale date, as marked by the comment /*<<<<< would be the column that holds the date */.
The hard-coded date has purposely been chosen so that the BETWEEN clause evaluates to true.
If the date column did exist, the WHERE criteria for each semester could then use BETWEEN with the respective dates for that semester.
The example:-
WITH
dates AS (SELECT
'2023-01-01' /*<<<<< ? and can then be passed as bound parameter*/ AS startdate,
'2023-03-01' /*<<<<< ? and can then be passed as bound parameter*/ AS semester2_start,
'2023-05-30' /*<<<<< ? and can then be passed as bound parameter*/as enddate
),
cte_total(total,s1total,s2total) AS (SELECT
(SELECT sum(amount) FROM sales
WHERE '2023-01-01' /*<<<<< would be the column that holds the date */
BETWEEN (SELECT startdate FROM dates)
AND (SELECT enddate FROM dates)),
(SELECT sum(amount) FROM sales WHERE semester = 'S1'),
(SELECT sum(amount) FROM sales WHERE semester = 'S2')
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop, s2total, (s2total / total) * 100 AS s2prop FROM cte_total;
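As a rough sketch of the parameter-binding variant, the following uses Python's built-in sqlite3 module and binds the three dates to `?` placeholders inside the first CTE. Note the assumptions: the `sale_date` column and the sample rows are hypothetical (the demo table above has no date column), and the semester split is done on dates rather than on a `semester` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sale_date is a hypothetical column added for this sketch.
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2023-01-15", 100),
        ("2023-02-10", 200),
        ("2023-02-28", 300),
        ("2023-03-05", 150),
        ("2023-04-20", 250),
    ],
)

sql = """
WITH
dates(startdate, semester2_start, enddate) AS (SELECT ?, ?, ?),
cte_total(total, s1total, s2total) AS (
  SELECT
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date BETWEEN startdate AND enddate),
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date >= startdate AND sale_date < semester2_start),
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date >= semester2_start AND sale_date <= enddate)
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop,
       s2total, (s2total / total) * 100 AS s2prop
FROM cte_total
"""
# The three bound values play the role an @variable would play elsewhere.
row = conn.execute(sql, ("2023-01-01", "2023-03-01", "2023-05-30")).fetchone()
print(row)
```

The `dates` CTE is a one-row table, so joining it to `sales` adds no rows; it just makes the bound values available by name.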
I'm very new to learning SQL, I apologize if my question isn't completely accurate.
The question I'm trying to answer with this query is "What is the most popular music genre in each country?" I've had to use a subquery and it works, but I found that for a few countries in the table, more than one genre has the MAX value. I'm stuck with how to edit my query so that all genres with the max value show in the results. Here is my code, using DB Browser for SQLite:
SELECT BillingCountry AS Country , name AS Genre , MAX(genre_count) AS Purchases
FROM (
SELECT i.BillingCountry, g.name, COUNT(g.genreid) AS genre_count
FROM Invoice i
JOIN InvoiceLine il
ON il.InvoiceId = i.InvoiceId
JOIN TRACK t
ON il.trackid = t.TrackId
JOIN Genre g
ON t.genreid = g.GenreId
GROUP BY 1,2
) sub
GROUP BY 1
Here is an example of the result:
| Country | Genre |Purchase|
|---------|-------|--------|
|Argentina| Punk | 9 |
|Australia| Rock | 22 |
BUT in running just the subquery to COUNT the purchases, Argentina has two Genres with 9 Purchases (the max number for that country). How do I adjust my query to include both and not just the first one in the row?
You can do it with RANK() window function:
SELECT BillingCountry, name, genre_count
FROM (
SELECT i.BillingCountry, g.name, COUNT(*) AS genre_count,
RANK() OVER (PARTITION BY i.BillingCountry ORDER BY COUNT(*) DESC) rnk
FROM Invoice i
INNER JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
INNER JOIN TRACK t ON il.trackid = t.TrackId
INNER JOIN Genre g ON t.genreid = g.GenreId
GROUP BY i.BillingCountry, g.name
)
WHERE rnk = 1
This will return the ties in separate rows.
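Here is a small self-contained sketch of the tie behaviour, run through Python's built-in sqlite3 module. To keep it short I assume a single pre-joined `purchases(country, genre)` table instead of the four tables in the question, and the rows are invented toy data (including which second genre ties in Argentina). RANK() needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per purchased track.
conn.execute("CREATE TABLE purchases (country TEXT, genre TEXT)")
rows = (
    [("Argentina", "Punk")] * 9
    + [("Argentina", "Rock")] * 9  # ties with Punk
    + [("Argentina", "Jazz")] * 2
    + [("Australia", "Rock")] * 22
    + [("Australia", "Metal")] * 5
)
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

sql = """
SELECT country, genre, genre_count
FROM (
  SELECT country, genre, COUNT(*) AS genre_count,
         RANK() OVER (PARTITION BY country ORDER BY COUNT(*) DESC) AS rnk
  FROM purchases
  GROUP BY country, genre
)
WHERE rnk = 1
ORDER BY country, genre
"""
top = conn.execute(sql).fetchall()
for line in top:
    print(line)
```

RANK() gives every row with the same count the same rank, so both tied genres get rank 1; ROW_NUMBER() would instead pick one of them arbitrarily.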
If you want 1 row for each country, you could also use GROUP_CONCAT():
SELECT BillingCountry, GROUP_CONCAT(name) AS name, MAX(genre_count) AS genre_count
FROM (
SELECT i.BillingCountry, g.name, COUNT(*) AS genre_count,
RANK() OVER (PARTITION BY i.BillingCountry ORDER BY COUNT(*) DESC) rnk
FROM Invoice i
INNER JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
INNER JOIN TRACK t ON il.trackid = t.TrackId
INNER JOIN Genre g ON t.genreid = g.GenreId
GROUP BY i.BillingCountry, g.name
)
WHERE rnk = 1
GROUP BY BillingCountry
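A minimal sketch of the GROUP_CONCAT() variant, again through Python's sqlite3 module and with the same assumptions as before: a hypothetical pre-joined `purchases(country, genre)` table and invented toy rows. The order of the names inside the concatenated string is not guaranteed, so the sketch sorts them after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per purchased track.
conn.execute("CREATE TABLE purchases (country TEXT, genre TEXT)")
rows = (
    [("Argentina", "Punk")] * 9
    + [("Argentina", "Rock")] * 9  # ties with Punk
    + [("Australia", "Rock")] * 22
    + [("Australia", "Metal")] * 5
)
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

sql = """
SELECT country, GROUP_CONCAT(genre) AS genres, MAX(genre_count) AS genre_count
FROM (
  SELECT country, genre, COUNT(*) AS genre_count,
         RANK() OVER (PARTITION BY country ORDER BY COUNT(*) DESC) AS rnk
  FROM purchases
  GROUP BY country, genre
)
WHERE rnk = 1
GROUP BY country
"""
# One row per country; split the comma-separated list back into sorted names.
result = {
    country: (sorted(genres.split(",")), count)
    for country, genres, count in conn.execute(sql)
}
print(result)
```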
This is based on a Khan Academy course. I have 2 SQLite tables:
CREATE TABLE table1 (id TEXT PRIMARY KEY, charge_id TEXT, amount INTEGER, currency TEXT, country TEXT);
INSERT INTO table1
( id, charge_id, amount, currency, country) VALUES
('0xb01', '0x1', 2000, 'USD', 'USA'),
('0x0a1', '0x1', 500, 'USD', 'USA'),
('0x0c1', '0x1', 1000, 'CAD', 'USA'),
('0xs31', '0x4', 1000, 'YEN', 'CA');
CREATE TABLE table2 (id TEXT PRIMARY KEY, charge_id TEXT, value TEXT);
INSERT INTO table2
( id, charge_id, value ) VALUES
('0x34s', '0x1', '123 main street'),
('0x3ze', '0x1', 'merchant-id-001'),
('0x3w2', '0x2', 'zip-code-90210' ),
('0x35k', '0x2', 'merchant-id-002');
I would SELECT the amount, currency and country from table 1 (Charges) and join with table 2 (Metadata) based on the id. Charges uses ID, while Metadata stores meta tags, with a unique identifier [id] equal to the charge [id] from Charges. I want to group the total amount, total currency for each merchant_id and only those charges that were made in the USA.
Step-by-step pseudo code:
(1) find all charges in the USA (Charges country)
(2) match all charge_ids from Charges (id) to charges in Metadata (id)
(3) separate each charge by the merchant_id (Metadata value)
(4) display the total amount, currency by merchant_id (amount, Charges currency, value)
This is difficult because:
(1) I want to select from Charges and
(2) join to Metadata by the [id]
(3) but each Metadata record only has the charge_id and a metadata tag, which would match the merchant_id with the charge
The query result I would like is:
| value (merchant id) | currency | total amount |
|---------------------|----------|--------------|
| merchant-id-001     | usd      | 2500         |
| merchant-id-001     | cad      | 1000         |
| merchant-id-002     | yen      | 200          |
| merchant-id-002     | cad      | 50           |
Currently I have this query but it does not seem to be working:
select table1.amount, table1.currency, table1.country, count(*)
from table1
LEFT JOIN table1
UNION ALL
SELECT table2.value
FROM CHARGES_table2
LEFT JOIN table2
ON table1.id = table2.id
WHERE table1.country = 'USA'
GROUP BY table2.value
I am getting errors on union parameters: 2,1
Read the grammar & other documentation for the expressions you are using. The arguments to UNION are two SELECTs & it can have a final ORDER BY. Here's the parse:
    select table1.amount, table1.currency, table1.country, count(*)
    from table1
    LEFT JOIN table1
UNION ALL
    SELECT table2.value
    FROM CHARGES_table2
    LEFT JOIN table2
    ON table1.id = table2.id
    WHERE table1.country = 'USA'
    GROUP BY table2.value
UNION is putting its arguments' rows into one table so it also requires that their columns agree in number & have compatible types. Here the numbers disagree.
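The column-count rule is easy to reproduce in isolation. This is a minimal sketch using Python's built-in sqlite3 module with throwaway literal SELECTs (not the tables from the question): the first statement has two columns on the left and one on the right, so SQLite rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

error_message = None
try:
    # Left SELECT yields two columns, right SELECT yields only one.
    conn.execute("SELECT 1 AS a, 2 AS b UNION ALL SELECT 3 AS a")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# With the same number of columns on both sides the statement is accepted.
rows = conn.execute("SELECT 1 AS a, 2 AS b UNION ALL SELECT 3, 4").fetchall()
print(rows)
```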
There is no table1 in scope in the second SELECT so that is an error in isolation that is moot given the UNION.
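For what the question seems to be after, a JOIN rather than a UNION is probably the shape to aim for. The following is only a sketch under my own assumptions: that the two tables are linked by `charge_id`, and that the merchant tag is the metadata row whose `value` starts with `merchant-id-`. It runs the posted sample data through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT PRIMARY KEY, charge_id TEXT, amount INTEGER,
                     currency TEXT, country TEXT);
INSERT INTO table1 (id, charge_id, amount, currency, country) VALUES
  ('0xb01', '0x1', 2000, 'USD', 'USA'),
  ('0x0a1', '0x1', 500,  'USD', 'USA'),
  ('0x0c1', '0x1', 1000, 'CAD', 'USA'),
  ('0xs31', '0x4', 1000, 'YEN', 'CA');
CREATE TABLE table2 (id TEXT PRIMARY KEY, charge_id TEXT, value TEXT);
INSERT INTO table2 (id, charge_id, value) VALUES
  ('0x34s', '0x1', '123 main street'),
  ('0x3ze', '0x1', 'merchant-id-001'),
  ('0x3w2', '0x2', 'zip-code-90210'),
  ('0x35k', '0x2', 'merchant-id-002');
""")

sql = """
SELECT m.value AS merchant_id, c.currency, SUM(c.amount) AS total_amount
FROM table1 AS c
JOIN table2 AS m ON m.charge_id = c.charge_id   -- assumed join key
WHERE c.country = 'USA'
  AND m.value LIKE 'merchant-id-%'              -- assumed merchant tag format
GROUP BY m.value, c.currency
ORDER BY m.value, total_amount DESC
"""
totals = conn.execute(sql).fetchall()
for line in totals:
    print(line)
```

With this sample data only merchant-id-001 appears, because no charge row carries the `0x2` charge_id that merchant-id-002 is attached to.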
I'm trying to replicate the GA Quantity metric (ga:itemQuantity) using standardSQL and querying the GA export to BigQuery date partitioned tables (ga_sessions_YYYYMMDD).
I have tried the following, but 'quantity' is always null:
#standardSQL
SELECT
sum(hit.item.itemQuantity) as quantity
FROM `precise-armor-133520.1500218.ga_sessions_20170801` t
CROSS JOIN
UNNEST(t.hits) AS hit
order by 1 ASC;
Other metrics work and match 100% with the GA UI so I am assuming it's not a data export problem. For example:
SELECT
sum( totals.totalTransactionRevenue ) as revenue, sum( totals.transactions ) as transactions
FROM `precise-armor-133520.1500218.ga_sessions_201708*` t
CROSS JOIN
UNNEST(t.hits) AS hit
group by `date`
order by `date` asc
These totals match Revenue and Transactions (metrics) in GA UI respectively.
What is the standardSQL query for the GA metric quantity (ga:itemQuantity)?
In order to match "Quantity" in GA's web UI by each date, use the following standard SQL:
SELECT
SUM(product.productQuantity)
,`date`
FROM
`precise-armor-133520.1500218.ga_sessions_*`
,UNNEST(hits) AS hits
,UNNEST(hits.product) AS product
WHERE hits.eCommerceAction.action_type = "6"
and _TABLE_SUFFIX between '20170801' and FORMAT_DATE("%Y%m%d", CURRENT_DATE)
group by 2
order by 2 asc
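If the double UNNEST is hard to picture, this plain-Python sketch mimics it on a tiny hand-made structure. The nested dictionaries are a hypothetical stand-in for two ga_sessions rows (only the fields the query touches), not real export data, and no BigQuery client is involved.

```python
from collections import defaultdict

# Hypothetical stand-in for two ga_sessions rows.
sessions = [
    {"date": "20170801", "hits": [
        {"eCommerceAction": {"action_type": "6"},
         "product": [{"productQuantity": 2}, {"productQuantity": 1}]},
        {"eCommerceAction": {"action_type": "2"},
         "product": [{"productQuantity": 5}]},  # not a purchase hit
    ]},
    {"date": "20170802", "hits": [
        {"eCommerceAction": {"action_type": "6"},
         "product": [{"productQuantity": 4}]},
        {"eCommerceAction": {"action_type": "0"}, "product": []},
    ]},
]

quantity_by_date = defaultdict(int)
for session in sessions:
    for hit in session["hits"]:         # UNNEST(hits) AS hits
        for product in hit["product"]:  # UNNEST(hits.product) AS product
            # WHERE hits.eCommerceAction.action_type = "6"
            if hit["eCommerceAction"]["action_type"] == "6":
                quantity_by_date[session["date"]] += product["productQuantity"]

print(dict(quantity_by_date))
```

Each UNNEST is one more nested loop: the query ends up with one row per product per hit, and the WHERE clause keeps only products on purchase hits before summing.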
Does this work?
#standardSQL
SELECT
sku,
SUM(qtd) qtd
FROM(
SELECT
ARRAY(SELECT AS STRUCT productSKU sku, productQuantity qtd FROM UNNEST(hits), UNNEST(product) WHERE ecommerceAction.action_type = '6') data
FROM `precise-armor-133520.1500218.ga_sessions_20170801`
),
UNNEST(data)
GROUP BY sku
ORDER BY qtd DESC
LIMIT 1000
Not sure how you managed to unnest the product fields, maybe this solves your issue.
Here is my data context:
Photos (ID, Title)
Users (ID, FullName)
Ratings (PhotoID, UserID, Value, Date)
Business rules:
users can rate a photo from 1 to 5
a given user can rate a given photo only once
I want to select the top rated photos by day for the last, let's say, 3 days. So which photo got the best rating today, yesterday and the day before yesterday? I would like to make the number of days variable if possible. I only have to display the last N days on which photos were rated, excluding empty days.
I would like to get the photos in a single query/result because I want to bind it to a ListView to display them on a web form.
I've started this way:
DECLARE @days INT = 3
SELECT TOP (@days) ... FROM Ratings
INNER JOIN Photos ON Photos.ID = Ratings.PhotoID
GROUP BY DATEDIFF(day, [Date], CURRENT_TIMESTAMP)
ORDER BY DATEDIFF(day, [Date], CURRENT_TIMESTAMP) DESC
How can I group my groups by PhotoID, order them by SUM(Value) and select the first one from each group? Thank you very much for your help.
SELECT t.Date, t.TotalRating, Photos.*
FROM Photos
INNER JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY Date DESC) AS RowNumber,
PhotoID, Date, TotalRating
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Date ORDER BY TotalRating DESC) AS inRowNumber,
PhotoID, Date, TotalRating
-- ratings are summed per photo and day from the Ratings table
FROM (SELECT PhotoID, Date, SUM(Value) AS TotalRating
FROM Ratings
GROUP BY PhotoID, Date
HAVING SUM(Value) > 0 ) t1) t2
WHERE inRowNumber = 1) t ON Photos.Id = t.PhotoID
WHERE t.RowNumber <= @days
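The same "best photo per rated day, last N rated days" idea can be tried out quickly with Python's built-in sqlite3 module. This is a sketch, not the SQL Server query itself: the photo titles and ratings are invented, `Date` is stored as a plain day string, and `LIMIT ?` replaces the outer `RowNumber <= @days` filter (window functions need SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Photos (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Ratings (PhotoID INTEGER, UserID INTEGER, Value INTEGER, Date TEXT);
INSERT INTO Photos VALUES (1, 'Sunset'), (2, 'Harbor'), (3, 'Forest');
-- Invented ratings; 2013-05-02 and 2013-05-05 are empty days.
INSERT INTO Ratings VALUES
  (1, 10, 5, '2013-05-01'), (2, 11, 3, '2013-05-01'),
  (2, 10, 4, '2013-05-03'), (2, 12, 5, '2013-05-03'), (3, 11, 5, '2013-05-03'),
  (3, 10, 2, '2013-05-04'),
  (1, 12, 4, '2013-05-06'), (3, 12, 1, '2013-05-06');
""")

sql = """
SELECT d.Date, p.Title, d.TotalRating
FROM (
  SELECT PhotoID, Date, TotalRating,
         ROW_NUMBER() OVER (PARTITION BY Date
                            ORDER BY TotalRating DESC) AS in_day_rank
  FROM (SELECT PhotoID, Date, SUM(Value) AS TotalRating
        FROM Ratings
        GROUP BY PhotoID, Date)
) AS d
JOIN Photos AS p ON p.ID = d.PhotoID
WHERE d.in_day_rank = 1      -- one winner per rated day
ORDER BY d.Date DESC
LIMIT ?                      -- number of rated days to show
"""
days = 3
winners = conn.execute(sql, (days,)).fetchall()
for line in winners:
    print(line)
```

After the `in_day_rank = 1` filter there is exactly one row per day that has ratings, so limiting the rows is the same as limiting the days, and empty days never show up.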