How to count the number of distinct IDs where status does not match

I have a dataset that looks similar to the below:
I would like to count the total number of distinct IDs that have both a 'SEND' and a 'REC' status, in other words, IDs whose statuses do not all match (the Status field is limited to the values SEND and REC). In this case, the desired query would return 2, since there are 2 distinct IDs that have both a 'SEND' and a 'REC' in the dataset.
I tried the following query, but it did not work: each row has only one status, and this query looks for both statuses within a single row.
SELECT COUNT(DISTINCT ID) FROM Table WHERE Date BETWEEN '2022-01-19' AND '2022-01-19' AND Status = 'SEND' AND Status = 'REC' ;

If there are only two distinct values, the most efficient way is probably:
select count(*)
from
(
SELECT ID
FROM Table
WHERE Date BETWEEN '2022-01-19' AND '2022-01-19'
AND Status IN ('SEND','REC') -- only two possible values per id
GROUP BY ID
HAVING MIN(Status) <> Max(Status) -- both values exist
) as dt ;

SELECT COUNT(DISTINCT ID) FROM table WHERE ID IN (SELECT ID FROM table WHERE status = 'SEND' INTERSECT SELECT ID FROM table WHERE status = 'REC')
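Both answers can be checked against a small in-memory SQLite database. This is only a sketch: the table name `messages` and the sample rows are assumptions (the original dataset is not shown), and `Table` is renamed because it is a reserved word. Note that the INTERSECT query, as posted, does not filter on the date.

```python
import sqlite3

# Hypothetical sample data: IDs 1 and 3 have both statuses, IDs 2 and 4 have one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (ID INTEGER, Date TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, '2022-01-19', ?)",
    [(1, "SEND"), (1, "REC"), (2, "SEND"),
     (3, "SEND"), (3, "REC"), (3, "REC"), (4, "REC")],
)

# Approach 1: per ID, MIN and MAX differ only when both values are present.
min_max_sql = """
SELECT COUNT(*) FROM (
    SELECT ID
    FROM messages
    WHERE Date = '2022-01-19'
      AND Status IN ('SEND', 'REC')
    GROUP BY ID
    HAVING MIN(Status) <> MAX(Status)
) AS dt
"""

# Approach 2: IDs that appear in both the SEND set and the REC set.
intersect_sql = """
SELECT COUNT(DISTINCT ID)
FROM messages
WHERE ID IN (
    SELECT ID FROM messages WHERE Status = 'SEND'
    INTERSECT
    SELECT ID FROM messages WHERE Status = 'REC'
)
"""

min_max_count = conn.execute(min_max_sql).fetchone()[0]
intersect_count = conn.execute(intersect_sql).fetchone()[0]
print(min_max_count, intersect_count)
```

On this sample data both queries count the same two IDs.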

Related

How to Average the most recent X entries with GROUP BY

I've looked at many answers on SO concerning situations related to this but I must not be understanding them too well as I didn't manage to get anything to work.
I have a table with the following columns:
timestamp (PK), type (STRING), val (INT)
I need to get the most recent 20 entries from each type and average the val column. I also need the COUNT() as there may be fewer than 20 rows for some of the types.
I can do the following if I want to get the average of ALL rows for each type:
SELECT type, COUNT(success), AVG(success)
FROM user_data
GROUP BY type
But I want to limit each group COUNT() to 20.
From here I tried the following:
SELECT type, (
SELECT AVG(success) AS ave
FROM (
SELECT success
FROM user_data AS ud2
WHERE ud2.timestamp = ud.timestamp
ORDER BY ud2.timestamp DESC
LIMIT 20
)
) AS ave
FROM user_data AS ud
GROUP BY type
But the returned average is not correct. The values it returns are as if the statement is only returning the average of a single row for each group (it doesn't change regardless of the LIMIT).
Using SQLite (3.25 or later, which added window functions), you can use ROW_NUMBER() in a subquery to number each type's rows from most recent to oldest, then keep only the first 20 before computing the average and count.
SELECT
type,
AVG(val),
COUNT(1)
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY type
ORDER BY timestamp DESC
) rn
FROM
user_data
) t
WHERE rn <=20
GROUP BY type
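The query above can be tried from Python's `sqlite3` module, as a rough sketch. The sample rows and the helper name `recent_stats` are assumptions made for illustration, and the limit is passed as a parameter so a small value can be used with little data. It requires an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_data (timestamp INTEGER PRIMARY KEY, type TEXT, val INTEGER)"
)
# Hypothetical rows: type 'a' has four entries, type 'b' has only one.
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "a", 20), (3, "a", 30), (4, "a", 40), (5, "b", 7)],
)


def recent_stats(conn, n):
    """Average and count of the n most recent rows of each type."""
    sql = """
    SELECT type, AVG(val), COUNT(*)
    FROM (
        SELECT type, val,
               ROW_NUMBER() OVER (
                   PARTITION BY type
                   ORDER BY timestamp DESC
               ) AS rn
        FROM user_data
    ) AS t
    WHERE rn <= ?
    GROUP BY type
    ORDER BY type
    """
    return conn.execute(sql, (n,)).fetchall()


stats = recent_stats(conn, 2)
print(stats)  # [('a', 35.0, 2), ('b', 7.0, 1)]
```

Type 'b' has fewer rows than the limit, so its count is 1, which is the behaviour the question asks for.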

Find offset in table with order and where clause

Consider the following schema and table:
CREATE TABLE IF NOT EXISTS `names` (
`id` INTEGER,
`name` TEXT,
PRIMARY KEY(`id`)
);
INSERT INTO `names` VALUES (1,'zulu');
INSERT INTO `names` VALUES (2,'bene');
INSERT INTO `names` VALUES (3,'flip');
INSERT INTO `names` VALUES (4,'rossB');
INSERT INTO `names` VALUES (5,'albert');
INSERT INTO `names` VALUES (6,'zuse');
INSERT INTO `names` VALUES (7,'rossA');
INSERT INTO `names` VALUES (8,'juss');
I access this table with the following query:
SELECT *
FROM names
ORDER BY name
LIMIT 10
OFFSET 4;
OFFSET 4 is used because it is the row position (in the ordered list) of the first occurrence of an 'R%' name. This returns:
1="7" "rossA"
2="4" "rossB"
3="1" "zulu"
4="6" "zuse"
My question is: is there an SQL statement that can return the OFFSET value (in the R case above, it's 4) given a starting first letter? (I don't really want to resort to stepping through the results, counting rows, until the first 'R%' is reached!)
I've tried the following without success:
SELECT MIN(ROWID)
FROM
(
SELECT *
FROM names
ORDER BY name
)
WHERE name LIKE 'R%'
It always returns a single row of NULL data.
As background, this table is a phone book list and I want to return a subset of results (from the main table) to the caller, starting at an initial-letter offset.
Just count the rows before the string of interest:
select count(*) from names where name < 'r';
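A minimal sketch of this counting approach, using the schema and rows from the question; the helper name `offset_for` is an assumption. The comparison uses SQLite's default BINARY collation, so the prefix must match the case of the stored names (lowercase here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(1, "zulu"), (2, "bene"), (3, "flip"), (4, "rossB"),
     (5, "albert"), (6, "zuse"), (7, "rossA"), (8, "juss")],
)


def offset_for(conn, prefix):
    """Number of names sorting before prefix, usable as an OFFSET."""
    row = conn.execute(
        "SELECT COUNT(*) FROM names WHERE name < ?", (prefix,)
    ).fetchone()
    return row[0]


offset = offset_for(conn, "r")
page = conn.execute(
    "SELECT id, name FROM names ORDER BY name LIMIT 10 OFFSET ?", (offset,)
).fetchall()
print(offset)  # 4
print(page)    # [(7, 'rossA'), (4, 'rossB'), (1, 'zulu'), (6, 'zuse')]
```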
The following has a number of options. Basically, your issue is that the sub-query doesn't return the rowid, hence NULL as the minimum. However, there is no need to use the rowid directly, as the id column is an alias of the rowid, so that could be used:-
SELECT name, id, MIN(rowid), min(id) -- shows how rowid and id are the same
FROM
(
SELECT rowid, * -- returns rowid from the subquery so min(rowid) now works
FROM names
ORDER BY name
)
WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1 -- Will effectively do the same (no need for the sub-query)
Extra columns added for demonstration.
As such your query could be :-
SELECT min(rowid) FROM names where name LIKE 'R%';
Or :-
SELECT min(id) FROM names where name LIKE 'R%';
You could also use :-
SELECT id FROM names WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1;
Or :-
SELECT rowid FROM names WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1;

Getting median of column values in each group

I have a table containing user_id, movie_id, rating. These are all INT, and ratings range from 1-5.
I want to get the median rating and group it by user_id, but I'm having some trouble doing this.
My code at the moment is:
SELECT AVG(rating)
FROM (SELECT rating
FROM movie_data
ORDER BY rating
LIMIT 2 - (SELECT COUNT(*) FROM movie_data) % 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM movie_data));
However, this seems to return the median value of all the ratings. How can I group this by user_id, so I can see the median rating per user?
The following gives the required median:
DROP TABLE IF EXISTS movie_data2;
CREATE TEMPORARY TABLE movie_data2 AS
SELECT user_id, rating FROM movie_data order by user_id, rating;
SELECT a.user_id, a.rating FROM (
SELECT user_id, rowid, rating
FROM movie_data2) a JOIN (
SELECT user_id, cast(((min(rowid)+max(rowid))/2) as int) as midrow FROM movie_data2 b
GROUP BY user_id
) c ON a.rowid = c.midrow
;
The logic is straightforward but the code is not beautified. Given encouragement or comments I will improve it. In a nutshell, the trick is to use rowid of SQLite.
This is not easily possible because SQLite does not allow correlated subqueries to refer to outer values in the LIMIT/OFFSET clauses.
Add WHERE clauses for the user_id to all three subqueries, and execute them for each user ID.
SELECT user_id,AVG(rating)
FROM movie_data
GROUP BY user_id
ORDER BY rating
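For a per-user median in a single query, one option on SQLite 3.25 or later is to use window functions. This is a different technique from the answers above, offered only as a sketch: it numbers each user's ratings in order, counts them, and averages the one or two middle rows. The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_data (user_id INTEGER, movie_id INTEGER, rating INTEGER)"
)
# Hypothetical ratings: user 1 has three, user 2 has four, user 3 has two.
conn.executemany(
    "INSERT INTO movie_data VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 11, 1), (1, 12, 3),
     (2, 10, 4), (2, 11, 2), (2, 12, 5), (2, 13, 4),
     (3, 10, 2), (3, 11, 1)],
)

median_sql = """
SELECT user_id, AVG(rating) AS median_rating
FROM (
    SELECT user_id, rating,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY rating) AS rn,
           COUNT(*) OVER (PARTITION BY user_id) AS cnt
    FROM movie_data
) AS ranked
-- Integer division: for an odd cnt both expressions give the middle row,
-- for an even cnt they give the two middle rows.
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY user_id
ORDER BY user_id
"""

medians = conn.execute(median_sql).fetchall()
print(medians)  # [(1, 3.0), (2, 4.0), (3, 1.5)]
```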

Subtract record from previous record within same column in teradata

Suppose there is a column as shown below.
How can I subtract each record from the previous record, for example the first record from the second and so on, and store the result in a third column? As these are date values, I want the results in this format:
1d:1h:46m
Thanks
You can use PRECEDING:
select
<key>,
(<your column> - PreviousValue) DAY(4) TO MINUTE
from
(
select
<some sort of key>,
<your column>,
min(<your column>) over (partition by <whatever your grain is> ORDER BY <whatever> ROWS BETWEEN 1 PRECEDING and 1 PRECEDING) as PreviousValue
from <your table>
...
) t1
If you don't need to partition by anything, just leave out the PARTITION BY clause.

How to select data where sum is greater than x

I am very new to SQL and am using SQLite 3 to run basket analysis on sales data.
The relevant columns are the product ID, a unique transaction ID (which identifies the basket) and the product quantity. Where a customer has bought more than one product type, the unique transaction ID is repeated.
I want to select only baskets where the customer has bought more than 1 item.
Is there any way in SQLite to select the unique transaction ID and the sum of the quantity, but only for unique transaction IDs where the quantity is more than one?
So far I have tried:
select uniqID, sum(qty) from salesdata where sum(qty) > 1 group by uniqID;
But SQLite gives me the error 'misuse of aggregate: sum()'
Sorry if this is a simple question but I am struggling to find any relevant information by googling!
Try
select uniqID, sum(qty) from salesdata group by uniqID having sum(qty) > 1
WHERE cannot be used with aggregate functions; in this case you can only use WHERE on uniqID.
If you want to put a condition on the result you get with GROUP BY, you must use HAVING.
select uniqID, sum(qty) as sumqty from salesdata group by uniqID having sumqty > 1
You can use any condition with HAVING, just as you normally would with WHERE:
having sumqty = 1, having sumqty < 1, having sumqty IN (1,2,3), etc.
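As a quick illustration of the HAVING form, here is a small sketch run against an in-memory SQLite database. The `productID` column name and the sample rows are assumptions; `uniqID` and `qty` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesdata (uniqID TEXT, productID TEXT, qty INTEGER)")
# Hypothetical baskets: T1 and T4 contain a single item, T2 and T3 more than one.
conn.executemany(
    "INSERT INTO salesdata VALUES (?, ?, ?)",
    [("T1", "P1", 1),
     ("T2", "P1", 1), ("T2", "P2", 2),
     ("T3", "P3", 3),
     ("T4", "P2", 1)],
)

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
baskets = conn.execute(
    """
    SELECT uniqID, SUM(qty) AS sumqty
    FROM salesdata
    GROUP BY uniqID
    HAVING SUM(qty) > 1
    ORDER BY uniqID
    """
).fetchall()
print(baskets)  # [('T2', 3), ('T3', 3)]
```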
