How to Average the most recent X entries with GROUP BY - sqlite

I've looked at many answers on SO concerning situations related to this but I must not be understanding them too well as I didn't manage to get anything to work.
I have a table with the following columns:
timestamp (PK), type (STRING), val (INT)
I need to get the most recent 20 entries from each type and average the val column. I also need the COUNT() as there may be fewer than 20 rows for some of the types.
I can do the following if I want to get the average of ALL rows for each type:
SELECT type, COUNT(success), AVG(success)
FROM user_data
GROUP BY type
But I want to limit each group COUNT() to 20.
From here I tried the following:
SELECT type, (
SELECT AVG(success) AS ave
FROM (
SELECT success
FROM user_data AS ud2
WHERE umd2.timestamp = umd.timestamp
ORDER BY umd2.timestamp DESC
LIMIT 20
)
) AS ave
FROM user_data AS ud
GROUP BY type
But the returned average is not correct. The values it returns are as if the statement is only returning the average of a single row for each group (it doesn't change regardless of the LIMIT).

Using sqlite, you may consider the row_number function in a subquery to acquire/filter the most recent entries before determining the average and count.
SELECT
type,
AVG(val),
COUNT(1)
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY type
ORDER BY timestamp DESC
) rn
FROM
user_data
) t
WHERE rn <=20
GROUP BY type

Related

How to count number of distinct ID's where status does not match

I have a dataset that looks similar to the below:
I would like to count the total number of distinct ID's that have both a 'SEND' and 'REC'. In other words, where the status does not match (values are limited to SEND and REC for Status field). In this case, the desired query would return a value of 2 since there are 2 distinct ID's that have both a 'SEND' and 'REC' in the dataset.
I tried the following query but did not work since there could only be one status per row and this query is looking for both of those status' within one row.
SELECT COUNT(DISTINCT ID) FROM Table WHERE Date BETWEEN '2022-01-19' AND '2022-01-19' AND Status = 'SEND' AND Status = 'REC' ;
If it's only two distinct values the most efficient way is probably:
select count(*)
from
(
SELECT ID
FROM Table
WHERE Date BETWEEN '2022-01-19' AND '2022-01-19'
AND Status IN ('SEND','REC') -- only two possible values per id
GROUP BY ID
HAVING MIN(Status) <> Max(Status) -- both values exist
) as dt ;
SELECT COUNT(DISTINCT ID) FROM table WHERE ID IN (SELECT ID FROM table WHERE status = 'SEND' INTERSECT SELECT ID FROM table WHERE status = 'REC')

SQLite Running Total Without Relying on RowId sequence

So I've been looking at this for the past week and learning. I'm used to SQL Server not SQLite. I understand RowId now, and that if I have an "id" column of my own (for convenience) it will actually use RowId. I've done running totals in SQL Server using ROW_NUMBER, but that doesn't seem to be an option with SQLite. The most useful post was...
How do I calculate a running SUM on a SQLite query?
My issue is that it works as long as I have data that I will keep adding to at the "bottom" of the table. I say "bottom" and not bottom because my display of the data is always sorted based on some other column such as a month. So in other words if I insert a new record for a missing month it will get inserted with a higher "id" (aka _RowId"). My running total below that month now needs to reflect this new data for all subsequent months. This means I cannot order by "id".
With SQL Server, ROW_NUMBER took care of my sequencing because in the select where I use a.id > running.id, I would have used a.rownum > running.rownum
Here's my table
CREATE TABLE `Test` (
`id` INTEGER,
`month` INTEGER,
`year` INTEGER,
`value` INTEGER,
PRIMARY KEY(`id`)
);
Here's my query
WITH RECURSIVE running (id, month, year, value, rt) AS
(
SELECT id, month, year, value, value
FROM Test AS row1
WHERE row1.id = (SELECT a.id FROM Test AS a ORDER BY a.id LIMIT 1)
UNION ALL
SELECT rowN.id, rowN.month, rowN.year, rowN.value, (rowN.value + running.rt)
FROM Test AS rowN
INNER JOIN running ON rowN.id = (
SELECT a.id FROM Test AS a WHERE a.id > running.id ORDER BY a.id LIMIT 1
)
)
SELECT * FROM running
I can order my CTE with year,month,id similar to how it is suggested in original example I linked above. However unless I'm mistaken that example solution relies on records in the table already ordered by year, month, id. If I'm right if I insert an earlier "month", then it will break because the "id" will have the largest value of all the _RowId_s.
Appreciate if someone can set me straight.

Selecting multiple maximum values? In Sqlite?

Super new to SQLite but I thought it can't hurt to ask.
I have something like the following table (Not allowed to post images yet) pulling data from multiple tables to calculate the TotalScore:
Name TotalScore
Course1 15
Course1 12
Course2 9
Course2 10
How the heck do I SELECT only the max value for each course? I've managed use
ORDER BY TotalScore LIMIT 2
But I may end up with multiple Courses in my final product, so LIMIT 2 etc won't really help me.
Thoughts? Happy to put up the rest of my query if it helps?
You can GROUP the resultset by Name and then use the aggregate function MAX():
SELECT Name, max(TotalScore)
FROM my_table
GROUP BY Name
You will get one row for each distinct course, with the name in column 1 and the maximum TotalScore for this course in column 2.
Further hints
You can only SELECT columns that are either grouped by (Name) or wrapped in aggregate functions (max(TotalScore)). If you need another column (e.g. Description) in the resultset, you can group by more than one column:
...
GROUP BY Name, Description
To filter the resulting rows further, you need to use HAVING instead of WHERE:
SELECT Name, max(TotalScore)
FROM my_table
-- WHERE clause would be here
GROUP BY Name
HAVING max(TotalScore) > 5
WHERE filters the raw table rows, HAVING filters the resulting grouped rows.
Functions like max and sum are "aggregate functions" meaning they aggregate multiple rows together. Normally they aggregate them into one value, like max(totalscore) but you can aggregate them into multiple values with group by. group by says how to group the rows together into aggregates.
select name, max(totalscore)
from scores
group by name;
This groups all the columns together with the same name and then does a max(totalscore) for each name.
sqlite> select name, max(totalscore) from scores group by name;
Course1|15
Course2|12

aggregating and concatenating by 2 identifiers

There's a particular SQLIte command I want to run to perform a aggregate and concatenate operation.
I need to aggregate by "ID" column, then for each ID, concatenate unique 'Attribute' and also concatenate average of 'Value' for each unique corresponding 'Attribute':
I can do concatenate unqiue Attribute and aggregate by ID, but haven't got average of Value working.
Try to use subquery for getting AVG for combination of id+attribute and then use group_concat:
select t.id, Group_Concat(t.attribute) as concat_att, Group_Concat(t.avg) as concat_avg from
(
select test.id, test.attribute, AVG(test.value) as avg from test
group by test.id, test.attribute
) as t group by t.id;
See this example here: http://sqlfiddle.com/#!7/03fe4b/17

Getting median of column values in each group

I have a table containing user_id, movie_id, rating. These are all INT, and ratings range from 1-5.
I want to get the median rating and group it by user_id, but I'm having some trouble doing this.
My code at the moment is:
SELECT AVG(rating)
FROM (SELECT rating
FROM movie_data
ORDER BY rating
LIMIT 2 - (SELECT COUNT(*) FROM movie_data) % 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM movie_data));
However, this seems to return the median value of all the ratings. How can I group this by user_id, so I can see the median rating per user?
The following gives the required median:
DROP TABLE IF EXISTS movie_data2;
CREATE TEMPORARY TABLE movie_data2 AS
SELECT user_id, rating FROM movie_data order by user_id, rating;
SELECT a.user_id, a.rating FROM (
SELECT user_id, rowid, rating
FROM movie_data2) a JOIN (
SELECT user_id, cast(((min(rowid)+max(rowid))/2) as int) as midrow FROM movie_data2 b
GROUP BY user_id
) c ON a.rowid = c.midrow
;
The logic is straightforward but the code is not beautified. Given encouragement or comments I will improve it. In a nutshell, the trick is to use rowid of SQLite.
This is not easily possible because SQLite does not allow correlated subqueries to refer to outer values in the LIMIT/OFFSET clauses.
Add WHERE clauses for the user_id to all three subqueries, and execute them for each user ID.
SELECT user_id,AVG(rating)
FROM movie_data
GROUP BY user_id
ORDER BY rating

Resources