How to deal with multi-modality? - sqlite

I have the following data set in the form of {id, sample#, zone}:
[(1, 4, 5), (2, 3, 5), (3, 2, 2), (4, 1, 2)]
The following code:
cur.execute("SELECT zone FROM zone_temp GROUP BY zone HAVING COUNT(*) = (SELECT MAX(cnt) FROM(SELECT COUNT(*) as cnt FROM zone_temp GROUP BY zone) tmp)")
or:
cur.execute("SELECT zone FROM zone_temp GROUP BY zone ORDER BY COUNT(zone)")
returns only the first value that occurs the most (5). In this data set 5 and 2 are equally frequent, so the data is bimodal and there really isn't a single "most frequent" value. What is the best way to deal with this?
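The first query above is already written to return every tied zone. If only 5 comes back, one possible cause (an assumption, since the calling code isn't shown) is reading the cursor with fetchone() instead of fetchall(). The second query has no DESC or LIMIT, so it returns all zones in ascending order of count. Here is a minimal sketch with Python's built-in sqlite3 module that collects all modes so the caller can decide how to treat a tie. It assumes the columns are named id, sample and zone. Those names are hypothetical and taken from the {id, sample#, zone} description.

```python
import sqlite3


def modes(conn):
    """Return every zone whose count equals the maximum count (all modes)."""
    cur = conn.execute(
        """
        SELECT zone
        FROM zone_temp
        GROUP BY zone
        HAVING COUNT(*) = (
            SELECT MAX(cnt)
            FROM (SELECT COUNT(*) AS cnt FROM zone_temp GROUP BY zone)
        )
        ORDER BY zone
        """
    )
    # fetchall() returns one row per tied zone; fetchone() would return only the first.
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
# Hypothetical column names id/sample/zone, assumed from {id, sample#, zone}.
conn.execute("CREATE TABLE zone_temp (id INTEGER, sample INTEGER, zone INTEGER)")
conn.executemany(
    "INSERT INTO zone_temp VALUES (?, ?, ?)",
    [(1, 4, 5), (2, 3, 5), (3, 2, 2), (4, 1, 2)],
)
print(modes(conn))  # [2, 5]
```

If the returned list has more than one element, the data is multimodal. Whether to report all modes, pick one by a tie-break rule, or report "no single mode" is then an application decision rather than something SQL decides for you.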

Related

MariaDB: Match requests with work performed where total > 0 using a case

I have a request table and a work table. For requests of type 1, 2, 4, or 5, I need to sum the work performed of type 6 or 7, where 6 effectively represents +1 and 7 represents -1. Exclude any request whose work sum is <= 0 or whose work was done before the most recent request.
The query details are:
Find requests with type in (1, 2, 4, 5) by creationDate.
For each request found, sum the work with type in (6, 7) as +1 or -1, respectively, until the next request's creationDate.
Output any request whose work sum is > 0 before the next request.
The sample tables:
create table request
(
Id bigint not null,
userId bigint,
type bigint not null,
creationDate timestamp not null
);
create table work
(
Id bigint not null,
type bigint not null,
creationDate timestamp not null
);
The sample data:
insert into request (Id, userId, type, creationDate)
values (4, 45, 2, '2022-12-12 11:02:17'),
(9, 64, 2, '2022-12-12 01:01:18'),
(2, 92, 2, '2022-12-11 21:36:36'),
(2, 21, 2, '2022-12-11 21:25:54'),
(1, 3, 2, '2022-12-11 21:13:58'),
(7, 243, 2, '2022-12-11 21:04:05'),
(8, 24, 2, '2022-12-11 21:01:23');
insert into work (Id, type, creationDate)
values (3, 7, '2022-12-11 00:00:00'),
(6, 7, '2022-12-11 00:00:00'),
(11, 7, '2022-12-11 00:00:00'),
(6, 7, '2022-12-11 00:00:00'),
(1, 6, '2022-12-11 00:00:00'),
(2, 6, '2022-12-11 00:00:00'),
(11, 7, '2022-12-11 00:00:00'),
(5, 7, '2022-12-11 00:00:00'),
(1, 6, '2022-12-11 00:00:00'),
(11, 7, '2022-12-12 00:00:00'),
(4, 6, '2022-12-12 00:00:00'),
(8, 7, '2022-12-12 00:00:00');
The attempted query:
select id, sum(total), type, creationDate from (
select id, 0 as total, type, creationDate from request
union
select id, case type when 6 then 1 when 7 then -1 end as total, type, creationDate from work
) a where total > 0 group by id
This takes too long on live data, but works on small sets like this fiddle.
There is a challenge in the data: the request timestamps include the time of day, but the work rows only carry a date (the time is always 00:00:00).
The fiddle reports:
id | sum(total) | type | creationDate
1  | 1          | 6    | 2022-12-11 00:00:00
2  | 1          | 6    | 2022-12-11 00:00:00
4  | 1          | 6    | 2022-12-13 00:00:00
However, both 1 and 2 should be excluded, because in each case the request's timestamp is technically later than the work's timestamp. The expected output should be:
id | sum(total) | type | creationDate
4  | 1          | 6    | 2022-12-13 00:00:00
For id = 4, the work had the date of 2022-12-13 00:00:00 and the request was timestamped 2022-12-12 11:02:17.
One way to accomplish this is to use a subquery to join the two tables together and then group the results.
Here's an example:
SELECT r.Id, SUM(CASE w.type WHEN 6 THEN 1 WHEN 7 THEN -1 END) as total, r.type, r.creationDate
FROM request r
JOIN (
SELECT Id, type, creationDate
FROM work
WHERE type IN (6,7)
) w ON w.Id = r.Id AND w.creationDate >= r.creationDate
WHERE r.type IN (1,2,4,5)
GROUP BY r.Id, r.type, r.creationDate
HAVING total > 0
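Below is a minimal sketch of the same join, runnable with Python's built-in sqlite3 module instead of MariaDB. It rests on two assumptions. First, work.Id refers to request.Id, which the GROUP BY id in the attempted query suggests. Second, the work row for id 4 is dated 2022-12-13, as the expected-output paragraph says (the sample INSERT lists 2022-12-12). The rows are a hypothetical subset of the sample data. The sketch repeats the SUM() in HAVING rather than using the alias, because that is the portable form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE request (Id INTEGER NOT NULL, userId INTEGER,
                          type INTEGER NOT NULL, creationDate TEXT NOT NULL);
    CREATE TABLE work (Id INTEGER NOT NULL, type INTEGER NOT NULL,
                       creationDate TEXT NOT NULL);
    -- Hypothetical subset: requests 4 and 1 from the sample data; the work
    -- for id 4 is dated 2022-12-13 as in the expected-output description.
    INSERT INTO request VALUES (4, 45, 2, '2022-12-12 11:02:17'),
                               (1, 3, 2, '2022-12-11 21:13:58');
    INSERT INTO work VALUES (1, 6, '2022-12-11 00:00:00'),
                            (4, 6, '2022-12-13 00:00:00');
    """
)

QUERY = """
SELECT r.Id,
       SUM(CASE w.type WHEN 6 THEN 1 WHEN 7 THEN -1 END) AS total,
       r.type,
       r.creationDate
FROM request r
JOIN work w
  ON w.Id = r.Id                        -- assumed link between the tables
 AND w.type IN (6, 7)
 AND w.creationDate >= r.creationDate   -- 'YYYY-MM-DD hh:mm:ss' text sorts chronologically
WHERE r.type IN (1, 2, 4, 5)
GROUP BY r.Id, r.type, r.creationDate
HAVING SUM(CASE w.type WHEN 6 THEN 1 WHEN 7 THEN -1 END) > 0
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(4, 1, 2, '2022-12-12 11:02:17')]
```

Like the answer's query, this does not stop counting at the next request's creationDate. That bound would need an extra condition (for example one built with a window function such as LEAD()), which is not shown here. On large live tables, an index on work(Id, creationDate) would likely help this join shape.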

sqlite, how to get the last n records without ORDER BY

I need to take the last 7 values of a column. I was thinking of using LIMIT, but I don't know if it also works in reverse. For example, given the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, I need to get 4, 5, 6, 7, 8, 9, 10, bearing in mind that more records will be added after 10 in the future. Is there a solution with LIMIT, or are there other solutions? I can't use ORDER BY, because the values I need are the last 7 days of values, SQLite doesn't have a date type, and I didn't store the dates as milliseconds but as text in dd-mm-yyyy format, so ORDER BY date doesn't work. Thanks
If you change the format of the dates to YYYY-MM-DD then it is as simple as this:
SELECT columnname
FROM tablename
ORDER BY date DESC
LIMIT 7
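Changing the stored format is a one-off UPDATE that rearranges the same substrings. Here is a minimal sketch using Python's sqlite3 module and a hypothetical table tablename(columnname, date) holding one value per day. The LIKE guard is an added assumption so that the migration does not touch rows that are already converted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one value per day, dates stored as dd-mm-yyyy text.
conn.execute("CREATE TABLE tablename (columnname INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(n, f"{n:02d}-01-2020") for n in range(1, 11)],
)

# One-off migration: dd-mm-yyyy -> yyyy-mm-dd, which sorts chronologically as text.
# The LIKE pattern only matches the old layout, so re-running it is harmless.
conn.execute(
    "UPDATE tablename "
    "SET date = SUBSTR(date, 7) || '-' || SUBSTR(date, 4, 2) || '-' || SUBSTR(date, 1, 2) "
    "WHERE date LIKE '__-__-____'"
)

last7 = [row[0] for row in conn.execute(
    "SELECT columnname FROM tablename ORDER BY date DESC LIMIT 7"
)]
print(last7)  # [10, 9, 8, 7, 6, 5, 4]
```

New rows then need to be inserted in the YYYY-MM-DD form as well, for example from Python with datetime.date.isoformat().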
With the dates in their current format, the only thing you can do is build a sortable value in the ORDER BY clause with SUBSTR():
SELECT columnname
FROM tablename
ORDER BY SUBSTR(date, 7) || SUBSTR(date, 4, 2) || SUBSTR(date, 1, 2) DESC
LIMIT 7
This expression:
SUBSTR(date, 7) || SUBSTR(date, 4, 2) || SUBSTR(date, 1, 2)
transforms a date like 05-01-2020 to 20200105 so it is comparable and can be used to sort the rows.
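The query above returns the 7 rows newest first. The question asks for 4, 5, ..., 10 in ascending order. A sketch using Python's sqlite3 module wraps the same SUBSTR() key in a subquery and re-sorts ascending. The table and dates are hypothetical and cross a year boundary, where a plain text sort on dd-mm-yyyy would go wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: dates kept in dd-mm-yyyy form, spanning a year boundary.
conn.execute("CREATE TABLE tablename (columnname INTEGER, date TEXT)")
dates = ["27-12-2019", "28-12-2019", "29-12-2019", "30-12-2019", "31-12-2019",
         "01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020", "05-01-2020"]
conn.executemany("INSERT INTO tablename VALUES (?, ?)", list(enumerate(dates, start=1)))

QUERY = """
SELECT columnname
FROM (
    -- yyyymmdd key built from dd-mm-yyyy text; newest 7 rows first
    SELECT columnname,
           SUBSTR(date, 7) || SUBSTR(date, 4, 2) || SUBSTR(date, 1, 2) AS sort_key
    FROM tablename
    ORDER BY sort_key DESC
    LIMIT 7
)
ORDER BY sort_key ASC  -- put the selected rows back in chronological order
"""
last7 = [row[0] for row in conn.execute(QUERY)]
print(last7)  # [4, 5, 6, 7, 8, 9, 10]
```

Because the key is computed per row, SQLite cannot use an ordinary index on date for this sort. For a large table, migrating to YYYY-MM-DD avoids that.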

SQLite group-by behaviour

If a column in the SELECT clause is omitted from the GROUP BY clause, does SQLite group by the remaining columns (by default), and then return the value of the omitted column in the first row it evaluates?
For example, finding the TransactionId associated with the highest value per ProductId:
CREATE TABLE IF NOT EXISTS ProductTransaction
(
Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
ProductId INTEGER NOT NULL,
TransactionType INTEGER NOT NULL,
Value INTEGER NOT NULL
);
INSERT INTO ProductTransaction (ProductId, TransactionType, Value)
VALUES (1, 7, 23), (1, 3, 12), (2, 4, 43), (1, 7, 5), (1, 10, 23),
(3, 3, 23), (3, 2, 31), (1, 1, 23), (2, 5, 50), (2, 6, 14), (1, 4, 23);
SELECT ProductId
, TransactionType
, MAX(Value)
FROM ProductTransaction
GROUP BY ProductId;
DELETE FROM ProductTransaction;
Running the previous statements gives me the TransactionType of 7 for ProductId 1 (Highest value 23).
However, if I add the index:
CREATE INDEX IF NOT EXISTS IDX_TransType ON ProductTransaction(ProductId ASC, TransactionType ASC);
It returns the TransactionType 1, presumably because it's now ordering the rows according to the index. Modifying the index supports this theory:
CREATE INDEX IF NOT EXISTS IDX_TransType ON ProductTransaction(ProductId ASC, TransactionType DESC);
It will now return TransactionType 10 for ProductId 1.
Is this behaviour by design, or is it just an unreliable side-effect?
EDIT: It seems that it's an unreliable side-effect. From the documentation:
Each expression in the result-set is then evaluated once for each group of rows. If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. If there is more than one non-aggregate expression in the result-set, then all such expressions are evaluated for the same row.
https://www.sqlite.org/lang_select.html#resultset
Since SQLite 3.7.11, using MAX() or MIN() will force any non-aggregated columns to come from the same row that matches the MAX()/MIN().
However, when there are multiple rows with the same largest/smallest value, it is still unspecified from which of those rows the other columns' values come. (SQLite's behaviour is consistent in this regard, but can change in different versions or with different database schemas.)
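To get a deterministic answer, including which row wins a tie, you can rank the rows explicitly instead of relying on bare columns. Here is a sketch using Python's sqlite3 module and the ROW_NUMBER() window function, which needs SQLite 3.25 or later. Breaking ties by the lowest Id is an assumption, so substitute whatever tie-break rule fits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductTransaction (
        Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        ProductId INTEGER NOT NULL,
        TransactionType INTEGER NOT NULL,
        Value INTEGER NOT NULL
    );
    INSERT INTO ProductTransaction (ProductId, TransactionType, Value)
    VALUES (1, 7, 23), (1, 3, 12), (2, 4, 43), (1, 7, 5), (1, 10, 23),
           (3, 3, 23), (3, 2, 31), (1, 1, 23), (2, 5, 50), (2, 6, 14), (1, 4, 23);
    """
)

# Rank rows per product by Value (highest first); ties are broken by Id, so the
# chosen row no longer depends on indexes or on the query plan.
QUERY = """
SELECT ProductId, TransactionType, Value
FROM (
    SELECT ProductId, TransactionType, Value,
           ROW_NUMBER() OVER (PARTITION BY ProductId
                              ORDER BY Value DESC, Id ASC) AS rn
    FROM ProductTransaction
)
WHERE rn = 1
ORDER BY ProductId
"""
print(conn.execute(QUERY).fetchall())  # [(1, 7, 23), (2, 5, 50), (3, 2, 31)]
```

With this data, ProductId 1 has four rows valued 23 (TransactionType 7, 10, 1 and 4). The bare-column query may return any of them, whereas the ranked query always returns the one with the lowest Id.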

Get days until future event in SQLite

Trying to get several things from a SQLite table with names and dates of birth and am having trouble getting the # of days until a person's next birthday. Dates are stored as SQLite's TEXT data type in format '%Y-%m-%d 00:00:00'.
I can get age:
SELECT ((strftime('%s', 'now') - strftime('%s', dob)) / 31536000) AS age
I like this solution for showing the closest birthdays first:
ORDER BY SUBSTR(date('now'), 6) > SUBSTR(dob, 6), SUBSTR(dob, 6) ASC
But I'm breaking my brain over getting the days until the next birthday. My latest attempt takes the julianday of the dob's month-and-day substring concatenated with the current year, compares it against julianday(), and adds conditionals to take the year change into account, but I haven't worked that out yet and I'm hoping someone has a more elegant solution.
Have made my hideous solution work, so here it is:
SELECT
CASE WHEN
julianday((SUBSTR(date('now'), 1, 5) || SUBSTR(dob, 6, 5))) > julianday('now')
THEN CAST(ROUND(
julianday((SUBSTR(date('now'), 1, 5) || SUBSTR(dob, 6, 5))) - julianday('now'), 0) AS INTEGER)
ELSE CAST(ROUND((
julianday(SUBSTR(date('now'), 1, 5) || '12-31') - julianday('now')) + (
julianday(SUBSTR(date('now'), 1, 5) || SUBSTR(dob, 6, 5)) - julianday(SUBSTR(date('now'), 1, 5) || '01-01')), 0) AS INTEGER)
END
AS dub FROM person;
Will only have to put in another conditional to improve the rounding.
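A shorter approach is to build this year's birthday as a 'YYYY-MM-DD' string and push it one year forward with date(..., '+1 year') if it has already passed. The difference of two julianday() values at midnight is then a whole number of days, so no rounding is needed. Here is a sketch using Python's sqlite3 module. It assumes a person(name, dob) table (the name column is hypothetical) and a :today parameter in place of 'now', so that the result is reproducible. Feb 29 birthdays are not handled specially.

```python
import datetime
import sqlite3

# :today is a 'YYYY-MM-DD' string; SUBSTR(dob, 6, 5) extracts 'MM-DD' from
# dob values stored as 'YYYY-MM-DD 00:00:00'.
QUERY = """
SELECT name,
       CAST(julianday(next_bd) - julianday(:today) AS INTEGER) AS days_until
FROM (
    SELECT name,
           CASE
             WHEN strftime('%Y', :today) || '-' || SUBSTR(dob, 6, 5) >= :today
               THEN strftime('%Y', :today) || '-' || SUBSTR(dob, 6, 5)
             ELSE date(strftime('%Y', :today) || '-' || SUBSTR(dob, 6, 5), '+1 year')
           END AS next_bd
    FROM person
)
ORDER BY days_until
"""


def days_until_birthdays(conn, today=None):
    """Return (name, days until next birthday), soonest first."""
    today = today or datetime.date.today().isoformat()
    return conn.execute(QUERY, {"today": today}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, dob TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [
    ("Ann", "1990-03-15 00:00:00"),
    ("Bob", "1985-01-02 00:00:00"),
    ("Cid", "2000-03-10 00:00:00"),
])
print(days_until_birthdays(conn, today="2024-03-10"))
# [('Cid', 0), ('Ann', 5), ('Bob', 298)]
```

A birthday falling on the reference date counts as 0 days rather than a full year, because the comparison uses >=.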

Counting the same column of different value sets in a single group by clause

I have a table (SQLite DB) like this,
CREATE TABLE parser (ip text, user text, code text);
Now I need to count how many rows have a code of 1, 2, or 3, and how many do not, grouped by the ip field.
So far, though, I can only do this with two separate SQL statements rather than one.
e.g.:
select count(*) as cnt, ip
from parser
where code in (1, 2, 3)
group by ip
order by cnt DESC
limit 10
And a matching NOT IN query.
So, can I merge the two queries into a single one?
This will give you two counts per ip, one for the rows where code has values 1, 2 or 3 and another count for all the rest (everything but 1, 2, 3, including NULL).
SELECT ip,
COUNT(CASE WHEN code IN (1, 2, 3) THEN 1 ELSE NULL END) AS cnt_in,
COUNT(CASE WHEN code IN (1, 2, 3) THEN NULL ELSE 1 END) AS cnt_rest
FROM parser
GROUP BY ip
ORDER BY cnt_in DESC ;
This will give you 3 counts: one for 1, 2, 3, another for the rest of the non-NULL values, and a third for rows that have NULL in code:
SELECT ip,
COUNT(CASE WHEN code IN (1, 2, 3) THEN 1 END) AS cnt_in,
COUNT(CASE WHEN code NOT IN (1, 2, 3) THEN 1 END) AS cnt_not_in,
COUNT(CASE WHEN code IS NULL THEN 1 END) AS cnt_null
FROM parser
GROUP BY ip
ORDER BY cnt_in DESC ;
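The three-count query can be checked with Python's sqlite3 module using a few hypothetical rows, including one with a NULL code. COUNT() skips NULLs, so each CASE contributes 1 only for the rows it should count. A NULL code makes both IN and NOT IN evaluate to NULL, which is why it lands only in cnt_null. Because code is declared text, SQLite applies TEXT affinity to the integer literals in the IN list before comparing them, so '1' matches 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parser (ip text, user text, code text)")
# Hypothetical sample rows, including a NULL code.
conn.executemany("INSERT INTO parser VALUES (?, ?, ?)", [
    ("10.0.0.1", "a", "1"), ("10.0.0.1", "b", "2"),
    ("10.0.0.1", "c", "9"), ("10.0.0.1", "d", None),
    ("10.0.0.2", "e", "3"), ("10.0.0.2", "f", "7"),
])

QUERY = """
SELECT ip,
       COUNT(CASE WHEN code IN (1, 2, 3) THEN 1 END)     AS cnt_in,
       COUNT(CASE WHEN code NOT IN (1, 2, 3) THEN 1 END) AS cnt_not_in,
       COUNT(CASE WHEN code IS NULL THEN 1 END)          AS cnt_null
FROM parser
GROUP BY ip
ORDER BY cnt_in DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('10.0.0.1', 2, 1, 1), ('10.0.0.2', 1, 1, 0)]
```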
If you want to limit the first result (as in your code) to its top 10 rows and the second result to its own top 10 rows, you can use two subqueries and a UNION ALL:
( SELECT ip,
COUNT(*) AS cnt,
'in' AS type
FROM parser
WHERE code IN (1, 2, 3)
GROUP BY ip
ORDER BY cnt DESC
LIMIT 10
)
UNION ALL
( SELECT ip,
COUNT(*) AS cnt,
'not in' AS type
FROM parser
WHERE code NOT IN (1, 2, 3)
GROUP BY ip
ORDER BY cnt DESC
LIMIT 10
) ;
Tested at SQL-Fiddle
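SQLite rejects the parenthesized form above. Its grammar does not allow parentheses around the members of a compound SELECT, and ORDER BY/LIMIT may only follow the last member. The fiddle was therefore presumably running a different engine. In SQLite, one way to keep a separate top 10 per branch is to move each branch into a FROM-clause subquery. Here is a sketch using Python's sqlite3 module and the same hypothetical rows as before:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parser (ip text, user text, code text)")
# Hypothetical sample rows, including a NULL code (excluded by both branches).
conn.executemany("INSERT INTO parser VALUES (?, ?, ?)", [
    ("10.0.0.1", "a", "1"), ("10.0.0.1", "b", "2"),
    ("10.0.0.1", "c", "9"), ("10.0.0.1", "d", None),
    ("10.0.0.2", "e", "3"), ("10.0.0.2", "f", "7"),
])

# Each branch keeps its own ORDER BY/LIMIT inside a subquery, which SQLite
# accepts; the compound SELECT itself has no ORDER BY, so rows from the two
# branches are not re-sorted together.
QUERY = """
SELECT * FROM (
    SELECT ip, COUNT(*) AS cnt, 'in' AS type
    FROM parser
    WHERE code IN (1, 2, 3)
    GROUP BY ip
    ORDER BY cnt DESC
    LIMIT 10
)
UNION ALL
SELECT * FROM (
    SELECT ip, COUNT(*) AS cnt, 'not in' AS type
    FROM parser
    WHERE code NOT IN (1, 2, 3)
    GROUP BY ip
    ORDER BY cnt DESC
    LIMIT 10
)
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```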
