SQLite: populate a new column that ranks the existing rows

I've a SQLite database table with the following columns:
| day | place | visitors |
-------------------------------------
| 2021-05-01 | AAA | 20 |
| 2021-05-01 | BBB | 10 |
| 2021-05-01 | CCC | 3 |
| 2021-05-02 | AAA | 5 |
| 2021-05-02 | BBB | 7 |
| 2021-05-02 | CCC | 2 |
Now I would like to introduce a column 'rank' that ranks the places by visitors within each day. The expected table would look like this:
| day | place | visitors | Rank |
------------------------------------------
| 2021-05-01 | AAA | 20 | 1 |
| 2021-05-01 | BBB | 10 | 2 |
| 2021-05-01 | CCC | 3 | 3 |
| 2021-05-02 | AAA | 5 | 2 |
| 2021-05-02 | BBB | 7 | 1 |
| 2021-05-02 | CCC | 2 | 3 |
Populating the data for the new column Rank can be done with a program like this (pseudocode):
for each i_day in all_days:
    SELECT
        ROW_NUMBER () OVER (ORDER BY `visitors` DESC) Day_Rank, place
    FROM mytable
    WHERE `day` = 'i_day'
    for each i_place in all_places:
        UPDATE mytable
        SET rank = Day_Rank
        WHERE `day` = 'i_day'
        AND place = 'i_place'
Since this line-by-line update is quite inefficient, I'm looking for a way to optimize it with a SQL subquery in combination with the UPDATE.
(This does not work so far.)
for each i_day in all_days:
    UPDATE mytable
    SET rank = (
        SELECT
            ROW_NUMBER () OVER (ORDER BY `visitors` DESC) Day_Rank
        FROM mytable
        WHERE `day` = 'i_day'
    )

Typically, this can be done with a subquery that counts the number of rows with visitors greater than the value of visitors of the current row:
UPDATE mytable
SET Day_Rank = (
SELECT COUNT(*) + 1
FROM mytable m
WHERE m.day = mytable.day AND m.visitors > mytable.visitors
);
Note that if there are ties in the values of visitors, this query returns what RANK() would return (tied rows share the same rank), not what ROW_NUMBER() would return.
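As a quick check, the correlated-subquery UPDATE above can be run against the sample data from the question. This is only a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the `rank_visitors` helper name and the `Day_Rank` column type are assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question: (day, place, visitors)
ROWS = [
    ("2021-05-01", "AAA", 20), ("2021-05-01", "BBB", 10), ("2021-05-01", "CCC", 3),
    ("2021-05-02", "AAA", 5), ("2021-05-02", "BBB", 7), ("2021-05-02", "CCC", 2),
]

def rank_visitors(rows):
    """Load rows into an in-memory table and fill Day_Rank with a single UPDATE."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (day TEXT, place TEXT, visitors INTEGER, Day_Rank INTEGER)"
    )
    con.executemany("INSERT INTO mytable (day, place, visitors) VALUES (?, ?, ?)", rows)
    # Rank = 1 + number of rows on the same day with strictly more visitors
    con.execute("""
        UPDATE mytable
        SET Day_Rank = (
            SELECT COUNT(*) + 1
            FROM mytable m
            WHERE m.day = mytable.day AND m.visitors > mytable.visitors
        )
    """)
    result = con.execute(
        "SELECT day, place, Day_Rank FROM mytable ORDER BY day, place"
    ).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in rank_visitors(ROWS):
        print(row)
```

Because the subquery only needs COUNT(*), this variant should also work on SQLite builds that predate window functions.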
Or, you could calculate the rankings with ROW_NUMBER() in a CTE and use it in a subquery:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
)
UPDATE mytable
SET Day_Rank = (SELECT rn FROM cte c WHERE (c.day, c.place) = (mytable.day, mytable.place));
Or, if your version of SQLite is 3.33.0+ you can use the join-like UPDATE...FROM... syntax:
UPDATE mytable AS m
SET Day_Rank = t.rn
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
) t
WHERE (t.day, t.place) = (m.day, m.place);
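The COUNT(*) approach and the ROW_NUMBER() approaches only differ when two places have the same number of visitors on a day. The sketch below compares RANK() and ROW_NUMBER() on a made-up day with a tie. The `2021-05-03` rows and the `compare_rankings` helper are hypothetical, and it assumes an SQLite build with window functions (3.25.0 or later).

```python
import sqlite3

# Hypothetical rows with a tie between AAA and BBB
TIED_ROWS = [
    ("2021-05-03", "AAA", 10),
    ("2021-05-03", "BBB", 10),
    ("2021-05-03", "CCC", 4),
]

def compare_rankings(rows):
    """Return (place, rank, row_number) per row, ordered by place."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (day TEXT, place TEXT, visitors INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT place,
               RANK()       OVER (PARTITION BY day ORDER BY visitors DESC) AS rnk,
               ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) AS rn
        FROM mytable
        ORDER BY place
    """).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for place, rnk, rn in compare_rankings(TIED_ROWS):
        print(place, rnk, rn)
```

RANK() gives both tied places rank 1 and skips to 3, while ROW_NUMBER() always hands out 1, 2, 3. Which tied row gets 1 is not determined unless the ORDER BY includes a tie-breaker such as place.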

Related

SQLite time duration calculation from rows

I want to calculate duration between rows with datetime data in SQLite.
Let's consider this for the base data (named intervals):
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | Off |
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
I'd like to calculate the duration for both On and Off state:
| Total On | 26mins |
| Total Off | 48mins |
Then I wrote this query:
SELECT
'Total ' || interval_start.state AS state,
(SUM(strftime('%s', interval_end.date)-strftime('%s', interval_start.date)) / 60) || 'mins' AS duration
FROM
intervals interval_start
INNER JOIN
intervals interval_end ON interval_end.id =
(
SELECT id FROM intervals WHERE
id > interval_start.id AND
state = CASE WHEN interval_start.state = 'On' THEN 'Off' ELSE 'On' END
ORDER BY id
LIMIT 1
)
GROUP BY
interval_start.state
However, if the base data is not in strict order:
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | On | !!!
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
My query calculates the wrong result, because it pairs the only Off date with each of the On dates and sums the durations together.
The desired result would be something like this:
| Total On | 74mins |
| Total Off | 0mins | --this line can be omitted, or can be N/A
I have two questions:
1. How can I rewrite the query to handle these wrong data situations?
2. I feel my query is not the best in terms of performance; is it possible to improve it?
Use a CTE where you return only the starting rows of each state and then aggregate:
with cte as (
select *, lead(id) over (order by date) next_id
from (
select *, lag(state) over (order by date) prev_state
from intervals
)
where state <> coalesce(prev_state, '')
)
select c1.state,
-- parentheses are needed because || binds tighter than / in SQLite
(sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' duration
from cte c1 inner join cte c2
on c2.id = c1.next_id
group by c1.state
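To see how the LAG/LEAD query above behaves on both data sets from the question, here is a minimal sketch using Python's sqlite3 module. The `total_durations` helper and the in-memory table are assumptions for illustration. The division is parenthesized before the `'mins'` suffix is appended, and an ORDER BY is added only to make the output order predictable. It assumes an SQLite build with window functions (3.25.0 or later).

```python
import sqlite3

QUERY = """
WITH cte AS (
    SELECT *, LEAD(id) OVER (ORDER BY date) AS next_id
    FROM (
        SELECT *, LAG(state) OVER (ORDER BY date) AS prev_state
        FROM intervals
    )
    WHERE state <> COALESCE(prev_state, '')  -- keep only rows where the state changes
)
SELECT c1.state,
       (SUM(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' AS duration
FROM cte c1
INNER JOIN cte c2 ON c2.id = c1.next_id
GROUP BY c1.state
ORDER BY c1.state
"""

REGULAR = [(1, "2020-07-04 10:11", "On"), (2, "2020-07-04 10:22", "Off"),
           (3, "2020-07-04 11:10", "On"), (4, "2020-07-04 11:25", "Off")]
IRREGULAR = [(1, "2020-07-04 10:11", "On"), (2, "2020-07-04 10:22", "On"),
             (3, "2020-07-04 11:10", "On"), (4, "2020-07-04 11:25", "Off")]

def total_durations(rows):
    """Load rows into an in-memory intervals table and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE intervals (id INTEGER PRIMARY KEY, date TEXT, state TEXT)")
    con.executemany("INSERT INTO intervals VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    print(total_durations(REGULAR))
    print(total_durations(IRREGULAR))
```

With the irregular data only the first On row and the Off row survive the state-change filter, so the three consecutive On rows collapse into one 74-minute interval and no Off row is produced.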

SQLite get row continuous value

I have a simple table. Some example rows look like this:
id | name |
1 | a |
2 | a |
3 | a |
4 | b |
6 | b |
7 | a |
8 | a |
I want to get the last continuous id.
So if I start at '1', the result should be '4'.
In this example, the result should be '7':
3 |a |
4 |b |
5 |a |
6 |a |
7 |a |
10|a |
The only approach I know is to select all rows after my input number and find the continuous run programmatically.
How can I do this in SQL?
If I understand the question correctly, this should work for you:
select id from tableName t1
where not exists (select id from tableName t2 where t2.id = t1.id + 1)
and (id - (select count(*) from tableName t3 where t3.id < t1.id)) = (select min(id) from tableName);
If you want to start from 10, it should be:
select id from tableName t1
where not exists (select id from tableName t2 where t2.id = t1.id + 1)
and (id - (select count(*) from tableName t3 where t3.id < t1.id and t3.id >= 10)) = 10;
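The second query can be generalized by passing the starting id as a parameter. The following is only a sketch using Python's sqlite3 module. The `last_continuous_id` helper is a hypothetical name, the table holds just the id column (the name column plays no role in the query), and the start value is assumed to exist in the table.

```python
import sqlite3

def last_continuous_id(ids, start):
    """Return the last id of the unbroken run that begins at `start`, or None."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableName (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO tableName (id) VALUES (?)", [(i,) for i in ids])
    row = con.execute("""
        SELECT t1.id
        FROM tableName t1
        -- the row must be the end of a run: no row with id + 1
        WHERE NOT EXISTS (SELECT 1 FROM tableName t2 WHERE t2.id = t1.id + 1)
          -- and every id between :start and this row must be present
          AND t1.id - (SELECT COUNT(*) FROM tableName t3
                       WHERE t3.id < t1.id AND t3.id >= :start) = :start
    """, {"start": start}).fetchone()
    con.close()
    return row[0] if row else None

if __name__ == "__main__":
    print(last_continuous_id([1, 2, 3, 4, 6, 7, 8], 1))
    print(last_continuous_id([3, 4, 5, 6, 7, 10], 3))
```

The subtraction works because, inside an unbroken run starting at `start`, the number of earlier ids in the run is exactly `id - start`.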

distinct sum does not distinct values

I have 2 tables, reservations and articles:
Reservations
------------------------------
Id | Name | City |
------------------------------
1 | Mike | Stockholm
2 | Daniel | Gothenburg
2 | Daniel | Gothenburg
3 | Andre | Gothenburg (Majorna)
Articles
-------------------------------------------------------------
ArticleId | Name | Amount | ReservationId |
-------------------------------------------------------------
10 | Coconuts | 1 | 1
10 | Coconuts | 4 | 2
11 | Apples | 2 | 2
12 | Oranges | 2 | 3
I want to select Articles Name and the sum of Articles.Amount per Articles.ArticleId and Reservations.City.
My code:
SELECT distinct r.ID,a.Name as ArticleName,
sum(a.Amount) as ArticlesAmount,
substr(r.City,1,3) as ToCityName
FROM Reservations r
INNER JOIN Articles a
on r.Id = a.ReservationId
WHERE a.Name <> ''
GROUP BY ToCityName,a.ArticleId,a.Name
ORDER BY ToCityName ASC
This gives me following result:
Id | ArticleName | ArticlesAmount | ToCityName
2 | Coconuts | 8 | Got
2 | Apples | 4 | Got
3 | Oranges | 2 | Got
1 | Coconuts | 1 | Sto
But I want:
Id | ArticleName | ArticlesAmount | ToCityName
2 | Coconuts | 4 | Got
2 | Apples | 2 | Got
3 | Oranges | 2 | Got
1 | Coconuts | 1 | Sto
Help would be appreciated, and an explanation please :)
Have a look at SQLFiddle
Code:
SELECT distinct r.ID,a.Name as ArticleName,
sum(distinct a.Amount) as ArticlesAmount,
substr(r.City,1,3) as ToCityName
FROM Reservations r
INNER JOIN Articles a
on r.Id = a.ReservationId
WHERE a.Name <> ''
GROUP BY ToCityName,a.ArticleId,a.Name
ORDER BY ToCityName ASC
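SUM(DISTINCT a.Amount) adds each distinct amount only once per group, so the rows produced twice by the duplicated reservation (Id 2) are not double-counted. Be aware that it would also collapse two genuinely separate rows in a group that happen to have the same amount.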
I joined Articles a second time to select the requested rows; here is the query:
SELECT DISTINCT
r.ID,
a.Name AS ArticleName,
Articles.Amount,
substr(r.City, 1, 3) AS ToCityName
FROM
Reservations r
INNER JOIN Articles a ON r.Id = a.ReservationId
INNER JOIN Articles ON a.ReservationId = Articles.ReservationId
AND a.ArticleId = Articles.ArticleId
WHERE
a.Name <> ''
GROUP BY
ToCityName,
a.ArticleId,
a.Name
ORDER BY
ToCityName ASC
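The doubled sums come from the Reservations row for Id 2 appearing twice, so every article of that reservation is joined twice. As an alternative to the answers above (not taken from them), the duplicates can be removed before joining. The sketch below uses Python's sqlite3 module. The `article_totals` helper and the query template are assumptions for illustration, and r.Id is left out of the select list to keep the grouping unambiguous.

```python
import sqlite3

TEMPLATE = """
    SELECT a.Name AS ArticleName,
           SUM(a.Amount) AS ArticlesAmount,
           substr(r.City, 1, 3) AS ToCityName
    FROM {reservations} r
    INNER JOIN Articles a ON r.Id = a.ReservationId
    WHERE a.Name <> ''
    GROUP BY ToCityName, a.ArticleId, a.Name
    ORDER BY ToCityName, a.ArticleId
"""

def article_totals(deduplicate):
    """Sum amounts per city prefix and article, optionally de-duplicating Reservations."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Reservations (Id INTEGER, Name TEXT, City TEXT);
        CREATE TABLE Articles (ArticleId INTEGER, Name TEXT, Amount INTEGER,
                               ReservationId INTEGER);
        INSERT INTO Reservations VALUES
            (1, 'Mike', 'Stockholm'), (2, 'Daniel', 'Gothenburg'),
            (2, 'Daniel', 'Gothenburg'), (3, 'Andre', 'Gothenburg (Majorna)');
        INSERT INTO Articles VALUES
            (10, 'Coconuts', 1, 1), (10, 'Coconuts', 4, 2),
            (11, 'Apples', 2, 2), (12, 'Oranges', 2, 3);
    """)
    source = ("(SELECT DISTINCT Id, Name, City FROM Reservations)"
              if deduplicate else "Reservations")
    result = con.execute(TEMPLATE.format(reservations=source)).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    print(article_totals(deduplicate=False))  # amounts for reservation 2 are doubled
    print(article_totals(deduplicate=True))
```

Unlike SUM(DISTINCT ...), de-duplicating the reservations keeps two different article rows with equal amounts as separate contributions to the sum.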

Using Ifnull in Subquery SQLite

I have these two tables, members and water_meter:
members
id | name
=========
1 | Dani
2 | Dina
3 | Roni
water_meter
id | member_id | date | start | finish | paid | paid_at
===+============+===========+=======+===========+=======+=====================+
1 | 1 |2014-07-01 | 12.3 | 38.7 | 1 | 2014-12-29 18:28:30
2 | 2 |2014-07-01 | 57.2 | 64.3 | 0 | null
3 | 3 |2014-07-01 | 14.6 | 52.3 | 0 | null
These members need to pay for their water usage every month. What I want is for the 'start' value of each month to be the 'finish' value from the previous month. This is my query to check water usage for August:
SELECT m.id, m.name,
ifnull(t.start, (SELECT ifnull(finish, 0) FROM members m2
LEFT JOIN water_meter t2 ON m2.id = t2.member_id AND t2.date = '2014-07-01') ) as start,
t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
Result :
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 38.7 | null |
3 | Roni | 38.7 | null |
As you can see, the "start" value is not right. What is the right query for this case?
What I want is like this
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 64.3 | null |
3 | Roni | 52.3 | null |
Check : http://sqlfiddle.com/#!7/29a4c/2
The inner query is missing a WHERE condition that ties it to the member in the outer row, so it returns the same value for every member.
SELECT m.id, m.name,
ifnull(t.start,
(SELECT ifnull(finish, 0) FROM members m2
LEFT JOIN water_meter t2
ON m2.id = t2.member_id AND t2.date = '2014-07-01'
where m2.id = m.id)) as start,
t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
WHERE m.active = 1
I don't like the query itself, but it produces the output you wanted.
A slightly better solution (no subqueries, which may be slow on a large dataset):
select
members.id,
name,
coalesce(wm_cur.start, wm_prev.finish),
wm_cur.finish
from members
left join water_meter wm_cur
on members.id = wm_cur.member_id
and wm_cur.date between '2014-08-01' and date('2014-08-01','start of month','+1 month','-1 day')
left join water_meter wm_prev
on members.id = wm_prev.member_id
and wm_prev.date between '2014-07-01' and date('2014-07-01','start of month','+1 month','-1 day')
where members.active = 1
You can replace coalesce with ifnull if you wish. This version also matches any date within the month, not only the first day, which may or may not be what you want.
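The effect of the added correlation (where m2.id = m.id) can be checked against the sample data with Python's sqlite3 module. This is only a sketch. The `august_readings` helper and the column types are assumptions, and the `active` filter from the answers is left out because the sample members table in the question does not list that column.

```python
import sqlite3

def august_readings():
    """Return (id, name, start, finish) for August, falling back to July's finish."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE water_meter (id INTEGER PRIMARY KEY, member_id INTEGER, date TEXT,
                                  start REAL, finish REAL, paid INTEGER, paid_at TEXT);
        INSERT INTO members VALUES (1, 'Dani'), (2, 'Dina'), (3, 'Roni');
        INSERT INTO water_meter VALUES
            (1, 1, '2014-07-01', 12.3, 38.7, 1, '2014-12-29 18:28:30'),
            (2, 2, '2014-07-01', 57.2, 64.3, 0, NULL),
            (3, 3, '2014-07-01', 14.6, 52.3, 0, NULL);
    """)
    rows = con.execute("""
        SELECT m.id, m.name,
               ifnull(t.start,
                      (SELECT ifnull(t2.finish, 0)
                       FROM members m2
                       LEFT JOIN water_meter t2
                              ON m2.id = t2.member_id AND t2.date = '2014-07-01'
                       WHERE m2.id = m.id)) AS start,  -- correlate with the outer member
               t.finish
        FROM members m
        LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
        ORDER BY m.id
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in august_readings():
        print(row)
```

Since there are no August rows yet, every `start` falls back to that member's own July `finish`, and `finish` stays NULL (None in Python).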

Query returning incorrect count

I have the following query.
SELECT a.link_field1 AS journo, count(a.link_id) as articles, AVG( b.vote_value ) AS score
FROM dan_links a
LEFT JOIN dan_votes b
ON link_id = vote_link_id
WHERE link_field1 <> ''
and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY link_field1
ORDER BY link_field1, link_id
This query is returning a count of 3 for the first item in the list. What should be returned is
Journo | count | score
John S | 2 | 6.00
Joe B | 1 | 4
However for the first one John S, it returns a count of 3.
If I directly query
select * from dan_links where link_field1 = 'John S'
I get 2 records returned, as I would expect. I can't for the life of me figure out why the count is wrong, unless for some reason it is counting the records from the dan_votes table.
How can I get the correct count, or is my query completely wrong?
EDIT: Contents of the tables
dan_links
link_id | link_field1 | link | source | link_status
1 | John S | http://test.com | test.com | approved
2 | John S | http://google.com | google | approved
3 | Joe B | http://facebook.com | facebook | approved
dan_votes
vote_id | link_id | vote_value
1 | 1 | 5
2 | 1 | 8
3 | 2 | 4
4 | 3 | 1
EDIT: it looks like it is counting the rows in the votes table for some reason
A left outer join on the condition link_id = vote_link_id produces one row for every matching vote, so a link with two votes appears twice, something like:
link_id | link_field1 | link | source | link_status|vote_id|vote_value
1 | John S | http://test.com | test.com | approved|1|5
1 | John S | http://test.com | test.com | approved|2|8
2 | John S | http://google.com | google | approved|3|4
3 | Joe B | http://facebook.com | facebook | approved|4|1
Now when you group by link_field1, you get a count of 3 for John S.
A nested query might work:
SELECT journo,count(linkid) as articles,AVG(score) FROM
(SELECT a.link_field1 AS journo, AVG( b.vote_value ) AS score, a.link_id as linkid
FROM dan_links a
LEFT JOIN dan_votes b
ON link_id = vote_link_id
WHERE link_field1 <> ''
and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY link_id
ORDER BY link_field1, link_id) GROUP BY journo
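The query above gives an incorrect average, because an average of per-link averages is not the overall average: ((n1+n2)/2+n3)/2 != (n1+n2+n3)/3. Use the query below instead, which sums the votes and divides by the number of votes: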
SELECT journo, count(linkid) as articles, SUM(vote_sum)/SUM(count_on_id) as score
FROM
-- count_on_id counts votes (not links), so a link without votes adds 0 to the divisor
(SELECT a.link_field1 AS journo, SUM( b.vote_value ) AS vote_sum, a.link_id as linkid, count(b.vote_value) as count_on_id
FROM dan_links a
LEFT JOIN dan_votes b
ON link_id = vote_link_id
WHERE link_field1 <> ''
and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY link_id
ORDER BY link_field1, link_id) t GROUP BY journo
Hope this helps.
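The row multiplication described above can be reproduced with the sample rows from the question. The sketch below uses Python's sqlite3 module and, as a different technique from the nested queries, counts articles with COUNT(DISTINCT ...) in a single query. The `journo_stats` helper is hypothetical, and the votes table is assumed to use the column name vote_link_id as in the queries.

```python
import sqlite3

def journo_stats():
    """Return (journo, joined_rows, articles, score) per journalist."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dan_links (link_id INTEGER PRIMARY KEY, link_field1 TEXT,
                                link TEXT, source TEXT, link_status TEXT);
        CREATE TABLE dan_votes (vote_id INTEGER PRIMARY KEY, vote_link_id INTEGER,
                                vote_value INTEGER);
        INSERT INTO dan_links VALUES
            (1, 'John S', 'http://test.com', 'test.com', 'approved'),
            (2, 'John S', 'http://google.com', 'google', 'approved'),
            (3, 'Joe B', 'http://facebook.com', 'facebook', 'approved');
        INSERT INTO dan_votes VALUES (1, 1, 5), (2, 1, 8), (3, 2, 4), (4, 3, 1);
    """)
    rows = con.execute("""
        SELECT a.link_field1 AS journo,
               COUNT(a.link_id) AS joined_rows,        -- one per link/vote pair
               COUNT(DISTINCT a.link_id) AS articles,  -- one per link
               AVG(b.vote_value) AS score              -- average over all votes
        FROM dan_links a
        LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
        WHERE a.link_field1 <> ''
          AND a.link_status NOT IN ('discard', 'spam', 'page')
        GROUP BY a.link_field1
        ORDER BY a.link_field1
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in journo_stats():
        print(row)
```

For John S the join yields three rows (two votes on link 1, one on link 2), which is where the count of 3 comes from, while COUNT(DISTINCT a.link_id) reports the 2 articles. AVG ignores NULLs, so links without votes do not distort the score.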
