Using Ifnull in Subquery SQLite - sqlite

I have these two tables, members and water_meter:
members
id | name
=========
1 | Dani
2 | Dina
3 | Roni
water_meter
id | member_id | date       | start | finish | paid | paid_at
===+===========+============+=======+========+======+====================
 1 | 1         | 2014-07-01 | 12.3  | 38.7   | 1    | 2014-12-29 18:28:30
 2 | 2         | 2014-07-01 | 57.2  | 64.3   | 0    | null
 3 | 3         | 2014-07-01 | 14.6  | 52.3   | 0    | null
These members need to pay for their water usage every month. What I want is for each month's 'start' value to be the 'finish' value from the previous month. This is my query to check water usage for August:
SELECT m.id, m.name,
       ifnull(t.start, (SELECT ifnull(finish, 0)
                        FROM members m2
                        LEFT JOIN water_meter t2 ON m2.id = t2.member_id AND t2.date = '2014-07-01')) AS start,
       t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
Result :
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 38.7 | null |
3 | Roni | 38.7 | null |
As you can see, the "start" value is not right. What is the right query for this case?
What I want is this:
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 64.3 | null |
3 | Roni | 52.3 | null |
Check : http://sqlfiddle.com/#!7/29a4c/2

You haven't added the correct WHERE condition in the inner query.
SELECT m.id, m.name,
       ifnull(t.start,
              (SELECT ifnull(finish, 0)
               FROM members m2
               LEFT JOIN water_meter t2
                      ON m2.id = t2.member_id AND t2.date = '2014-07-01'
               WHERE m2.id = m.id)) AS start,
       t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
WHERE m.active = 1
I don't like the query itself, but it produces the output you wanted.
A slightly better solution (no subqueries, which can be slow on large datasets):
select
    members.id,
    name,
    coalesce(wm_cur.start, wm_prev.finish),
    wm_cur.finish
from members
left join water_meter wm_cur
       on members.id = wm_cur.member_id
      and wm_cur.date between '2014-08-01' and date('2014-08-01', 'start of month', '+1 month', '-1 day')
left join water_meter wm_prev
       on members.id = wm_prev.member_id
      and wm_prev.date between '2014-07-01' and date('2014-07-01', 'start of month', '+1 month', '-1 day')
where members.active = 1
You can replace coalesce with ifnull if you wish. It also handles the entire month and not only the first day, which may or may not be what you want.
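As a side note (not part of either answer above): if you don't want to hard-code the previous month's date, a correlated subquery can take each member's latest finish before the queried month. A minimal sketch, using only the tables and columns from the question:
SELECT m.id,
       m.name,
       ifnull(t.start,
              ifnull((SELECT t2.finish
                      FROM water_meter t2
                      WHERE t2.member_id = m.id
                        AND t2.date < '2014-08-01'
                      ORDER BY t2.date DESC   -- latest reading before August
                      LIMIT 1), 0)) AS start,
       t.finish,
       t.paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
With the sample data this yields 38.7, 64.3 and 52.3 as the August start values, matching the desired output; a member with no earlier reading at all falls back to 0.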

Related

How to use MERGE keyword in pl/sql?

I am updating a table, but I keep getting the following error
ERROR: syntax error at or near "MERGE"
LINE 3: MERGE into
when I try to use a MERGE statement. I don't see anything obviously wrong with the syntax. Can someone point out the obvious?
MERGE into Table2 t2
using (select name, max(id) max_id from Table1 t1 group by name ) t1
on (t2.project_name=t1.name)
when matched then update set projectid=max_id where status='ongoing' ;
Table1
1 | alpha | 2021 |
2 | groundwork | 2020 |
3 | NETOS | 2021 |
5 | WebOPD | 2019 |
Table2
id | name | year | status | project name | projectID
1 | john | 2021 | ongoing | alpha | 1
2 | linda | 2021 | completed | NETOS | 3
3 | pat | 2021 | completed | WebOPD | 5
4 | tom | 2021 | ongoing | alpha | 1
version : PostgreSQL 13.6
The last line of your message says you use PostgreSQL, but the tag you used (plsql) means Oracle. Which one is it, after all? I presume the former, but the syntax you used is Oracle's.
The MERGE documentation for PostgreSQL says that:
- INTO can't be used
- there are no parentheses around the ON clause
- a WHERE clause can't be used
See if something like this helps:
MERGE Table2 t2
using (select t1.name,
              max(t1.id) max_id
       from Table1 t1 join Table2 t2 on t2.project_name = t1.name
       where t2.status = 'ongoing'
       group by t1.name
      ) x
on t2.project_name = x.name
when matched then update set
    t2.projectid = x.max_id;
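Note also that MERGE was only added in PostgreSQL 15, so on 13.6 the statement doesn't exist at all, which is why the parser reports a syntax error at the MERGE keyword. The closest equivalent on 13.6 is UPDATE ... FROM. A minimal sketch, reusing the table and column names from the question:
UPDATE Table2 t2
SET projectid = x.max_id
FROM (SELECT t1.name, MAX(t1.id) AS max_id   -- highest id per project name in Table1
      FROM Table1 t1
      GROUP BY t1.name) x
WHERE t2.project_name = x.name
  AND t2.status = 'ongoing';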

Sqlite / populate new column that ranks the existing rows

I have a SQLite database table with the following columns:
| day | place | visitors |
-------------------------------------
| 2021-05-01 | AAA | 20 |
| 2021-05-01 | BBB | 10 |
| 2021-05-01 | CCC | 3 |
| 2021-05-02 | AAA | 5 |
| 2021-05-02 | BBB | 7 |
| 2021-05-02 | CCC | 2 |
Now I would like to introduce a column 'rank' which indicates the rank according to the visitors each day. Expected table would look like:
| day | place | visitors | Rank |
------------------------------------------
| 2021-05-01 | AAA | 20 | 1 |
| 2021-05-01 | BBB | 10 | 2 |
| 2021-05-01 | CCC | 3 | 3 |
| 2021-05-02 | AAA | 5 | 2 |
| 2021-05-02 | BBB | 7 | 1 |
| 2021-05-02 | CCC | 2 | 3 |
Populating the data for the new column Rank can be done with a program like this (pseudocode):
for each i_day in all_days:
    SELECT
        ROW_NUMBER() OVER (ORDER BY `visitors` DESC) Day_Rank, place
    FROM mytable
    WHERE `day` = 'i_day'
    for each i_place in all_places:
        UPDATE mytable
        SET rank = Day_Rank
        WHERE `Day` = 'i_day'
          AND place = 'i_place'
Since this line-by-line update is quite inefficient, I'm looking for a way to optimize it with an SQL subquery in combination with the UPDATE. This is what I tried (it does not work so far):
for each i_day in all_days:
    UPDATE mytable
    SET rank = (
        SELECT
            ROW_NUMBER() OVER (ORDER BY `visitors` DESC) Day_Rank
        FROM mytable
        WHERE `day` = 'i_day'
    )
Typically, this can be done with a subquery that counts the number of rows with visitors greater than the value of visitors of the current row:
UPDATE mytable
SET Day_Rank = (
SELECT COUNT(*) + 1
FROM mytable m
WHERE m.day = mytable.day AND m.visitors > mytable.visitors
);
Note that the result is actually what RANK() would return if there are ties in the values of visitors.
See the demo.
Or, you could calculate the rankings with ROW_NUMBER() in a CTE and use it in a subquery:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
)
UPDATE mytable
SET Day_Rank = (SELECT rn FROM cte c WHERE (c.day, c.place) = (mytable.day, mytable.place));
See the demo.
Or, if your version of SQLite is 3.33.0+, you can use the join-like UPDATE ... FROM syntax:
UPDATE mytable AS m
SET Day_Rank = t.rn
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
) t
WHERE (t.day, t.place) = (m.day, m.place);
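All three UPDATE statements assume the Day_Rank column already exists; if it doesn't, it has to be added first, e.g. (column name and type are assumptions chosen to match the statements above):
ALTER TABLE mytable ADD COLUMN Day_Rank INTEGER;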

SQLite time duration calculation from rows

I want to calculate the duration between rows with datetime data in SQLite.
Let's consider this as the base data (a table named intervals):
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | Off |
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
I'd like to calculate the total duration for both the On and Off states:
| Total On | 26mins |
| Total Off | 48mins |
Then I wrote this query:
SELECT
    "Total " || interval_start.state AS state,
    (SUM(strftime('%s', interval_end.date) - strftime('%s', interval_start.date)) / 60) || "mins" AS duration
FROM
    intervals interval_start
INNER JOIN
    intervals interval_end ON interval_end.id =
    (
        SELECT id FROM intervals
        WHERE id > interval_start.id
          AND state = CASE WHEN interval_start.state = 'On' THEN 'Off' ELSE 'On' END
        ORDER BY id
        LIMIT 1
    )
GROUP BY
    interval_start.state
However, if the base data does not strictly alternate:
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | On | !!!
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
My query calculates the wrong result, as it pairs the only Off date with each of the On dates and sums them together.
The desired behavior should result in something like this:
| Total On | 74mins |
| Total Off | 0mins | --this line can be omitted, or can be N/A
I have two questions:
How can I rewrite the query to handle these wrong data situations?
I feel my query is not the best in terms of performance; is it possible to improve it?
Use a CTE where you return only the starting rows of each state and then aggregate:
with cte as (
    select *, lead(id) over (order by date) next_id
    from (
        select *, lag(state) over (order by date) prev_state
        from intervals
    )
    where state <> coalesce(prev_state, '')
)
select c1.state,
       sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60 || 'mins' duration
from cte c1
inner join cte c2 on c2.id = c1.next_id
group by c1.state
See the demos: 1 and 2
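To see which rows the CTE keeps, you can run its inner part on its own; a small debugging sketch (same logic as the answer above, nothing new added):
select *, lead(id) over (order by date) next_id
from (
    select *, lag(state) over (order by date) prev_state
    from intervals
)
where state <> coalesce(prev_state, '');
For the second data set this keeps only id 1 (On, 10:11) and id 4 (Off, 11:25), so the join pairs exactly those two rows and the result is Total On = 74mins; the trailing Off row has no successor and simply drops out, as the desired output allows.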

Optimizing query that looks at a specific time window each day

This is a followup to my previous question
Optimizing query to get entire row where one field is the maximum for a group
I'll change the names from what I used there to make them a little more memorable, but these don't represent my actual use-case (so don't estimate the number of records from them).
I have a table with a schema like this:
OrderTime DATETIME(6),
Customer VARCHAR(50),
DrinkPrice DECIMAL,
Bartender VARCHAR(50),
TimeToPrepareDrink TIME(6),
...
I'd like to extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day. So for instance I'd want results like
Date | Customer | OrderTime | MaxPrice | Bartender | ...
-------+----------+-------------+------------+-----------+-----
1/1/18 | Alice | 1/1/18 3:45 | 13.15 | Jane | ...
1/1/18 | Bob | 1/1/18 5:12 | 9.08 | Jane | ...
1/1/18 | Carol | 1/1/18 4:45 | 20.00 | Tarzan | ...
1/2/18 | Alice | 1/2/18 3:45 | 13.15 | Jane | ...
1/2/18 | Bob | 1/2/18 5:57 | 6.00 | Tarzan | ...
1/2/18 | Carol | 1/2/18 3:13 | 6.00 | Tarzan | ...
...
The table has an index on OrderTime, and contains tens of billions of records. (My customers are heavy drinkers).
Thanks to the previous question I'm able to extract this for a specific day pretty easily. I can do something like:
SELECT * FROM orders b
INNER JOIN (
SELECT Customer, MAX(DrinkPrice) as MaxPrice
FROM orders
WHERE OrderTime >= '2018-01-01 15:00'
AND OrderTime <= '2018-01-01 18:00'
GROUP BY Customer
) AS a
ON a.Customer = b.Customer
AND a.MaxPrice = b.DrinkPrice
WHERE b.OrderTime >= '2018-01-01 15:00'
AND b.OrderTime <= '2018-01-01 18:00';
This query runs in less than a second. The explain plan looks like this:
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| id| select_type | table | type | possible_keys | key | ref | Extra |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| 1 | PRIMARY | b | range | OrderTime | OrderTime | NULL | Using index condition |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | b.Customer,b.Price | |
| 2 | DERIVED | orders | range | OrderTime | OrderTime | NULL | Using index condition; Using temporary; Using filesort |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
I can also get the information about the relevant rows for my query:
SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0;
This query also runs in less than a second. Here's how its explain plan looks:
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 1 | PRIMARY | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
The problem now is retrieving the remaining fields from the table. I tried adapting the trick from before, like so:
SELECT * FROM
orders a
INNER JOIN
(SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0) b
ON a.OrderTime >= TIMESTAMP(b.Date, '15:00:00')
AND a.OrderTime <= TIMESTAMP(b.Date, '18:00:00')
AND a.Customer = b.Customer;
However, for reasons I don't understand, the database chooses to execute this in a way that takes forever. Explain plan:
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| 1 | PRIMARY | a | ALL | OrderTime | NULL | NULL | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | a.Customer | Using where |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 2 | DERIVED | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 4 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
Questions:
What is going on here?
How can I fix it?
To extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day I would use row_number() over() within a case expression evaluating the hour of day, like this:
CREATE TABLE mytable(
Date DATE
,Customer VARCHAR(10)
,OrderTime DATETIME
,MaxPrice NUMERIC(12,2)
,Bartender VARCHAR(11)
);
Note: changes were made to OrderTime.
INSERT INTO mytable(Date,Customer,OrderTime,MaxPrice,Bartender)
VALUES
('1/1/18','Alice','1/1/18 13:45',13.15,'Jane')
, ('1/1/18','Bob' ,'1/1/18 15:12', 9.08,'Jane')
, ('1/2/18','Alice','1/2/18 13:45',13.15,'Jane')
, ('1/2/18','Bob' ,'1/2/18 15:57', 6.00,'Tarzan')
, ('1/2/18','Carol','1/2/18 13:13', 6.00,'Tarzan')
;
The suggested query is this:
select *
from (
    select *
         , case when hour(OrderTime) between 15 and 18
                then row_number() over(partition by `Date`, customer
                                       order by MaxPrice DESC)
                else null
           end rn
    from mytable
) d
where rn = 1;
and the result will give access to all columns you include in the derived table.
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | -:
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
To help display how this works, running the derived table subquery:
select *
     , case when hour(OrderTime) between 15 and 18
            then row_number() over(partition by `Date`, customer order by MaxPrice DESC)
            else null
       end rn
from mytable;
produces this interim resultset:
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | ---:
0001-01-18 | Alice | 0001-01-18 13:45:00 | 13.15 | Jane | null
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Alice | 0001-02-18 13:45:00 | 13.15 | Jane | null
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
0001-02-18 | Carol | 0001-02-18 13:13:00 | 6.00 | Tarzan | null
db<>fiddle here
The task seems to be a "groupwise-max" problem. Here's one approach, involving only 2 'queries' (the inner one is called a "derived table").
SELECT x.OrderDate, x.Customer, b.OrderTime,
       x.MaxPrice, b.Bartender
FROM
(
    SELECT DATE(OrderTime) AS OrderDate,
           Customer,
           MAX(Price) AS MaxPrice
    FROM tbl
    WHERE TIME(OrderTime) BETWEEN '15:00' AND '18:00'
    GROUP BY OrderDate, Customer
) AS x
JOIN tbl AS b
     ON DATE(b.OrderTime) = x.OrderDate
    AND b.Customer = x.Customer
    AND b.Price = x.MaxPrice
WHERE TIME(b.OrderTime) BETWEEN '15:00' AND '18:00'
ORDER BY x.OrderDate, x.Customer
Desirable index:
INDEX(Customer, Price)
(There's no good reason to be using MyISAM.)
Billions of new rows per day
This adds new wrinkles. That's upwards of a terabyte of additional disk space needed each and every day?
Is it possible to summarize the data? The goal here is to add summary info as the new data comes in, and never have to re-scan the billions of old data. This may also let you remove all the secondary indexes on the Fact table.
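As a concrete, purely hypothetical illustration of that idea: a per-day, per-customer happy-hour rollup could be maintained as each day's data arrives, so the final report only reads the small summary table. All names and types below are assumptions, not part of the question's schema:
CREATE TABLE happy_hour_summary (
    OrderDate DATE NOT NULL,
    Customer  VARCHAR(50) NOT NULL,
    MaxPrice  DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (OrderDate, Customer)
);
-- Refreshed incrementally, e.g. for each day's happy hour after it ends:
INSERT INTO happy_hour_summary (OrderDate, Customer, MaxPrice)
SELECT DATE(OrderTime), Customer, MAX(DrinkPrice)
FROM orders
WHERE OrderTime >= '2018-01-01 15:00'
  AND OrderTime <= '2018-01-01 18:00'
GROUP BY DATE(OrderTime), Customer
ON DUPLICATE KEY UPDATE MaxPrice = GREATEST(MaxPrice, VALUES(MaxPrice));
The big query from the question then becomes a join against (or a plain read of) this summary table instead of a range scan over billions of rows.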
Normalization will help shrink the table size, hence speeding up the queries. Bartender and Customer are prime candidates for such -- perhaps a SMALLINT UNSIGNED (2 bytes; 65K values) for the former and MEDIUMINT UNSIGNED (3 bytes, 16M) for the latter. That would probably shrink by 50% the 5 columns you currently show. You may get a 2x speedup on many operations after normalizing.
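For example, a hypothetical bartender dimension along those lines (nothing here comes from the question's schema; the 2-byte key is just the size suggested above):
CREATE TABLE bartenders (
    bartender_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name         VARCHAR(50) NOT NULL,
    PRIMARY KEY (bartender_id),
    UNIQUE KEY (name)
);
-- The Fact table then stores the 2-byte bartender_id instead of repeating
-- the VARCHAR(50) name on every one of the billions of rows.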
Normalization is best done by 'staging' the data -- Load the data into a temporary table, normalize within it, summarize it, then copy into the main Fact table.
See http://mysql.rjweb.org/doc.php/summarytables
and http://mysql.rjweb.org/doc.php/staging_table
Before getting back to the question of optimizing the one query, we need to see the schema, the data flow, whether things can be normalized, whether summary tables can be effective, etc. I would hope to have the 'answer' for the query to be mostly digested in a summary table. Sometimes this leads to a 10x speedup.

how to sum the column and get the rows based on month and year

This is my table; can I get the result for this?
Each user has points. I have to show the number of points each user got in the current month and in the current year.
Thanks.
--------------------------------
| userId | points | date |
--------------------------------
| 1 | 5 | 8/25/2013 |
| 1 | 3 | 8/16/2013 |
| 1 | 2 | 8/01/2013 |
| 1 | 2 | 9/25/2013 |
| 1 | 5 | 8/25/2013 |
| 1 | 3 | 2/16/2012 |
| 2 | NULL | NULL |
| 2 | NULL | NULL |
--------------------------------
The result should be like this:
---------------------------------------------------
| userId | CurrentMonthpoints | CurrentYearPoints |
---------------------------------------------------
| 1 | 15 | 17 |
| 2 | NULL | NULL |
---------------------------------------------------
My query:
SELECT userId,
(SELECT sum(points)
from tbl_points
WHERE PointsDate between '8/1/2013' and '8/31/2013') AS CurrentMonthPoints,
(SELECT distinct SUM(points)
from tbl_points
WHERE PointsDate between '1/1/2014' and '12/31/2014' ) AS CurrentYearPoints
from tbl_user_performance_points
But my query wrongly shows:
---------------------------------------------------
| userId | CurrentMonthpoints | CurrentYearPoints |
---------------------------------------------------
| 1 | 15 | 17 |
| 2 | 15 | 17 |
---------------------------------------------------
Thanks in advance.
Give this a try:
select a.userID,
a.points_sum_m as currentmonthpoints,
b.points_sum_y as currentyearpoints
from (select userID, sum(points) as points_sum_m
from tbl_points
where month(date) = month(getdate())
group by userID) a
inner join (select userID, sum(points) as points_sum_y
from tbl_points
where year(date) = year(getdate())
group by userID) b
on a.userID = b.userID;
SQLFiddle: http://sqlfiddle.com/#!3/ff780/12
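Note that the inner join above drops users who have no qualifying points rows (user 2 in the example), while the desired output keeps them with NULLs. If that matters, you could start from the user list and left join both aggregates; a sketch assuming tbl_user_performance_points is the user list and PointsDate is the date column (as in the question's own query):
select u.userId,
       m.points_sum_m as currentmonthpoints,
       y.points_sum_y as currentyearpoints
from (select distinct userId from tbl_user_performance_points) u
left join (select userId, sum(points) as points_sum_m
           from tbl_points
           where month(PointsDate) = month(getdate())
             and year(PointsDate) = year(getdate())
           group by userId) m on m.userId = u.userId
left join (select userId, sum(points) as points_sum_y
           from tbl_points
           where year(PointsDate) = year(getdate())
           group by userId) y on y.userId = u.userId;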
with month_cte (userid, Current_Month) as
    (select userid, sum(Points) Current_Month from tbl_points
     where datepart(mm, PointsDate) = datepart(mm, getdate())
       and datepart(yy, PointsDate) = datepart(yy, getdate())
     group by userid),
year_cte (userid, Current_Year) as
    (select userid, sum(Points) Current_Year from tbl_points
     where datepart(yy, PointsDate) = datepart(yy, getdate())
     group by userid)
select distinct t.Userid, Current_Month, Current_Year
from tbl_points t
left join month_cte mc on t.userid = mc.userid
left join year_cte yc on t.userid = yc.userid
If you just want to correct your query, replace it with this:
SELECT userId,
(SELECT sum(points)
from tbl_points
WHERE PointsDate between '8/1/2013' and '8/31/2013' and userId= X.UserId) AS CurrentMonthPoints,
(SELECT distinct SUM(points)
from tbl_points
WHERE PointsDate between '1/1/2014' and '12/31/2014' and userId = X.UserId) AS CurrentYearPoints
from tbl_user_performance_points X
