I have an ending balance of $5000. I need to create a running balance, but adjust the first row to show the ending balance and then sum the rest, so that it looks like a bank statement. Here is what I have for the running balance. How can I adjust row 1 so that it shows the ending balance instead of the sum of the first row?
with BalBefore as (
select *
from transactions
where ACCT_NAME = 'Real Solutions'
ORDER BY DATE DESC
)
select
DATE,
amount,
'$' || printf("%.2f", sum(AMOUNT) over (order by ROW_ID)) as Balance
from BalBefore;
This gives me:
DATE      AMOUNT  BALANCE
9/6/2019  -31.00  $-31.00  <- I need this balance to be replaced with $5000 and have the rest sum as normal.
9/4/2019   15.00  $-16.00
9/4/2019   15.00  $-1.00
9/3/2019  -16.00  $-17.00
I have read many other questions, but I couldn't find one that I could understand so I thought I would post a simpler question.
The following is not short and sweet, but by using the WITH clause and CTEs, I hope that the logic is apparent. Multiple CTEs are defined which refer to each other to make the overall query more readable. Altogether, the goal was just to add a beginning balance record that could be summed together with the transactions:
/*
DROP TABLE IF EXISTS data;
CREATE temp TABLE data (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
date DATETIME NOT NULL,
amount NUMERIC NOT NULL
);
INSERT INTO data
(date, amount)
VALUES
('2019-09-03', -16.00),
('2019-09-04', 15.00),
('2019-09-04', 15.00),
('2019-09-06', -31.00)
;
*/
WITH
initial_filter AS (
SELECT id, date, amount
FROM data
--WHERE ACCT_NAME = 'Real Solutions'
),
prepared AS (
SELECT *
FROM initial_filter
UNION ALL
SELECT
9223372036854775807 as id, --largest signed integer
(SELECT MAX(date) FROM initial_filter) AS FinalDate,
-(5000.00) --ending balance (negated for summing algorithm)
),
running AS (
SELECT
id,
date,
amount,
SUM(-amount) OVER
(ORDER BY date DESC, id DESC
RANGE UNBOUNDED PRECEDING
EXCLUDE CURRENT ROW) AS balance
FROM prepared
ORDER BY date DESC, id DESC
)
SELECT *
FROM running
WHERE id != 9223372036854775807
ORDER BY date DESC, id DESC;
This produces the following
id date amount balance
4 2019-09-06 -31.00 5000
3 2019-09-04 15.00 5031
2 2019-09-04 15.00 5016
1 2019-09-03 -16.00 5001
UPDATE: The first query was not producing the correct balances. The beginning balance row and the windowing function (i.e. OVER clause) were updated to accurately sum over the correct amounts.
Note: The balance on each row is determined completely from the previous rows, not from the current row's amount, because this works backward from an ending balance, not forward from the previous row balance.
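The note above can be checked with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (for window functions). It reuses the sample data table from this answer, but instead of adding a sentinel row it subtracts the sum of all preceding amounts from the ending balance directly; the ENDING_BALANCE constant and the variable names are illustrative, not part of the original query.

```python
import sqlite3

ENDING_BALANCE = 5000  # assumed ending balance, taken from the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " date TEXT NOT NULL, amount NUMERIC NOT NULL)"
)
conn.executemany(
    "INSERT INTO data (date, amount) VALUES (?, ?)",
    [("2019-09-03", -16.00), ("2019-09-04", 15.00),
     ("2019-09-04", 15.00), ("2019-09-06", -31.00)],
)

# Newest row first. Each balance is the ending balance minus every amount
# listed above it in the statement; the row's own amount is not included.
query = """
SELECT id, date, amount,
       ? - COALESCE(SUM(amount) OVER (
               ORDER BY date DESC, id DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS balance
FROM data
ORDER BY date DESC, id DESC
"""
balances = [row[3] for row in conn.execute(query, (ENDING_BALANCE,))]
print(balances)
```

The frame is empty for the newest row, so COALESCE turns the NULL sum into 0 and that row shows the ending balance unchanged.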
Is it actually possible to use @ (the at sign) with SQLite so that a calculated value can be used as a constant in another query?
I am using a variable (a total) that I calculated previously to get another variable (a proportion) over two time periods.
Total amount of sales
Proportion of sales between the first semester and the second semester.
I copied the first query to get the constant and added the first query to the second.
The answer is no BUT:-
This could possibly be done in a single query.
Consider this simple demo with hopefully easy to understand all-in-one queries:-
First the sales table:-
i.e. 2 columns semester and amount
10 rows in total so 1000 is the total amount
6 rows are S1 (amount is 600) so 60%
4 rows are S2 (amount is 400) so 40%
Created and populated using:-
CREATE TABLE IF NOT EXISTS sales (semester TEXT, amount REAL);
INSERT INTO sales VALUES('S1',100),('S1',100),('S1',100),('S1',100),('S1',100),('S1',100),('S2',100),('S2',100),('S2',100),('S2',100);
So you could use an all-in-one query such as:-
SELECT
(SELECT sum(amount) FROM sales) AS total,
(SELECT sum(amount) FROM sales WHERE semester = 'S1') AS s1total,
((SELECT sum(amount) FROM sales WHERE semester = 'S1') / (SELECT sum(amount) FROM sales)) * 100 AS s1prop,
(SELECT sum(amount) FROM sales WHERE semester = 'S2') AS s2total,
((SELECT sum(amount) FROM sales WHERE semester = 'S2') / (SELECT sum(amount) FROM sales)) * 100 AS s2prop
;
This would result in
i.e. s1prop and s2prop the expected results (the other columns may be useful)
An alternative, using a CTE (Common Table Expressions) that does the same could be:-
WITH cte_total(total,s1total,s2total) AS (SELECT
(SELECT sum(amount) FROM sales),
(SELECT sum(amount) FROM sales WHERE semester = 'S1'),
(SELECT sum(amount) FROM sales WHERE semester = 'S2')
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop, s2total, (s2total / total) * 100 AS s2prop FROM cte_total;
You can have multiple CTEs, and they can gather data from other tables or even take values passed as parameters. They can be extremely useful, since their values can be accessed throughout the query.
e.g.
Here's an example where a second CTE is added (placed first) that mimics passing 3 dates; instead of the hard-coded values, ?s could be coded and the values passed via parameter binding.
As the sales table has no date for the sale, a literal value has been coded in its place: in WHERE '2023-01-01' /*<<<<< would be the column that holds the date */ the literal would normally be the column holding the sale date.
The hard-coded date has purposefully been chosen so that the BETWEEN clause evaluates to true.
If the date column did exist, the WHERE criteria for each semester could then be based on the respective dates for that semester.
The example:-
WITH
dates AS (SELECT
'2023-01-01' /*<<<<< ? and can then be passed as bound parameter*/ AS startdate,
'2023-03-01' /*<<<<< ? and can then be passed as bound parameter*/ AS semester2_start,
'2023-05-30' /*<<<<< ? and can then be passed as bound parameter*/as enddate
),
cte_total(total,s1total,s2total) AS (SELECT
(SELECT sum(amount) FROM sales
WHERE '2023-01-01' /*<<<<< would be the column that holds the date */
BETWEEN (SELECT startdate FROM dates)
AND (SELECT enddate FROM dates)),
(SELECT sum(amount) FROM sales WHERE semester = 'S1'),
(SELECT sum(amount) FROM sales WHERE semester = 'S2')
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop, s2total, (s2total / total) * 100 AS s2prop FROM cte_total;
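To make the parameter-binding idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical sale_date column on the sales table (the table in this answer has none), and the sample dates are made up so that six rows fall in the first semester and four in the second. The three ? placeholders in the dates CTE are filled by bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sale_date is a hypothetical column added only for this illustration.
conn.execute("CREATE TABLE sales (semester TEXT, amount REAL, sale_date TEXT)")
rows = [("S1", 100.0, "2023-01-15")] * 6 + [("S2", 100.0, "2023-03-15")] * 4
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
WITH
dates(startdate, semester2_start, enddate) AS (SELECT ?, ?, ?),
cte_total(total, s1total, s2total) AS (SELECT
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date BETWEEN startdate AND enddate),
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date >= startdate AND sale_date < semester2_start),
    (SELECT sum(amount) FROM sales, dates
      WHERE sale_date >= semester2_start AND sale_date <= enddate)
)
SELECT total, s1total, (s1total / total) * 100 AS s1prop,
       s2total, (s2total / total) * 100 AS s2prop
FROM cte_total
"""
# The three dates are bound in order to the three ? placeholders.
result = conn.execute(query, ("2023-01-01", "2023-03-01", "2023-05-30")).fetchone()
total, s1total, s1prop, s2total, s2prop = result
print(result)
```

Because the dates live in one CTE, each value is bound once and can be referenced by every subquery that needs it.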
I'm trying to create an SQLite trigger to update balance for a particular account code.
accounts table :
CREATE TABLE accounts (
year INTEGER NOT NULL,
month INTEGER NOT NULL CHECK(month BETWEEN 1 AND 12),
amount REAL NOT NULL CHECK(amount >= 0),
balance REAL,
code INTEGER NOT NULL
);
When a new row is inserted I want the balance value of the new row to reflect OLD balance + NEW amount. But this trigger does not recognize the lagging balance value and I cannot figure out why:
CREATE TRIGGER trg_accounts_balance
AFTER INSERT ON accounts
BEGIN
UPDATE accounts
SET balance = (
SELECT
lag(balance, 1, 0) OVER (
PARTITION BY code
ORDER BY month
) + NEW.amount
FROM accounts
)
WHERE rowid = NEW.ROWID;
END;
If I insert one row per month, I expect my data to look like:
year  month  amount  balance  code
----  -----  ------  -------  ----
2022  1      100.0   100.0    100
2022  2      9.99    109.99   100
But I get:
year  month  amount  balance  code
----  -----  ------  -------  ----
2022  1      100.0   100.0    100
2022  2      9.99    9.99     100
What am I doing wrong?
The query:
SELECT
lag(balance, 1, 0) OVER (
PARTITION BY code
ORDER BY month
)
FROM accounts
returns as many rows as there are in the table, and SQLite picks the first one (whichever it is) as the result of the scalar subquery and adds NEW.amount to it.
There is nothing that links this value to the specific row that was inserted.
Instead, use this:
CREATE TRIGGER trg_accounts_balance
AFTER INSERT ON accounts
BEGIN
UPDATE accounts
SET balance = COALESCE(
(
SELECT balance
FROM accounts
WHERE code = NEW.code
ORDER BY year DESC, month DESC
LIMIT 1, 1
), 0) + NEW.amount
WHERE rowid = NEW.ROWID;
END;
The subquery returns the balance of the previously inserted row by ordering the rows of the specific code in descending order and skipping the top row (which is the new row).
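The behaviour of the corrected trigger can be tried out with a short script. This is a minimal sketch using Python's built-in sqlite3 module; the second account code (200) and the inserted amounts are made-up test data, and LIMIT 1 OFFSET 1 is used as an equivalent spelling of LIMIT 1, 1. Note that the approach assumes rows for a code are inserted in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
  year INTEGER NOT NULL,
  month INTEGER NOT NULL CHECK(month BETWEEN 1 AND 12),
  amount REAL NOT NULL CHECK(amount >= 0),
  balance REAL,
  code INTEGER NOT NULL
);
CREATE TRIGGER trg_accounts_balance
AFTER INSERT ON accounts
BEGIN
  UPDATE accounts
  SET balance = COALESCE(
    (SELECT balance FROM accounts
     WHERE code = NEW.code
     ORDER BY year DESC, month DESC
     LIMIT 1 OFFSET 1), 0) + NEW.amount
  WHERE rowid = NEW.rowid;
END;
""")

# Insert one row per month; balance is left NULL and filled in by the trigger.
insert = "INSERT INTO accounts (year, month, amount, code) VALUES (?, ?, ?, ?)"
conn.execute(insert, (2022, 1, 100.0, 100))
conn.execute(insert, (2022, 2, 9.99, 100))
conn.execute(insert, (2022, 1, 50.0, 200))  # a different code starts from 0

balances = conn.execute(
    "SELECT code, month, balance FROM accounts ORDER BY code, year, month"
).fetchall()
print(balances)
```

The first row of each code finds no earlier row, so COALESCE supplies 0 and the balance equals the amount.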
See the demo.
I'm trying to use DB Browser for SQLite to construct a nested query to determine the SECOND highest priced item purchased by the top 10 spenders. The query I have to pick out the top 10 spenders is:
SELECT user_id, max(item_total), SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders AS o
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10
This gives the user_id, most expensive item they purchased (not counting shipping or discounts) as well as the total amount they spent on the site.
I was trying to use a nested query to generate a list of the second most expensive items they purchased, but keep getting errors. I've tried
SELECT user_id, MAX(item_total) AS second_highest
FROM orders
WHERE item_total < (SELECT user_id, SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10)
group by user_id
I keep getting a row value misused error. Does anyone have pointers on this nested query or know of another way to find the second highest item purchased from within the group found in the first query?
Thanks!
(Note: The following assumes you're using SQLite 3.25 or newer, since it uses window functions.)
This will return the second-largest item_total for each user_id without duplicates:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders)
SELECT user_id, item_total FROM ranked WHERE ranking = 2;
You can combine it with your original query with something like:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders),
totals AS
(SELECT user_id
, sum (item_total + shipping_cost - discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = 0
GROUP BY user_id)
SELECT t.user_id, r.item_total, t.total_spent
FROM totals AS t
JOIN ranked AS r ON t.user_id = r.user_id
WHERE r.ranking = 2
ORDER BY t.total_spent DESC, t.user_id
LIMIT 10;
Okay, after fixing your table definition to better reflect the values being stored in it and the stated problem, and fixing the data and adding to it so you can actually get results, plus an optional but useful index like so:
CREATE TABLE orders (order_id INTEGER PRIMARY KEY
, user_id INTEGER
, item_total REAL
, shipping_cost NUMERIC
, discounts_applied NUMERIC
, payment_reject INTEGER);
INSERT INTO orders(user_id, item_total, shipping_cost, discounts_applied
, payment_reject) VALUES (9852,60.69,10,0,FALSE),
(2784,123.91,15,0,FALSE), (1619,119.75,15,0,FALSE), (9725,151.92,15,0,FALSE),
(8892,153.27,15,0,FALSE), (7105,156.86,25,0,FALSE), (4345,136.09,15,0,FALSE),
(7779,134.93,15,0,FALSE), (3874,157.27,15,0,FALSE), (5102,108.3,10,0,FALSE),
(3098,59.97,10,0,FALSE), (6584,124.92,15,0,FALSE), (5136,111.06,10,0,FALSE),
(1869,113.44,20,0,FALSE), (3830,129.63,15,0,FALSE), (9852,70.69,10,0,FALSE),
(2784,134.91,15,0,FALSE), (1619,129.75,15,0,FALSE), (9725,161.92,15,0,FALSE),
(8892,163.27,15,0,FALSE), (7105,166.86,25,0,FALSE), (4345,146.09,15,0,FALSE),
(7779,144.93,15,0,FALSE), (3874,167.27,15,0,FALSE), (5102,118.3,10,0,FALSE),
(3098,69.97,10,0,FALSE), (6584,134.92,15,0,FALSE), (5136,121.06,10,0,FALSE),
(1869,123.44,20,0,FALSE), (3830,139.63,15,0,FALSE);
CREATE INDEX orders_idx_1 ON orders(user_id, item_total DESC);
the above query will give:
user_id item_total total_spent
---------- ---------- -----------
7105 156.86 373.72
3874 157.27 354.54
8892 153.27 346.54
9725 151.92 343.84
4345 136.09 312.18
7779 134.93 309.86
3830 129.63 299.26
6584 124.92 289.84
2784 123.91 288.82
1619 119.75 279.5
(If you get a syntax error from the query now, it's because you're using an old version of SQLite that doesn't support window functions.)
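To see why dense_rank() is used here, the following minimal sketch (Python's built-in sqlite3 module, SQLite 3.25 or newer assumed) runs the ranking query on a small made-up orders table in which user 1 bought the most expensive item twice. The table contents and the TEMPLATE helper are illustrative only; row_number() is included purely as a contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, item_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (1, 50.0), (1, 30.0), (1, 10.0),  # top price bought twice
     (2, 20.0),                                   # only one purchase
     (3, 70.0), (3, 40.0)],
)

# Same query shape as above; only the ranking function is swapped.
TEMPLATE = """
WITH ranked AS (
  SELECT DISTINCT user_id, item_total,
         {rank_fn}() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
  FROM orders)
SELECT user_id, item_total FROM ranked WHERE ranking = 2 ORDER BY user_id
"""
with_dense_rank = conn.execute(TEMPLATE.format(rank_fn="dense_rank")).fetchall()
with_row_number = conn.execute(TEMPLATE.format(rank_fn="row_number")).fetchall()

print(with_dense_rank)  # second-highest distinct price per user
print(with_row_number)  # second row per user, which may repeat the top price
```

With dense_rank() tied prices share a rank, so rank 2 is always the second distinct price; users with only one distinct price simply do not appear.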
I have data like this:
I am trying to transform it to this (using SQLite). In the desired result, within each id, each start should be on the same row as the chronologically closest end. If an id has a start but no end (like id=4), then the corresponding end, will be empty (as shown below).
I have tried this
select
id,
max( case when start_end = "start" then date end) as start,
max(case when start_end = "end" then date end ) as end
from df
group by id
But the result is this, which is wrong because id=5 has only one row when it should have two:
id start end
1 2 1994-05-01 1996-11-04
2 4 1979-07-18 <NA>
3 5 2010-10-01 2012-10-06
Any help is much appreciated
CREATE TABLE mytable(
id INTEGER NOT NULL PRIMARY KEY
,start_end VARCHAR(5) NOT NULL
,date DATE NOT NULL
);
INSERT INTO mytable(id,start_end,date) VALUES (2,'start','1994-05-01');
INSERT INTO mytable(id,start_end,date) VALUES (2,'end','1996-11-04');
INSERT INTO mytable(id,start_end,date) VALUES (4,'start','1979-07-18');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2005-02-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2009-09-17');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2010-10-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2012-10-06');
select
s.id as id,
s.date as 'start',
min(e.date) as 'end' -- earliest end date from "same id&start"
from
-- only start dates
(select id, date
from intable
where start_end='start'
) as s
left join -- keep the start-only lines
-- only end dates
(select id, date
from intable
where start_end='end'
) as e
on s.id = e.id
and s.date < e.date -- not too early
group by s.id, s.date -- "same id&start"
order by s.id, s.date; -- ensure sequence
Left join (to keep the start-only line for id "4") two on-the-fly tables, start dates and end dates.
Take the minimal end date which is just higher than the start date (same id), using min() and group by.
Order by id, then start date.
I tested this on a test table which is similar to your dump, but has no "NOT NULL" and no "PRIMARY KEY". I guess for this test table that is irrelevant; otherwise explain the effect, please.
Note:
Internally, three pairs of dates are found for id 5 (those that match end > start), but for each of the two different combinations of id and start (group by id, start) only the pair with the lowest end (min(end)) is forwarded. The pair where end > start but end is not the minimum is therefore not returned. That leaves two lines with start/end pairs, as desired.
Output (with .headers on):
id|start|end
2|1994-05-01|1996-11-04
4|1979-07-18|
5|2005-02-01|2009-09-17
5|2010-10-01|2012-10-06
UPDATE: Incorporate helpful comments by @MatBailie.
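The note in this answer about the three internal pairs for id 5 can be made visible by running the join once without and once with the GROUP BY. This is a minimal sketch using Python's built-in sqlite3 module; the intable test table has no constraints, as described above, and the JOINED string is just a helper for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intable (id INTEGER, start_end TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO intable VALUES (?, ?, ?)",
    [(2, "start", "1994-05-01"), (2, "end", "1996-11-04"),
     (4, "start", "1979-07-18"),
     (5, "start", "2005-02-01"), (5, "end", "2009-09-17"),
     (5, "start", "2010-10-01"), (5, "end", "2012-10-06")],
)

# Shared FROM clause: every start joined to every later end of the same id.
JOINED = """
FROM (SELECT id, date FROM intable WHERE start_end = 'start') AS s
LEFT JOIN (SELECT id, date FROM intable WHERE start_end = 'end') AS e
  ON s.id = e.id AND s.date < e.date
"""
# Without grouping: all candidate pairs, including the unwanted one for id 5.
candidates = conn.execute(
    "SELECT s.id, s.date, e.date" + JOINED + "ORDER BY s.id, s.date, e.date"
).fetchall()
# With grouping: only the earliest end survives for each (id, start).
paired = conn.execute(
    "SELECT s.id, s.date, min(e.date)" + JOINED
    + "GROUP BY s.id, s.date ORDER BY s.id, s.date"
).fetchall()

for row in candidates:
    print("candidate", row)
for row in paired:
    print("paired   ", row)
```

The candidate list contains the pair (5, 2005-02-01, 2012-10-06), which min() then discards in favour of the closer end date.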
Thank you! This is exactly what I needed to do, only with a few changes:
SELECT
s.value AS 'url',
"AVGDATE" AS 'fieldname',
sum(e.value)/count(*) AS 'value'
FROM
(SELECT url, value
FROM quicktag
WHERE fieldname='NAME'
) AS s
LEFT JOIN
(SELECT url, substr(value,1,4) AS value
FROM quicktag
WHERE fieldname='DATE'
) AS e
ON s.url = e.url
WHERE e.value != ""
GROUP BY s.value;
I had a table like this:
url fieldname value
---------- ---------- ----------
1000052801 NAME Thomas
1000052801 DATE 2007
1000131579 NAME Morten
1000131579 DATE 2005
1000131929 NAME Tanja
1000131929 DATE 2014
1000158449 NAME Knud
1000158449 DATE 2007
1000158450 NAME Thomas
1000158450 DATE 2003
I needed to correlate NAME and DATE in columns based on url as a key, and generate a field with average DATE grouped by multiple NAME fields.
So my result looks like this:
url fieldname value
---------- ---------- ----------
Thomas AVGDATE 2005
Morten AVGDATE 2005
Tanja AVGDATE 2014
Knud AVGDATE 2007
Unfortunately, I do not have enough posts to make my vote count yet.
I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.
I'm trying to output the sum of overlapping times and the number of events that overlapped.
What is the most efficient / simple way to perform this query in MySQL?
CREATE TABLE IF NOT EXISTS `events` (
`EventID` int(10) unsigned NOT NULL auto_increment,
`StartTime` datetime NOT NULL,
`EndTime` datetime default NULL,
PRIMARY KEY (`EventID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;
INSERT INTO `events` (`EventID`, `StartTime`, `EndTime`) VALUES
(10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');
# if the query was run using the data above,
# the table below would be the desired output
# Number of Overlapped Events | Total Amount of Time those events overlapped.
1, 03:00:00
2, 02:00:00
3, 02:00:00
The purpose of these results is to generate a bill for hours used. (if you have one event running, you might pay 10 dollars per hour. But if two events are running, you only have to pay 8 dollars per hour, but only for the period of time you had two events running.)
Try this:
SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
SELECT
COUNT(*) AS `Count`,
UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
FROM (
SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times1
JOIN (
SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times2
ON Times1.rownum = Times2.rownum + 1
JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
GROUP BY Times1.rownum
) Totals
GROUP BY `Count`
Result:
1, 03:00:00
2, 02:00:00
3, 02:00:00
If this doesn't do what you want, or you want some explanation, please let me know. It could be made faster by storing the repeated subquery AllTimes in a temporary table, but hopefully it runs fast enough as it is.
Start with a table that contains a single datetime field as its primary key, and populate that table with every time value you're interested in. A leap year has 527,040 minutes (31,622,400 seconds), so this table might get big if your events span several years.
Now join against this table doing something like
SELECT i.dt as instant, count(*) as events
FROM instant i JOIN event e ON i.dt BETWEEN e.start AND e.end
WHERE i.dt BETWEEN ? AND ?
GROUP BY i.dt
Having an index on instant.dt may let you forgo an ORDER BY.
If events are added infrequently, this may be something you want to precalculate by running the query offline, populating a separate table.
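As a rough illustration of the instant-table idea, here is a minimal sketch that uses SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL. Several things are assumptions made for the sketch: the instants are hourly rather than per minute or second, the instant table is filled with a recursive CTE, the events table and its column names come from the question, and the join uses a half-open test (start <= instant < end) instead of BETWEEN so that an hour ending exactly at an event boundary is not counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (EventID INTEGER PRIMARY KEY,
                     StartTime TEXT NOT NULL, EndTime TEXT);
INSERT INTO events VALUES
  (10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
  (10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
  (10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');
CREATE TABLE instant (dt TEXT PRIMARY KEY);
""")

# Populate one instant per hour for the day of interest.
conn.execute("""
WITH RECURSIVE hours(dt) AS (
  SELECT ?
  UNION ALL
  SELECT datetime(dt, '+1 hour') FROM hours WHERE dt < ?
)
INSERT INTO instant SELECT dt FROM hours
""", ("2009-02-09 00:00:00", "2009-02-09 23:00:00"))

# Inner query: events running at each instant.
# Outer query: number of hourly instants for each overlap level.
overlap_hours = conn.execute("""
SELECT n_events, COUNT(*) AS hours
FROM (
  SELECT i.dt, COUNT(*) AS n_events
  FROM instant i
  JOIN events e ON i.dt >= e.StartTime AND i.dt < e.EndTime
  GROUP BY i.dt
)
GROUP BY n_events
ORDER BY n_events
""").fetchall()
print(overlap_hours)
```

Each instant stands for the hour that starts at it, so counting instants per overlap level gives hours directly; a finer granularity would only change how the instant table is filled and how the count is converted to a duration.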
I would suggest an in-memory structure that holds start-time, end-time, #events for each period. (This is simplified to hours here, but using Unix time gives accuracy down to the second.)
For every event, insert it as-is if there is no overlap; otherwise, find the overlap and split the event into (up to 3) parts, some of which may overlap existing periods. With your example data, starting from the first event:
Event 1 starts at 3am and ends at 10am: Just add the event since no overlaps:
3,10,1
Event 2 starts at 5am and ends at 9am: it overlaps, so split the original and add the new one with an extra "#events":
3,5,1
5,9,2
9,10,1
Event 3 starts at 7am and ends at 9am: also overlaps, do the same with all periods:
3,5,1
5,7,2
7,9,3
9,10,1
So calculating the overlap hours per #events:
1 event= (5-3)+(10-9)=3 hours
2 events = 7-5 = 2 hours
3 events = 9-7 = 2 hours
It would make sense to run this as a background process if there are many events to compare.
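The splitting procedure walked through above can be sketched in a few lines of Python. This is only one possible reading of it: the function names add_event and hours_per_count are made up for the illustration, periods are kept as a sorted list of non-overlapping (start, end, count) tuples, and times are plain hour numbers as in the example.

```python
def add_event(segments, start, end):
    """Merge the event [start, end) into sorted, disjoint (start, end, count) periods."""
    out = []
    pos = start  # left edge of the part of the new event not yet placed
    for s, e, n in segments:
        if e <= pos or s >= end:
            # This period does not touch the remaining part of the event.
            if s >= end and pos < end:
                out.append((pos, end, 1))  # remainder fits in the gap before it
                pos = end
            out.append((s, e, n))
            continue
        if pos < s:
            out.append((pos, s, 1))        # gap covered only by the new event
        lo, hi = max(s, pos), min(e, end)
        if s < lo:
            out.append((s, lo, n))         # part of the period before the event
        out.append((lo, hi, n + 1))        # overlapping part gets one more event
        if hi < e:
            out.append((hi, e, n))         # part of the period after the event
        pos = hi
    if pos < end:
        out.append((pos, end, 1))          # tail beyond every existing period
    return out


def hours_per_count(segments):
    """Total duration for each number of simultaneous events."""
    totals = {}
    for s, e, n in segments:
        totals[n] = totals.get(n, 0) + (e - s)
    return totals


segments = []
for start, end in [(3, 10), (5, 9), (7, 9)]:  # the three example events
    segments = add_event(segments, start, end)

print(segments)
print(hours_per_count(segments))
```

Each insertion scans the whole list, so for many events it may be worth keeping the periods in a structure that supports faster lookup of the first overlapping period.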