SQLite Efficient Running Total

I have a table of transactions in SQLite with the columns:
number, date, Category, Amount, runningBalance
I want the runningBalance column to hold a running sum of the Amount column, with the table ordered by Date first and number second.
I can do this with a SELECT when reading, but this table has the potential to get very large and I don't want to recalculate every time. I want to make a trigger so that all the transactions following (by date, then number) the inserted/edited transaction have their runningBalance value updated.
This will reduce the amount of calculation, since more recent transactions are likely to be edited often and older ones rarely. It will also spread the computation over writes, so that reads are near instant.
Can anyone provide assistance on how to set up such a trigger?
So far this is what I have, but it does not give the desired results, and it recalculates every row each time, not just the ones following the change.
CREATE TRIGGER RunningTotal AFTER UPDATE ON Transactions FOR EACH ROW
BEGIN
    UPDATE Transactions
    SET RunningBalance = (
        SELECT (
            SELECT sum(Amount)
            FROM TopInfo t2
            WHERE t2.Date <= t1.Date
        )
        FROM Transactions t1
    );
END;
Thanks!

I've managed to find a way that works. Not sure how efficient it is, though. I'd love to hear if anyone knows a more efficient way to update the Balance column.
CREATE TRIGGER Balance AFTER UPDATE OF Amount ON Transactions FOR EACH ROW
BEGIN
    UPDATE Transactions
    SET Balance = (
        SELECT Balance
        FROM (
            SELECT TransactionID,
                   (
                       SELECT sum(t2.Amount)
                       FROM Transactions t2
                       WHERE t2.Date <= t1.Date
                       ORDER BY Date
                   ) AS Balance
            FROM Transactions t1
            WHERE TransactionID = Transactions.TransactionID
            ORDER BY Date
        )
    )
    WHERE Transactions.Date >= NEW.Date;
END;
UPDATE:
CREATE TRIGGER Balance AFTER UPDATE OF Amount ON Transactions FOR EACH ROW
BEGIN
    UPDATE Transactions
    SET Balance = (
        SELECT Balance
        FROM (
            SELECT TransactionID,
                   (
                       SELECT sum(t2.Amount)
                       FROM Transactions t2
                       WHERE CASE WHEN t2.Date = t1.Date
                                  THEN t2.TransactionID <= t1.TransactionID
                                  ELSE t2.Date <= t1.Date
                             END
                       ORDER BY Date, TransactionID
                   ) AS Balance
            FROM Transactions t1
            WHERE TransactionID = Transactions.TransactionID
            ORDER BY Date, TransactionID
        )
    )
    WHERE Transactions.Date >= NEW.Date;
END;
I've done some more work with the running total and have come up with two ways. The second is much slower than the first. Any ideas why?
Method 1:
SELECT TransactionID, Date, Account, Amount,
       (SELECT sum(t2.Amount)
        FROM Transactions t2
        WHERE CASE WHEN t2.Date = t1.Date
                   THEN t2.TransactionID <= t1.TransactionID AND t2.Account = t1.Account
                   ELSE t2.Date <= t1.Date AND t2.Account = t1.Account
              END
        ORDER BY Date, TransactionID) AS Balance
FROM Transactions t1
ORDER BY Date, TransactionID
Method 2:
SELECT n.TransactionID, n.Date, n.Account, n.Amount,
       SUM(o.Amount) AS running_total
FROM Transactions n
LEFT JOIN Transactions o
    ON (CASE WHEN o.Date = n.Date
             THEN n.TransactionID >= o.TransactionID AND o.Account = n.Account
             ELSE n.Date >= o.Date AND o.Account = n.Account
        END)
GROUP BY n.Account, n.Date, n.TransactionID
ORDER BY n.Date, n.TransactionID;
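The reason Method 2 is so much slower is most likely the CASE expression in the LEFT JOIN's ON clause: it defeats any index, so SQLite has to materialize roughly n² joined row pairs and then collapse them with GROUP BY, whereas Method 1's correlated subquery aggregates on the fly, once per output row, and can use an index on (Account, Date, TransactionID) to touch only the rows it needs.

On the original trigger question, a much cheaper approach than recomputing sums is to apply only the delta to the rows at or after the edited transaction. A minimal sketch, assuming the column names used above, per-account balances, and that only Amount is edited (moving a row's Date would still need a wider recalculation):

CREATE TRIGGER BalanceDelta AFTER UPDATE OF Amount ON Transactions FOR EACH ROW
BEGIN
    -- Shift every balance at or after the edited row by the change in amount.
    UPDATE Transactions
    SET Balance = Balance + (NEW.Amount - OLD.Amount)
    WHERE Account = NEW.Account
      AND (Date > NEW.Date
           OR (Date = NEW.Date AND TransactionID >= NEW.TransactionID));
END;

And if you can rely on SQLite 3.25 or later, window functions make the read-side query cheap enough that the stored Balance column may be unnecessary:

SELECT TransactionID, Date, Account, Amount,
       SUM(Amount) OVER (PARTITION BY Account
                         ORDER BY Date, TransactionID) AS Balance
FROM Transactions
ORDER BY Date, TransactionID;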

Related

Spool space error when inserting large result set to table

I have a SQL query in Teradata that returns a result set of ~160m rows in (I guess) a reasonable time: depending on how good a day the server is having, it runs in 10-60 minutes.
I recently got access to space to save it as a table. However, using my initial query with an "insert into" command, I get error 2646 (no more spool space).
The query structure is:
insert into <test_DB.tablename>
with smaller_dataset as
(
    select *
    from
    (
        select
            items
            ,case items
        from
            <Database.table>
        where 1=1
            and other things
        QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc, LAST_UPDATE_DTM DESC) = 1
    ) T --irrelevant alias for subquery
    QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc) = 1
)
, employee_table as
(
    select
        items
        ,max(J1.field1) J1_field1
        ,max(J2.field1) J2_field1
        ,max(J3.field1) J3_field1
        ,max(J4.field1) J4_field1
    from smaller_dataset S
        self joins J1, J2, J3, J4
    group by
        non-aggregate items
)
select
    items
    ,case items
from employee_table
;
How can I break up the return into smaller chunks to prevent this error?
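One common workaround is to run the same INSERT/SELECT several times, each pass writing only a slice of the source rows, so no single statement needs the full spool. A sketch against the placeholders above, splitting on column A (any column in the PARTITION BY works, since all rows sharing an A value land in the same pass and the ROW_NUMBER results are unchanged; the bucket count of 4 is arbitrary):

insert into <test_DB.tablename>
with smaller_dataset as
(
    select *
    from
    (
        select
            items
            ,case items
        from
            <Database.table>
        where 1=1
            and other things
            and HASHBUCKET(HASHROW(A)) MOD 4 = 0  -- pass 1 of 4; rerun with = 1, = 2, = 3
        QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc, LAST_UPDATE_DTM DESC) = 1
    ) T
    QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc) = 1
)
...

Pushing the filter into the innermost select matters: each pass then only ever spools about a quarter of the data. If A alone is too skewed, hash both PARTITION BY columns (HASHROW(A, B)) instead.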

Query fails to execute after converting a column from Varchar2 to CLOB

I have an Oracle query:
select id from (
    select ID, ROW_NUMBER() over (partition by LATEST_RECEIPT order by ID) rownumber
    from Table
    where LATEST_RECEIPT in
    (
        select LATEST_RECEIPT from Table
        group by LATEST_RECEIPT
        having COUNT(1) > 1
    )
) t
where rownumber <> 1;
The data type of LATEST_RECEIPT was earlier varchar2(4000) and this query worked fine. Since the length of the column needed to be extended, I modified it to CLOB, after which the query fails. Could anyone help me fix this issue or provide a workaround?
You can change your inner query to look for other rows with the same last_receipt value but a different ID (assuming ID is unique); if another row exists then that is equivalent to your count returning greater than one. But you can't simply test two CLOB values for equality, you need to use dbms_lob.compare:
select ID
from your_table t1
where exists (
    select null from your_table t2
    where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
      and t2.ID != t1.ID
      -- or if ID isn't unique: and t2.ROWID != t1.ROWID
);
Applying the row number filter is trickier, as you also can't use a CLOB in the analytic partition by clause. As André Schild suggested, you can use a hash; here passing the integer value 3, which is the equivalent of dbms_crypto.hash_sh1 (though in theory that could change in a future release!):
select id from (
    select ID, ROW_NUMBER() over (partition by dbms_crypto.hash(LATEST_RECEIPT, 3)
                                  order by ID) rownumber
    from your_table t1
    where exists (
        select null from your_table t2
        where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
          and t2.ID != t1.ID
          -- or if ID isn't unique: and t2.ROWID != t1.ROWID
    )
)
where rownumber > 1;
It is of course possible to get a hash collision, and if that happened - you had two latest_receipt values which both appeared more than once and both hashed to the same value - then you could get too many rows back. That seems pretty unlikely, but it's something to consider.
So rather than ordering, you can simply look for rows which have the same latest_receipt and a lower ID:
select ID
from your_table t1
where exists (
    select null from your_table t2
    where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
      and t2.ID < t1.ID
);
Again that assumes ID is unique. If it isn't, then you could still use rowid instead, but you would have less control over which rows were found - the lowest rowid isn't necessarily the lowest ID. Presumably you're using this to find rows to delete. If you actually don't mind which row you keep and which you delete, then you could still do:
and t2.ROWID < t1.ROWID
But since you are currently ordering that probably isn't acceptable, and hashing might be preferable, despite the small risk.
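One practical note if you do go the hashing route: execute rights on DBMS_CRYPTO are not granted to PUBLIC by default, so the querying schema may first need a DBA to run something like:

grant execute on sys.dbms_crypto to your_schema;

(your_schema being a placeholder for the actual user.)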

Slow SQL Server Query due to calculations?

I am responsible for an old time-recording system which was written in ASP.NET Web Forms, using ADO.NET 2.0 for persistence.
Basically the system allows users to add details about a piece of work they are doing, the number of hours they have been assigned to complete the work, as well as the number of hours they have spent on the work to date.
The system also has a reporting facility, with the reports based on SQL queries. Recently I have noticed that many reports run from the system have become very slow to execute. The database has around 11 tables and doesn't store much data: 27,000 records is the most any one table holds, with the majority of tables well below even 1,500 records.
I therefore don't think the issue is related to large volumes of data; I think it is more to do with poorly constructed SQL queries, and possibly the database design as well.
For example, there are queries similar to this:
@start_date datetime,
@end_date datetime,
@org_id int
select distinct t1.timesheet_id,
       t1.proposal_job_ref,
       t1.work_date AS [Work Date],
       consultant.consultant_fname + ' ' + consultant.consultant_lname AS [Person],
       proposal.proposal_title AS [Work Title],
       t1.timesheet_time AS [Hours],
       --GET TOTAL DAYS ASSIGNED TO PROPOSAL
       (select sum(proposal_time_assigned.days_assigned) -- * 8.0
        from proposal_time_assigned
        where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref)
       as [Total Days Assigned],
       --GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
       (select isnull(sum(t2.timesheet_time / 8.0), '0')
        from timesheet_entries t2
        where t2.proposal_job_ref = t1.proposal_job_ref
          and t2.work_date <= t1.work_date
          and t2.work_date >= '01/04/2013')
       as [Days Spent Since 1st April 2013],
       --GET TOTAL DAYS REMAINING ON THE PROPOSAL
       (select sum(proposal_time_assigned.days_assigned)
        from proposal_time_assigned
        where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref)
       -
       (select sum(t2.timesheet_time / 8.0)
        from timesheet_entries t2
        where t2.proposal_job_ref = t1.proposal_job_ref
          and t2.work_date <= t1.work_date)
       as [Total Days Remaining]
from timesheet_entries t1,
     consultant,
     proposal,
     proposal_time_assigned
where (proposal_time_assigned.consultant_id = consultant.consultant_id)
  and (t1.proposal_job_ref = proposal.proposal_ref_code)
  and (proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref)
  and (t1.code_id = @org_id) and (t1.work_date >= @start_date) and (t1.work_date <= @end_date)
  and (t1.proposal_job_ref <> '0')
order by 2, 3
These queries are expected to return data for reports. I am not sure anyone can even follow what is happening in the query above, but basically there are quite a few calculations happening, i.e. dividing, multiplying, and subtracting. I am guessing this is what is slowing down the queries.
I suppose my question is: can anyone make enough sense of the query above to suggest how to speed it up?
Also, should calculations like the ones mentioned above ever be carried out in an SQL query? Or should this be done within code?
Any help would be really appreciated with this one.
Thanks.
Based on the information given I had to make an educated guess about certain table relationships. If you post the table structures, indexes, etc., we can complete the remaining columns in this query.
As of right now this query calculates "Days Assigned", "Days Spent" and "Days Remaining" for the key (timesheet_id, proposal_job_ref).
What we have to see is how work_date, timesheet_time, [Person] and proposal_title are associated with that. Are these calculations by person and proposal_title as well?
You can use SQL Fiddle to provide sample data and the expected output, so we can work off meaningful data instead of guessing.
SELECT
    q1.timesheet_id
    ,q1.proposal_job_ref
    ,q1.[Total Days Assigned]
    ,q2.[Days Spent Since 1st April 2013]
    ,(q1.[Total Days Assigned] - q2.[Days Spent Since 1st April 2013]) AS [Total Days Remaining]
FROM
(
    select
        t1.timesheet_id
        ,t1.proposal_job_ref
        ,sum(t4.days_assigned) as [Total Days Assigned]
    from tbl1.timesheet_entries t1
    JOIN tbl1.proposal t2
        ON t1.proposal_job_ref = t2.proposal_ref_code
    JOIN tbl1.proposal_time_assigned t4
        ON t4.proposal_ref_code = t1.proposal_job_ref
    JOIN tbl1.consultant t3
        ON t3.consultant_id = t4.consultant_id
    WHERE t1.code_id = @org_id
        AND t1.work_date BETWEEN @start_date AND @end_date
        AND t1.proposal_job_ref <> '0'
    GROUP BY t1.timesheet_id, t1.proposal_job_ref
) q1
JOIN
(
    select
        tbl1.timesheet_id
        ,tbl1.proposal_job_ref
        ,isnull(sum(tbl2.timesheet_time / 8.0), 0) AS [Days Spent Since 1st April 2013]
    from tbl1.timesheet_entries tbl1
    JOIN tbl1.timesheet_entries tbl2
        ON tbl1.proposal_job_ref = tbl2.proposal_job_ref
        AND tbl2.work_date <= tbl1.work_date
        AND tbl2.work_date >= '01/04/2013'
    WHERE tbl1.code_id = @org_id
        AND tbl1.work_date BETWEEN @start_date AND @end_date
        AND tbl1.proposal_job_ref <> '0'
    GROUP BY tbl1.timesheet_id, tbl1.proposal_job_ref
) q2
    ON q1.timesheet_id = q2.timesheet_id
    AND q1.proposal_job_ref = q2.proposal_job_ref
The problems I see in your query are:
1. Alias names are not provided for the tables.
2. Subqueries are used (which are costly to execute) instead of a WITH clause.
If I were to write your query, it would look like this:
select distinct t1.timesheet_id,
       t1.proposal_job_ref,
       t1.work_date AS [Work Date],
       c1.consultant_fname + ' ' + c1.consultant_lname AS [Person],
       p1.proposal_title AS [Work Title],
       t1.timesheet_time AS [Hours],
       --GET TOTAL DAYS ASSIGNED TO PROPOSAL
       (select sum(pta2.days_assigned) -- * 8.0
        from proposal_time_assigned pta2
        where pta2.proposal_ref_code = t1.proposal_job_ref)
       as [Total Days Assigned],
       --GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
       (select isnull(sum(t2.timesheet_time / 8.0), 0)
        from timesheet_entries t2
        where t2.proposal_job_ref = t1.proposal_job_ref
          and t2.work_date <= t1.work_date
          and t2.work_date >= '01/04/2013')
       as [Days Spent Since 1st April 2013],
       --GET TOTAL DAYS REMAINING ON THE PROPOSAL
       (select sum(pta2.days_assigned)
        from proposal_time_assigned pta2
        where pta2.proposal_ref_code = t1.proposal_job_ref)
       -
       (select sum(t2.timesheet_time / 8.0)
        from timesheet_entries t2
        where t2.proposal_job_ref = t1.proposal_job_ref
          and t2.work_date <= t1.work_date)
       as [Total Days Remaining]
from timesheet_entries t1,
     consultant c1,
     proposal p1,
     proposal_time_assigned pta1
where (pta1.consultant_id = c1.consultant_id)
  and (t1.proposal_job_ref = p1.proposal_ref_code)
  and (pta1.proposal_ref_code = t1.proposal_job_ref)
  and (t1.code_id = @org_id) and (t1.work_date >= @start_date) and (t1.work_date <= @end_date)
  and (t1.proposal_job_ref <> '0')
order by 2, 3
Check the above query for indexing options and the number of records to be processed from each table.
Check your database for indexes on the following columns (if those columns are not indexed, then start by indexing each).
proposal_time_assigned.proposal_ref_code
proposal_time_assigned.consultant_id
timesheet_entries.code_id
timesheet_entries.proposal_job_ref
timesheet_entries.work_date
consultant.consultant_id
proposal.proposal_ref_code
Without all of these indexes, nothing will improve this query.
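As a sketch, that means something like the following (the index names are invented; adjust column order to your workload):

CREATE NONCLUSTERED INDEX IX_pta_proposal_ref ON proposal_time_assigned (proposal_ref_code);
CREATE NONCLUSTERED INDEX IX_pta_consultant ON proposal_time_assigned (consultant_id);
CREATE NONCLUSTERED INDEX IX_ts_code_date ON timesheet_entries (code_id, work_date);
CREATE NONCLUSTERED INDEX IX_ts_proposal_date ON timesheet_entries (proposal_job_ref, work_date);
CREATE NONCLUSTERED INDEX IX_consultant_id ON consultant (consultant_id);
CREATE NONCLUSTERED INDEX IX_proposal_ref ON proposal (proposal_ref_code);

The compound (proposal_job_ref, work_date) index is the important one: it serves the correlated subqueries, which filter on both columns at once.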
The only thing in your query that would affect performance is the way you are filtering the [work_date]. Your current syntax causes a table scan:
--bad
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013'
This syntax uses an index (if one exists) and would be much faster:
--better
and t2.work_date between '01/04/2013' and t1.work_date
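For what it's worth, on SQL Server 2012 and later the running "days spent" subquery can also be written as a window aggregate, which avoids re-scanning timesheet_entries once per output row. This is a sketch rather than a drop-in replacement (the @org_id and date-range filters would have to move to an outer query so they don't shrink the window's input):

SELECT t1.timesheet_id,
       t1.proposal_job_ref,
       t1.work_date,
       t1.timesheet_time,
       SUM(CASE WHEN t1.work_date >= '2013-04-01'
                THEN t1.timesheet_time / 8.0 ELSE 0 END)
           OVER (PARTITION BY t1.proposal_job_ref
                 ORDER BY t1.work_date
                 RANGE UNBOUNDED PRECEDING) AS [Days Spent Since 1st April 2013]
FROM timesheet_entries t1
WHERE t1.proposal_job_ref <> '0';

RANGE (rather than ROWS) is deliberate: it includes ties on work_date, matching the <= comparison in the original subquery.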

Deleting Invalid Duplicate Rows in SQL

I have a table which stores the check-in times of employees, recorded via a time machine against a username. If an employee punches multiple times, there will be multiple records of his check-ins with only a few seconds' difference between them. Obviously only the first record is valid; all the other entries are invalid and must be deleted from the table. How can I do this, given that I can select all the check-in records of an employee for the current date?
The data in the DB is as follows:
Username   Checktime               CheckType
HRA001     7/29/2012 8:16:44 AM    Check-In
HRA001     7/29/2012 8:16:46 AM    Check-In
HRA001     7/29/2012 8:16:50 AM    Check-In
HRA001     7/29/2012 8:16:53 AM    Check-In
Try this:
;WITH users_CTE as (
    select rank() over (partition by Username order by Checktime) as rnk
    from users
)
DELETE FROM users_CTE where rnk <> 1
--For your second requirement try this query
;WITH users_CTE as (
    select *, rank() over (partition by Username order by Checktime) as rnk
    from users
)
, CTE2 as (
    select Username, MIN(CheckTime) as minTime, DATEADD(mi, 1, MIN(CheckTime)) as maxTime
    from users_CTE
    group by Username
)
delete from users where Checktime in (
    select c1.Checktime from users_CTE c1 left join CTE2 c2
        on c1.Checktime > c2.minTime and c1.Checktime <= c2.maxTime
    where c2.Username is not null and c1.Username in (
        select c1.Username from users_CTE c1 left join CTE2 c2
            on c1.Checktime > c2.minTime and c1.Checktime <= c2.maxTime
        group by c1.Username, c2.Username
        having COUNT(*) > 1))
--For your changed requirements, check this query below
alter table users add flag varchar(2)
;WITH users_CTE as (
    select *, rank() over (partition by Username order by Checktime) as rnk
    from users
)
, CTE2 as (
    select Username, MIN(CheckTime) as minTime, DATEADD(mi, 1, MIN(CheckTime)) as maxTime
    from users_CTE
    group by Username
)
update u SET u.flag = 'd'
from users_CTE u inner join (
    select c1.Checktime from users_CTE c1 left join CTE2 c2
        on c1.Checktime > c2.minTime and c1.Checktime <= c2.maxTime
    where c2.Username is not null and c1.Username in (
        select c1.Username from users_CTE c1 left join CTE2 c2
            on c1.Checktime > c2.minTime and c1.Checktime <= c2.maxTime
        group by c1.Username, c2.Username
        having COUNT(*) > 1)) a
    on u.Checktime = a.Checktime
--Check the latest query with DelFlag
;WITH users_CTE as
(
    select *, row_number() over (partition by Username order by Checktime) as row
    from users
)
, CTE as (
    select row, Username, Checktime, CheckType, 0 as totalSeconds, 'N' as Delflag
    from users_CTE
    where row = 1
    union all
    select t.row, t.Username, t.Checktime, t.CheckType,
           CASE WHEN (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime)) >= 60
                THEN 0
                ELSE (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime))
           END as totalSeconds,
           CASE WHEN (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime)) >= 60
                THEN 'N'
                ELSE 'Y'
           END as Delflag
           --CASE WHEN c.totalSeconds <= 60 then 'Y' else 'N' end as Delflag
    from users_CTE t inner join CTE c
        on t.row = c.row + 1
)
select Username, Checktime, CheckType, Delflag from CTE
Why don't you verify the check-ins before inserting them into the DB? If a check-in already exists for this user within the time window, do nothing; otherwise insert it. A sketch of that guard follows.
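A minimal sketch of that guard in T-SQL, using the table from the question, a 60-second window, and made-up parameter names:

INSERT INTO users (Username, Checktime, CheckType)
SELECT @username, @punchTime, 'Check-In'
WHERE NOT EXISTS (
    SELECT 1
    FROM users
    WHERE Username = @username
      AND CheckType = 'Check-In'
      AND Checktime >= DATEADD(SECOND, -60, @punchTime)  -- a punch within the previous minute
);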
You should be able to order all records by time, subtract the latest time from the previous time per employee and, if the result is less than a certain threshold, delete the row(s) with the most recent time; see the LAG sketch below.
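A sketch of that approach using LAG (SQL Server 2012 or later; the 60-second threshold is an assumption, and each gap is measured against the immediately preceding punch):

WITH gaps AS (
    SELECT Checktime,
           DATEDIFF(SECOND,
                    LAG(Checktime) OVER (PARTITION BY Username ORDER BY Checktime),
                    Checktime) AS gap_seconds
    FROM users
    WHERE CheckType = 'Check-In'
)
DELETE FROM gaps
WHERE gap_seconds < 60;  -- the first punch has a NULL gap and is kept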
You could try ranking the records by check-in time and then deleting all the records, for each employee and each day, which have RANK greater than 1.
Try this query:
Delete from employee where employee.checkin in (select checkin from employee group by checkin having count(checkin) > 1);
http://codesimplified.com/2010/10/18/remove-duplicate-records-from-the-database-table/
Hope this helps.
DELETE FROM timesheet
WHERE timesheetRecordId <> (
    SELECT TOP 1 timesheetRecordId
    FROM timesheet
    WHERE checkInDate = todaysDate AND employeeId = empId
    ORDER BY checkInTime ASC
)
AND checkInDate = todaysDate AND employeeId = empId;
I don't think you can reference the target table of a DELETE statement in a subquery of that same statement, so you can't do it with one single DELETE statement.
What you can do is write a stored procedure. In your stored procedure you should create a temporary table containing the PKs returned by this query:
select cht.pkey
from CheckTimeTable as cht
where exists ( select pkey
               from CheckTimeTable
               where username = cht.userName
                 and checkType = 'check-IN'
                 and Checktime >= subtime(cht.Checktime, '0 0:0:15.000000')
                 and Checktime < cht.Checktime );
Then write another statement to delete those PKs from your original table, CheckTimeTable.
Note that the query above is for MySQL, so you'll need to find the equivalent way to subtract 15 seconds from a timestamp in your DBMS. In MySQL it's done like this:
subtime(cht.Checktime, '0 0:0:15.000000')
This query will return any CheckTime record that has another record from the same user, with the type Check-In, within the 15 seconds before its own Checktime.
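Putting the two steps together, a minimal sketch of the procedure body in MySQL (table and column names as above; pkey is assumed to be the primary key):

CREATE TEMPORARY TABLE dup_keys AS
    SELECT cht.pkey
    FROM CheckTimeTable AS cht
    WHERE EXISTS ( SELECT pkey
                   FROM CheckTimeTable
                   WHERE username = cht.userName
                     AND checkType = 'check-IN'
                     AND Checktime >= subtime(cht.Checktime, '0 0:0:15.000000')
                     AND Checktime < cht.Checktime );

DELETE FROM CheckTimeTable WHERE pkey IN (SELECT pkey FROM dup_keys);

DROP TEMPORARY TABLE dup_keys;

The DELETE is legal here because its subquery reads from the temporary table rather than from CheckTimeTable itself.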

Preventing Max function from using timestamp as part of criteria on a date column in PL/SQL

If I query:
select max(date_created) date_created
on a DATE field in PL/SQL (Oracle 11g), and there are records that were created on the same date but at different times, MAX() returns only the latest time on that date. What I would like to do is have the times ignored and return ALL records that match the max date, regardless of their associated timestamp in that column. What is the best practice for doing this?
Edit: what I'm looking to do is return all records for the most recent date that matches my criteria, regardless of varying timestamps for that day. Below is what I'm doing now; it only returns records from the latest date AND time on that date.
SELECT r."ID",
r."DATE_CREATED"
FROM schema.survey_response r
JOIN
(SELECT S.CUSTOMERID ,
MAX (S.DATE_CREATED) date_created
FROM schema.SURVEY_RESPONSE s
WHERE S.CATEGORY IN ('Yellow', 'Blue','Green')
GROUP BY CUSTOMERID
) recs
ON R.CUSTOMERID = recs.CUSTOMERID
AND R.DATE_CREATED = recs.date_created
WHERE R.CATEGORY IN ('Yellow', 'Blue','Green')
Final Edit: Got it working via the query below.
SELECT r."ID",
r."DATE_CREATED"
FROM schema.survey_response r
JOIN
(SELECT S.CUSTOMERID ,
MAX (trunc(S.DATE_CREATED)) date_created
FROM schema.SURVEY_RESPONSE s
WHERE S.CATEGORY IN ('Yellow', 'Blue','Green')
GROUP BY CUSTOMERID
) recs
ON R.CUSTOMERID = recs.CUSTOMERID
AND trunc(R.DATE_CREATED) = recs.date_created
WHERE R.CATEGORY IN ('Yellow', 'Blue','Green')
In Oracle, you can get the latest date ignoring the time:
SELECT max(trunc(date_created)) date_created
FROM your_table
You can get all rows that have the latest date, ignoring the time, in a couple of ways. Using analytic functions (preferable):
SELECT *
FROM (SELECT a.*,
             rank() over (order by trunc(date_created) desc) rnk
      FROM your_table a)
WHERE rnk = 1
or the more conventional but less efficient:
SELECT *
FROM your_table
WHERE trunc(date_created) = (SELECT max(trunc(date_created))
                             FROM your_table)
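One caveat with the TRUNC() approach: wrapping the column stops a plain index on date_created from being used. If the table is large, a function-based index keeps these queries cheap (the index name is made up):

CREATE INDEX survey_response_trunc_ix ON survey_response (trunc(date_created));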
