SQLITE results inconsistency - sqlite

I am summarising the outputs of a survey, stored in a sqlite database file, and have a view defined as follows - this is meant to show entries in the valid response view where the respondent has indicated that EITHER:
(a) they are meeting the requirements already; OR,
(b) they aren't meeting all requirements, but the associated actions in place will be complete by the end of the year (31/12/2020):
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE
(impact_answer NOT IN ("Fully","Yes","N/A") AND
td__update_leg_doc <= "2020-12-31" AND
td__update_proc <= "2020-12-31" AND
td__update_op_proc <= "2020-12-31" AND
td__update_tech <= "2020-12-31" AND
td__training <= "2020-12-31") OR
impact_answer IN ("Fully","Yes","N/A")
The records included in the view are correct, however, when I query the results from the valid_response view that are not included in the view, there are some strange results:
SELECT *
FROM valid_response
WHERE id NOT IN (SELECT id FROM complete_dec20);
e.g.
id,impact_answer,td__update_leg_doc,td__update_proc,td__update_op_proc,td__update_tech,td__training
7,Partially,2020-12-31,,,,
Based on the date of 2020-12-31 and answer of 'Partially', this should be in the complete_dec20 view.
Can you explain why it isn't / what I'm missing?

Based on the date of 2020-12-31 and answer of 'Partially', this should
be in the complete_dec20 view
This should be in the complete_dec20 view only if all of these conditions are true:
td__update_leg_doc <= '2020-12-31' AND
td__update_proc <= '2020-12-31' AND
td__update_op_proc <= '2020-12-31' AND
td__update_tech <= '2020-12-31' AND
td__training <= '2020-12-31'
Are they?
I don't think so.
If they were true then the id would be returned by complete_dec20.
Also, the WHERE clause of complete_dec20 can be a bit simpler because there is no need to check impact_answer NOT IN ('Fully','Yes','N/A'):
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE impact_answer IN ('Fully','Yes','N/A')
OR
(
td__update_leg_doc <= '2020-12-31' AND
td__update_proc <= '2020-12-31' AND
td__update_op_proc <= '2020-12-31' AND
td__update_tech <= '2020-12-31' AND
td__training <= '2020-12-31'
)
Or even simpler with the function MAX():
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE impact_answer IN ('Fully','Yes','N/A')
OR
MAX(
td__update_leg_doc,
td__update_proc,
td__update_op_proc,
td__update_tech,
td__training
) <= '2020-12-31'

Related

Generating range of months using multi-level hierarchies

I want to filter the months of [ComptaEcriture Date] by selecting only the months from [ComptaPlanId].[ComptaDateDebut] to [ComptaPlanId].[ComptaDateFin], but since
[ComptaDateDebut] and [ComptaDateFin] are not from the same level and bot are not from the same dimension as [ComptaEcriture Date].[ComptaEcriture Date].[Month], I don't know how to achieve that.
If I could generate a range of months that would be great. My dimensions are as follows:
Assuming you're testing that PlanId is no the All member, you can use isAll MDX+ function for this.
For the set, we will combine the Filter function with a Declared function, even though we could put it all the code in the filter. It looks as :
WITH
FUNCTION inRange(Value _date,Value _start, Value _end) AS _start <= _date AND _date <= _end
SET myDates as Filter( [Date].[Date].[Month] as t, inRange(t.current.key, DateTime(2015,6,1), DateTime(2017,1,1) ) )
SELECT
myDates on 0
FROM [Cube]
And using the compact and faster version :
SELECT
Filter( [Date].[Date].[Month] as t, DateTime(2015,6,1) <= t.current.key AND t.current.key <= DateTime(2017,1,1) ) on 0
FROM [Cube]
Using the members :
WITH
FUNCTION inRange(Value _date,Value _start, Value _end) AS _start <= _date AND _date <= _end
SET myDates as Filter( [Date].[Date].[Month] as t,
inRange(t.current.key, [ComptaDateDebut].currentmember.key, [ComptaDateFin].currentmember.key )
)
SELECT
myDates on 0
FROM [Cube]
You can use contextMember instead of currentMember that check also in the slicer (FILTER BY or subselect)

SQL - Using logical operators in a UDF case statement

So it took a while for me to figure out how to create my first UDF but after I fixed it, I figured my next one would be a piece of cake. Unfortunately, it hasn't been the case. I'm pulling a field (ORIG_CLAIM, float) and I want to categorize that number. Here's my code:
CREATE FUNCTION [dbo].[fnOC_LEVEL](#ORIG_CLAIM float)
RETURNS nvarchar(255)
AS
BEGIN
DECLARE #result as varchar(255);
SELECT #result = case #ORIG_CLAIM
when < 1000 then 'A_Under 1000'
when >= 1000 and <= 4999.99 then 'B_1000-4999'
when >= 5000 and <= 7499.99 then 'C_5000-7499'
when >= 7500 and <= 9999.99 then 'D_7500-9999'
when >= 10000 and <= 14999.99 then 'E_10000-14999'
when >= 15000 and <= 19999.99 then 'F_15000-19999'
when >= 20000 then 'G_Over 20000'
END
RETURN #result
END
GO
I'm getting the error "Incorrect syntax near '<'". Can anyone wee what I might be doing wrong?
I think you may have to specify the comparison values as float. For example:
when < 1.0E3 then 'A_Under 1000'
when >= 1.0E3 and <= 4.99999E3 then 'B_1000-4999'
etc.

Slow SQL Server Query due to calculations?

I am responsible for an old time recording system which was written in ASP.net Web Forms using ADO.Net 2.0 for persistence.
Basically the system allows users to add details about a piece of work they are doing, the amount of hours they have been assigned to complete the work as well as the amount of hours they have spent on the work to date.
The system also has a reporting facility with the reports based on SQL queries. Recently I have noticed that many reports being run from the system have become very slow to execute. The database has around 11 tables, and it doesn’t store too much data. 27,000 records is the most records any one table holds, with the majority of tables well below even 1,500 records.
I don’t think the issue is therefore related to large volumes of data, I think it is more to do with poorly constructed sql queries and possibly even the same applying to the database design.
For example, there are queries similar to this
#start_date datetime,
#end_date datetime,
#org_id int
select distinct t1.timesheet_id,
t1.proposal_job_ref,
t1.work_date AS [Work Date],
consultant.consultant_fname + ' ' + consultant.consultant_lname AS [Person],
proposal.proposal_title AS [Work Title],
t1.timesheet_time AS [Hours],
--GET TOTAL DAYS ASSIGNED TO PROPOSAL
(select sum(proposal_time_assigned.days_assigned)-- * 8.0)
from proposal_time_assigned
where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref )
as [Total Days Assigned],
--GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
(select isnull(sum(t2.timesheet_time / 8.0), '0')
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013' )
as [Days Spent Since 1st April 2013],
--GET TOTAL DAYS REMAINING ON THE PROPOSAL
(select sum(proposal_time_assigned.days_assigned)
from proposal_time_assigned
where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref )
-
(select sum(t2.timesheet_time / 8.0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
) as [Total Days Remaining]
from timesheet_entries t1,
consultant,
proposal,
proposal_time_assigned
where (proposal_time_assigned.consultant_id = consultant.consultant_id)
and (t1.proposal_job_ref = proposal.proposal_ref_code)
and (proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref)
and (t1.code_id = #org_id) and (t1.work_date >= #start_date) and (t1.work_date <= #end_date)
and (t1.proposal_job_ref <> '0')
order by 2, 3
Which are expected to return data for reports. I am not even sure if anyone can follow what is happening in the query above, but basically there are quite a few calculations happening, i.e., dividing, multiplying, substraction. I am guessing this is what is slowing down the sql queries.
I suppose my question is, can anyone even make enough sense of the query above to even suggest how to speed it up.
Also, should calculations like the ones mentioned above ever been carried out in an sql query? Or should the this be done within code?
Any help would be really appreciated with this one.
Thanks.
based on the information given i had to do an educated guess about certain table relationships. if you post the table structures, indexes etc... we can complete remaining columns in to this query.
As of right now this query calculates "Days Assigned", "Days Spent" and "Days Remaining"
for the KEY "timesheet_id and proposal_job_ref"
what we have to see is how "work_date", "timesheet_time", "[Person]", "proposal_title" is associate with that.
are these calculation by person and Proposal_title as well ?
you can use sqlfiddle to provide us the sample data and output so we can work off the meaning full data instead doing guesses.
SELECT
q1.timesheet_id
,q1.proposal_job_ref
,q1.[Total Days Assigned]
,q2.[Days Spent Since 1st April 2013]
,(
q1.[Total Days Assigned]
-
q2.[Days Spent Since 1st April 2013]
) AS [Total Days Remaining]
FROM
(
select
t1.timesheet_id
,t1.proposal_job_ref
,sum(t4.days_assigned) as [Total Days Assigned]
from tbl1.timesheet_entries t1
JOIN tbl1.proposal t2
ON t1.proposal_job_ref=t2.proposal_ref_code
JOIN tbl1.proposal_time_assigned t4
ON t4.proposal_ref_code = t1.proposal_job_ref
JOIN tbl1.consultant t3
ON t3.consultant_id=t4.consultant_id
WHERE t1.code_id = #org_id
AND t1.work_date BETWEEN #start_date AND #end_date
AND t1.proposal_job_ref <> '0'
GROUP BY t1.timesheet_id,t1.proposal_job_ref
)q1
JOIN
(
select
tbl1.timesheet_id,tbl1.proposal_job_ref
,isnull(sum(tbl1.timesheet_time / 8.0), '0') AS [Days Spent Since 1st April 2013]
from tbl1.timesheet_entries tbl1
JOIN tbl1.timesheet_entries tbl2
ON tbl1.proposal_job_ref=tbl2.proposal_job_ref
AND tbl2.work_date <= tbl1.work_date
AND tbl2.work_date >= '01/04/2013'
WHERE tbl1.code_id = #org_id
AND tbl1.work_date BETWEEN #start_date AND #end_date
AND tbl1.proposal_job_ref <> '0'
GROUP BY tbl1.timesheet_id,tbl1.proposal_job_ref
)q2
ON q1.timesheet_id=q2.timesheet_id
AND q1.proposal_job_ref=q2.proposal_job_ref
The Problem what i see in your query is :
1> Alias name is not provided for the Tables.
2> Subqueries are used (which are execution cost consuming) instead of WITH clause.
if i would write your query it will look like this :
select distinct t1.timesheet_id,
t1.proposal_job_ref,
t1.work_date AS [Work Date],
c1.consultant_fname + ' ' + c1.consultant_lname AS [Person],
p1.proposal_title AS [Work Title],
t1.timesheet_time AS [Hours],
--GET TOTAL DAYS ASSIGNED TO PROPOSAL
(select sum(pta2.days_assigned)-- * 8.0)
from proposal_time_assigned pta2
where pta2.proposal_ref_code = t1.proposal_job_ref )
as [Total Days Assigned],
--GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
(select isnull(sum(t2.timesheet_time / 8.0), 0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013' )
as [Days Spent Since 1st April 2013],
--GET TOTAL DAYS REMAINING ON THE PROPOSAL
(select sum(pta2.days_assigned)
from proposal_time_assigned pta2
where pta2.proposal_ref_code = t1.proposal_job_ref )
-
(select sum(t2.timesheet_time / 8.0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
) as [Total Days Remaining]
from timesheet_entries t1,
consultant c1,
proposal p1,
proposal_time_assigned pta1
where (pta1.consultant_id = c1.consultant_id)
and (t1.proposal_job_ref = p1.proposal_ref_code)
and (pta1.proposal_ref_code = t1.proposal_job_ref)
and (t1.code_id = #org_id) and (t1.work_date >= #start_date) and (t1.work_date <= #end_date)
and (t1.proposal_job_ref <> '0')
order by 2, 3
Check above query for any indexing option & number of records to be processed from each table.
Check your databases for indexes on the following tables (if those columns are not indexed, then start by indexing each).
proposal_time_assigned.proposal_ref_code
proposal_time_assigned.consultant_id
timesheet_entries.code_id
timesheet_entries.proposal_job_ref
timesheet_entries.work_date
consultant.consultant_id
proposal.proposal_ref_code
Without all of these indexes, nothing will improve this query.
The only thing in your query that would affect performance is the way you are filtering the [work_date]. Your current syntax causes a table scan:
--bad
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013'
This syntax uses an index (if it exists) and would be much faster:
--better
and t2.work_date between t1.work_date and '01/04/2013'

Working with date ranges (Classic ASP and SQL)

I have to implement a solution where two date ranges can overlap each other. within the overlapped dates, I have to count how many days overlap each other. Once I know the overlapped days I can calculate a total figure based on the price that's attached per day.
A scenario would be that
A customer is booking a hotel
Customer booking dates - 17/02/2011 to 26/02/2011
Normal price (All year) - 01/01/2011 - 31/12/2011 (price per day :$30.00)
Special Offer 1 dates - 01/01/2011 to 19/02/2011 (price per day :$20.00)
Special Offer 2 dates - 17/02/2011 to 24/02/2011 (price per day :$10.00)
In the above scenario, the proposed algorithm should work out the cheapest offer that the date ranges overlap and work out the price for the booking. If there is no special offer available it uses the normal price.
So for the first two days the system should get the price from "special offer 1" as it's the cheapest available price. Next 5 days should be "Special offer 2 price" and for the next 2 days it'll be normal price.
I'd be grateful to see both SQL(using MS-SQL Server) or Code base answers to get the diffrenet views.
I hope the question is clear and looking foward to see the answers.
Many thanks in advance
Using the standard trick of using an auxiliary calendar table, it is simply a case of joins and grouping to get the best price each day:
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
The same query as above, including sample data using CTEs:
WITH Prices (price_start_date, price_end_date, narrative, price)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date), narrative, price
FROM (
VALUES ('2011-01-01T00:00:00', '2011-12-31T00:00:00', 'Normal price', 30),
('2011-01-01T00:00:00', '2011-02-21T00:00:00', 'Special Offer 1', 20),
('2011-02-19T00:00:00', '2011-02-24T00:00:00', 'Special Offer 2', 10)
) AS T (start_date, end_date, narrative, price)
),
CustomerBooking (booking_start_date, booking_end_date)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date)
FROM (
VALUES ('2011-02-17T00:00:00', '2011-02-26T00:00:00')
) AS T (start_date, end_date)
)
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
Let's supose that for each day you should apply lowest price.
create function price ( #fromDate date, #toDate date) returns money
as
begin
declare #iterator_day date
declare #total money
set #total = 0
set #iterator_day = #fromDate
WHILE #iterator_day < = #toDate
begin
select #total = #total + min( price )
from offers
where #iterator_day between offers.fromFay and offers.toDay
set #iterator_day = DATEADD (day , 1 , #iterator_day )
end
return #total
end
then you can call function in your query:
select
b.fromDay, b.toDay, dbo.price( b.fromDay, b.toDay )
from
booking b
I've only used ASP.net 4.0, but I can offer some SQL will give you the price for a given date:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11')
From your application you could build the query to be something like this:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '17/2/11') AND (EndDate >= '17/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '19/2/11') AND (EndDate >= '19/2/11');
This would return a dataset of tables containing a single value for the minimum price for that date (in the same order as your query)
Sounds like a good job for a Stored Procedure...
Your problem here is that you're got multiple overlapping time periods. You either need to constrain the problem slightly, or remodel the data slightly. (To get desirable performance.)
Option 1 - Constraints
A data set of 'normal' prices - that never overlap with each other
A data set of 'special' prices - that also never overlap with each other
Every bookable date has a 'normal' price
Every bookable date has a 'special' price (EVEN if it's NULL to mean 'no special price')
The last constraint is the strangest one. But it's needed to make the simple join work. When comparing date ranges, it's alot easier to form the query if the two sets of ranges are gapless and have no overlaps inside them.
This means that you should now be able to work it out with just a few joins...
SELECT
CASE WHEN [sp].started > [np].started THEN [sp].started ELSE [np].started END AS [started]
CASE WHEN [sp].expired < [np].expired THEN [sp].expired ELSE [np].expired END AS [expired]
CASE WHEN [sp].price < [np].price THEN [sp].price ELSE [np].price END AS [price]
FROM
normal_prices AS [np]
LEFT JOIN
special_prices AS [sp]
ON [sp].started < [np].expired
AND [sp].expired > [np].started
AND [sp].started >= (SELECT ISNULL(MAX(started),0) FROM special_prices WHERE started <= [np].started)
-- The third condition is an optimisation for large data-sets.
WHERE
[np].started < #expired
AND [np].expired > #started
-- Note: Inclusive StartDates, Exlusive EndDate
-- For example, "all of Jan" would be "2011-01-01" to "2011-02-01"
Option 2 - Re-Model
This one is often the fastest in my experience; you increase the amount of space being used, and gain a simpler faster query...
Table Of Prices, stored by DAY rather than period...
- calendar_date
- price_code
- price
SELECT
calendar_date,
MIN(price)
FROM
prices
WHERE
calendar_date >= #started
AND calendar_date < #expired
Or, if you needed the price_code as well...
WITH
ordered_prices AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY calendar_date ORDER BY price ASC, price_code) AS price_rank,
*
FROM
prices
)
SELECT
calendar_date,
price_code,
price
FROM
ordered_prices
WHERE
calendar_date >= #started
AND calendar_date < #expired

Count distinct in MDX (convert from SQL query)

SELECT COUNT (DISTINCT S.PK_Submission)
FROM Fact_Submission FS, Submission S
WHERE
FS.FK_Submission = S.PK_Submission
AND FS.FK_Submission_Date >= 20100101
AND FS.FK_Submission_Date <= 20101231
I've tried this:
SELECT
{[Measures].[Fact Submission Count]} ON AXIS(0),
Distinct({[Submission].[PK Submission] }) ON AXIS(1)
FROM [Submission]
WHERE
([Date].[Calendar Year].[2010])
but the result is the same
any idea how to write this in MDX? I'm pretty new at this so still haven't figured it out.
This is correct answer:
WITH SET MySet AS
{[Measures].[Fact Submission Count]}
*
DISTINCT({ EXCEPT([Submission].[PK Submission].Members, [Submission].[PK Submission].[All]) })
MEMBER MEASURES.DistinctSubmissionCount AS
DISTINCTCOUNT(MySet)
SELECT {MEASURES.DistinctSubmissionCount} ON 0
FROM [Submission]
WHERE
([Date].[Calendar Year].[2010])
I have excluded "All" row because it's also being counted by COUNT function so I always had +1.

Resources