Get a count of rows that meet condition - sqlite

SQLITE3
Task: get a data set that contains the following data - SEE NOTES BESIDE COLUMNS
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
Count() AS SpeederCount -- ?? WTF? [pseudo constraints: if Speed > Speedlimit then +1]
FROM RawSpeedLane AS sl
INNER JOIN
SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
The SQL above currently gives me all the data I need, EXCEPT for the last column.
This last column is supposed to be the count of records that pass specific criteria. The only way I have found to do this successfully is to build a sub query... Cool? Okay, but the problem is the sub query takes 4 minutes to run, because, well... I suck at SQL :P No matter how many different ways I've tried to write it, it still takes forever.
Here is the long, but working version.
SELECT DISTINCT RoundedDate,
Count() AS Count,
CAST (avg(Speed) AS INT) AS AverageSpeed,
(
SELECT count()
FROM RawSpeedLane AS slr
WHERE slr.ProjectId = 72 AND
datetime( ( (strftime('%s', Start) - (strftime('%M', Start) * 60 + strftime('%S', Start) ) ) ), 'unixepoch') = sl.RoundedDate AND
Speed > p.SpeedLimit
)
AS SpeederCount
FROM SpeedLaneReportDataView AS sl
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
I currently just tried this for the last column
(select Count() where sl.Speed > p.SpeedLimit)
but as expected, I got 1s and 0s. I'm not really sure what to do here. Any hints or help that lead me in the right direction are very much appreciated.

I don't think SQLite has an IIF but CASE works.
This is a response to Backs' answer, but I can't comment yet.
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(CASE WHEN Speed > SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount
FROM RawSpeedLane AS sl

With SUM and IIF:
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(IIF(Speed > SpeedLimit, 1, 0)) AS SpeederCount
FROM RawSpeedLane AS sl
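For reference, here is the CASE version dropped back into the joins and grouping from the original query. Table and column names are taken from the question, DISTINCT is dropped because GROUP BY already collapses the rows, and Count(*) is used for clarity; treat it as an untested sketch:
SELECT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count(*) AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average Speed within the defined datetime
SUM(CASE WHEN Speed > p.SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount -- rows over the project's speed limit
FROM RawSpeedLane AS sl
INNER JOIN SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;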


SQL: Sum based on specified dates

Thanks again for the help everyone. I went with the script below...
SELECT beginning, end,
(SELECT SUM(sale) FROM sales_log WHERE date BETWEEN beginning AND `end` ) AS sales
FROM performance
and I added a salesperson column to both the performance table and sales_log but it winds up crashing DB Browser. What is the issue here? New code below:
SELECT beginning, end, salesperson
(SELECT SUM(sale) FROM sales_log WHERE (date BETWEEN beginning AND end) AND sales_log.salesperson = performance.salesperson ) AS sales
FROM performance
I believe that the following may do what you wish or be the basis for what you wish.
WITH sales_log_cte AS
(
SELECT substr(date,(length(date) -3),4)||'-'||
CASE WHEN length(replace(substr(date,instr(date,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(date,instr(date,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(date,1,instr(date,'/') -1)) < 2 THEN '0' ELSE '' END||substr(date,1,instr(date,'/') -1) AS date,
CAST(sale AS REAL) AS sale
FROM sales_log
),
performance_cte AS
(
SELECT substr(beginning,(length(beginning) -3),4)||'-'||
CASE WHEN length(replace(substr(beginning,instr(beginning,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(beginning,instr(beginning,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(beginning,1,instr(beginning,'/') -1)) < 2 THEN '0' ELSE '' END||substr(beginning,1,instr(beginning,'/') -1)
AS beginning,
substr(`end`,(length(`end`) -3),4)||'-'||
CASE WHEN length(replace(substr(`end`,instr(`end`,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(`end`,instr(`end`,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(`end`,1,instr(`end`,'/') -1)) < 2 THEN '0' ELSE '' END||substr(`end`,1,instr(`end`,'/') -1)
AS `end`
FROM performance
)
SELECT beginning, `end` , (SELECT SUM(sale) FROM sales_log_cte WHERE date BETWEEN beginning AND `end` ) AS sales
FROM performance_cte
;
From your data this results in :-
As can be seen the bulk of the code is converting the dates into a format (i.e. YYYY-MM-DD) that is usable/recognisable by SQLite for the BETWEEN clause.
Date And Time Functions
I don't believe that you want a join between performance (performance_cte after reformatting the dates) and sales_log (sales_log_cte), as this would be a cartesian product and SUM would then add up all the results within the range.
The use of end as a column name is also awkward, as it is a KEYWORD and has to be enclosed (grave accents are used in the above).
The above works by using 2 CTEs (Common Table Expressions), which are temporary tables whose lifetime is the query in which they are used.
The first, sales_log_cte, is simply the sales_log table but with the date reformatted. The second, likewise, is simply the performance table with the dates reformatted.
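As a quick illustration of what the reformatting expression produces, assuming the dates are stored as D/M/YYYY text (the literal below is hypothetical, not taken from the question's data):
SELECT substr(d,(length(d) -3),4)||'-'||
CASE WHEN length(replace(substr(d,instr(d,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(d,instr(d,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(d,1,instr(d,'/') -1)) < 2 THEN '0' ELSE '' END||substr(d,1,instr(d,'/') -1) AS iso_date
FROM (SELECT '17/2/2011' AS d); -- returns 2011-02-17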
If the tables already have suitable date formatting, then all of the above could simply be:
SELECT beginning, `end` , (SELECT SUM(sale) FROM sales_log WHERE date BETWEEN beginning AND `end` ) AS sales FROM performance;

Convert Columns to ROW

I am stuck with this requirement -
I have some data in the format
(Entries now show data for both periods (Jan. 2011) and (Feb. 2011) on the same line, as opposed to appearing separately).
At the end I need to print the data using dbms_output.put_line command.
I am using Oracle 10.2g.
Oracle 10g does not have a PIVOT function but you can convert the rows of data into columns using an aggregate function with a CASE expression. The basic syntax would be:
select d.id,
d.site,
d.entrance,
sum(case when d.date = 'Jan.2011' then enters else 0 end) "Jan.2011",
sum(case when d.date = 'Feb.2011' then enters else 0 end) "Feb.2011"
from
(
select id, site, entrance, date, enters
from yourdata
) d
group by d.id, d.site, d.entrance;
Note: you can replace the subquery that selects from yourdata with your current query.
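Since the output ultimately has to go through dbms_output.put_line, a minimal PL/SQL sketch along these lines may help. It assumes the pivot query above, with the month column named period here because DATE is a reserved word in Oracle; adjust the names to your actual table:
SET SERVEROUTPUT ON

BEGIN
  FOR r IN (
    SELECT id, site, entrance,
           SUM(CASE WHEN period = 'Jan.2011' THEN enters ELSE 0 END) AS jan_2011,
           SUM(CASE WHEN period = 'Feb.2011' THEN enters ELSE 0 END) AS feb_2011
      FROM yourdata
     GROUP BY id, site, entrance
  ) LOOP
    -- one line per id/site/entrance, with both periods side by side
    dbms_output.put_line(r.id || ' | ' || r.site || ' | ' || r.entrance ||
                         ' | ' || r.jan_2011 || ' | ' || r.feb_2011);
  END LOOP;
END;
/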

Retrieve a table to tallied numbers, best way

I have a query that runs as part of a function which produces a one-row table full of counts, averages, and comma-separated lists like this:
select
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1) applicants,
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1
and applied != 'yes') s_applicants,
(select
LISTAGG(discipline, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_number
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines,
(select
LISTAGG(discipline_count, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_count
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines_count,
(select
round(avg(util.getawardstocols(application_id,'1','AWARD_NAME')), 2)
from vw_disp_details
where round = 2013
and rating = 1) average_award_score,
(select
round(avg(age))
from vw_disp_details
where round = 2013
and rating = 1) average_age
from dual;
Except that instead of 6 main sub-queries there are 23.
This returns something like this (if it were a CSV):
applicants | s_applicants | disciplines | disciplines_count | average_award_score | average_age
107 | 67 | "speed,accuracy,strength" | 3 | 97 | 23
Now I am programmatically swapping out the "rating = 1" part of the where clauses for other expressions. They all work rather quickly except for the "rating = 1" one which takes about 90 seconds to run and that is because the rating column in the vw_disp_details view is itself compiled by a sub-query:
(SELECT score
FROM read r,
eval_criteria_lookup ecl
WHERE r.criteria_id = ecl.criteria_id
AND r.application_id = a.lgo_application_id
AND criteria_description = 'Overall Score'
AND type = 'ABC'
) reader_rank
So when the function runs this extra query seems to slow everything down dramatically.
My question is: is there a better (more efficient) way to run a query like this that is basically just a series of counts and averages, and how can I refactor it so that the rating = 1 version doesn't take 90 seconds to run?
You could choose to MATERIALIZE the vw_disp_details VIEW. That would pre-calculate the value of the rating column. There are various options for how up to date a materialized view is kept; you would probably want the ON COMMIT clause so that vw_disp_details is always correct.
Have a look at the official documentation and see if that would work for you.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm
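A rough sketch of what that could look like (the materialized view name is illustrative, and ON COMMIT refresh has restrictions such as requiring materialized view logs and limiting subqueries, so a complete on-demand or scheduled refresh may be the practical choice for a view this complex):
-- pre-compute the expensive view once, then query the materialized copy
CREATE MATERIALIZED VIEW mv_disp_details
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT * FROM vw_disp_details;

-- the report queries then read mv_disp_details instead of vw_disp_details,
-- refreshing it when the underlying data changes, e.g.:
-- EXEC DBMS_MVIEW.REFRESH('MV_DISP_DETAILS');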
Do most of your queries in only one. Instead of doing:
select
(select count(*) from my_tab) as count_all,
(select avg(age) from my_tab) as avg_age,
(select avg(mypkg.get_award(application_id)) from my_tab) as avg_award
from dual;
Just do:
select count(*), avg(age),avg(mypkg.get_award(application_id)) from my_tab;
And then, maybe you can do some union all for the other results. But this step all by itself should help.
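For the sub-queries in the question that share the same WHERE clause, conditional aggregation does exactly that in a single scan of the view. A sketch using columns from the question (the two LISTAGG columns would still need their own GROUP BY discipline pass):
SELECT COUNT(*) AS applicants,
       COUNT(CASE WHEN applied != 'yes' THEN 1 END) AS s_applicants,
       ROUND(AVG(util.getawardstocols(application_id, '1', 'AWARD_NAME')), 2) AS average_award_score,
       ROUND(AVG(age)) AS average_age
FROM vw_disp_details
WHERE round = 2013
  AND rating = 1;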
I was able to solve this issue by doing two things: creating a new view that returned only the results I needed, which gave me marginal gains in speed, and, within that view, moving the sub-query that caused the lag into the view itself, exposing its result as a column and folding its filter into the view's WHERE clause. This still returns the same results, because for every row of the view query there will always be a matching record in the table the sub-query accessed.
SELECT
a.application_id,
util.getstatus (a.application_id) status,
(SELECT score
FROM applicant_read ar,
eval_criteria_lookup ecl
WHERE ar.criteria_id = ecl.criteria_id
AND ar.application_id = a.application_id
AND criteria_description = 'Overall Score' -- these two fields
AND type = 'ABC' -- are criteria_id = 15
) score,
aps.test_total test_total
FROM application a,
applicant_scores aps
WHERE a.application_id = aps.application_id(+);
Became
SELECT
a.application_id,
util.getstatus (a.application_id) status,
ar.score,
aps.test_total test_total
FROM application a,
applicant_scores aps,
applicant_read ar
WHERE a.application_id = aps.application_id(+)
AND ar.application_id = a.application_id(+)
AND ar.criteria_id = 15;

Working with date ranges (Classic ASP and SQL)

I have to implement a solution where two date ranges can overlap each other. Within the overlapping dates, I have to count how many days overlap. Once I know the overlapping days, I can calculate a total figure based on the price that's attached per day.
A scenario would be that
A customer is booking a hotel
Customer booking dates - 17/02/2011 to 26/02/2011
Normal price (All year) - 01/01/2011 - 31/12/2011 (price per day :$30.00)
Special Offer 1 dates - 01/01/2011 to 19/02/2011 (price per day :$20.00)
Special Offer 2 dates - 17/02/2011 to 24/02/2011 (price per day :$10.00)
In the above scenario, the proposed algorithm should work out the cheapest offer available for each of the overlapping dates and use it to price the booking. If there is no special offer available, it uses the normal price.
So for the first two days the system should use the Special Offer 1 price, as it's the cheapest available. The next five days should use the Special Offer 2 price, and the final two days the normal price.
I'd be grateful to see both SQL (using MS SQL Server) and code-based answers, to get the different views.
I hope the question is clear, and I'm looking forward to seeing the answers.
Many thanks in advance.
Using the standard trick of using an auxiliary calendar table, it is simply a case of joins and grouping to get the best price each day:
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
The same query as above, including sample data using CTEs:
WITH Prices (price_start_date, price_end_date, narrative, price)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date), narrative, price
FROM (
VALUES ('2011-01-01T00:00:00', '2011-12-31T00:00:00', 'Normal price', 30),
('2011-01-01T00:00:00', '2011-02-21T00:00:00', 'Special Offer 1', 20),
('2011-02-19T00:00:00', '2011-02-24T00:00:00', 'Special Offer 2', 10)
) AS T (start_date, end_date, narrative, price)
),
CustomerBooking (booking_start_date, booking_end_date)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date)
FROM (
VALUES ('2011-02-17T00:00:00', '2011-02-26T00:00:00')
) AS T (start_date, end_date)
)
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
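The Calendar table itself is assumed to already exist. A throwaway way to build one covering the year in question (SQL Server 2008 or later for the date type; names match the query above):
-- one row per calendar day of 2011
CREATE TABLE Calendar (dt date PRIMARY KEY);

DECLARE @d date = '2011-01-01';
WHILE @d < '2012-01-01'
BEGIN
    INSERT INTO Calendar (dt) VALUES (@d);
    SET @d = DATEADD(day, 1, @d);
END;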
Let's suppose that for each day you should apply the lowest price.
create function price ( @fromDate date, @toDate date) returns money
as
begin
declare @iterator_day date
declare @total money
set @total = 0
set @iterator_day = @fromDate
WHILE @iterator_day <= @toDate
begin
select @total = @total + min( price )
from offers
where @iterator_day between offers.fromDay and offers.toDay
set @iterator_day = DATEADD (day , 1 , @iterator_day )
end
return @total
end
then you can call the function in your query:
select
b.fromDay, b.toDay, dbo.price( b.fromDay, b.toDay )
from
booking b
I've only used ASP.NET 4.0, but I can offer some SQL that will give you the price for a given date:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11')
From your application you could build the query to be something like this:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '17/2/11') AND (EndDate >= '17/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '19/2/11') AND (EndDate >= '19/2/11');
This would return a set of result tables, each containing a single value for the minimum price on that date (in the same order as your queries).
Sounds like a good job for a Stored Procedure...
Your problem here is that you've got multiple overlapping time periods. You either need to constrain the problem slightly, or re-model the data slightly (to get acceptable performance).
Option 1 - Constraints
A data set of 'normal' prices - that never overlap with each other
A data set of 'special' prices - that also never overlap with each other
Every bookable date has a 'normal' price
Every bookable date has a 'special' price (EVEN if it's NULL to mean 'no special price')
The last constraint is the strangest one, but it's needed to make the simple join work. When comparing date ranges, it's a lot easier to form the query if the two sets of ranges are gapless and have no overlaps within them.
This means that you should now be able to work it out with just a few joins...
SELECT
CASE WHEN [sp].started > [np].started THEN [sp].started ELSE [np].started END AS [started],
CASE WHEN [sp].expired < [np].expired THEN [sp].expired ELSE [np].expired END AS [expired],
CASE WHEN [sp].price < [np].price THEN [sp].price ELSE [np].price END AS [price]
FROM
normal_prices AS [np]
LEFT JOIN
special_prices AS [sp]
ON [sp].started < [np].expired
AND [sp].expired > [np].started
AND [sp].started >= (SELECT ISNULL(MAX(started),0) FROM special_prices WHERE started <= [np].started)
-- The third condition is an optimisation for large data-sets.
WHERE
[np].started < @expired
AND [np].expired > @started
-- Note: Inclusive StartDates, Exclusive EndDates
-- For example, "all of Jan" would be "2011-01-01" to "2011-02-01"
Option 2 - Re-Model
This one is often the fastest in my experience; you increase the amount of space being used, and gain a simpler faster query...
Table Of Prices, stored by DAY rather than period...
- calendar_date
- price_code
- price
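As DDL, that day-level table might look something like this (column types are illustrative):
CREATE TABLE prices (
    calendar_date date        NOT NULL,
    price_code    varchar(20) NOT NULL,
    price         money       NOT NULL,
    PRIMARY KEY (calendar_date, price_code)
);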
SELECT
calendar_date,
MIN(price)
FROM
prices
WHERE
calendar_date >= @started
AND calendar_date < @expired
GROUP BY
calendar_date
Or, if you needed the price_code as well...
WITH
ordered_prices AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY calendar_date ORDER BY price ASC, price_code) AS price_rank,
*
FROM
prices
)
SELECT
calendar_date,
price_code,
price
FROM
ordered_prices
WHERE
calendar_date >= @started
AND calendar_date < @expired
AND price_rank = 1 -- keep only the cheapest price for each day

Sum amount of overlapping datetime ranges in MySQL

I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.
I'm trying to output the sum of overlapping times and the number of events that overlapped.
What is the most efficient / simple way to perform this query in MySQL?
CREATE TABLE IF NOT EXISTS `events` (
`EventID` int(10) unsigned NOT NULL auto_increment,
`StartTime` datetime NOT NULL,
`EndTime` datetime default NULL,
PRIMARY KEY (`EventID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;
INSERT INTO `events` (`EventID`, `StartTime`, `EndTime`) VALUES
(10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');
# if the query was run using the data above,
# the table below would be the desired output
# Number of Overlapped Events | Total Amount of Time those events overlapped.
1, 03:00:00
2, 02:00:00
3, 02:00:00
The purpose of these results is to generate a bill for hours used. (if you have one event running, you might pay 10 dollars per hour. But if two events are running, you only have to pay 8 dollars per hour, but only for the period of time you had two events running.)
Try this:
SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
SELECT
COUNT(*) AS `Count`,
UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
FROM (
SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times1
JOIN (
SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times2
ON Times1.rownum = Times2.rownum + 1
JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
GROUP BY Times1.rownum
) Totals
GROUP BY `Count`
Result:
1, 03:00:00
2, 02:00:00
3, 02:00:00
If this doesn't do what you want, or you want some explanation, please let me know. It could be made faster by storing the repeated subquery AllTimes in a temporary table, but hopefully it runs fast enough as it is.
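A sketch of that pre-computed variant; a plain scratch table is used instead of CREATE TEMPORARY TABLE because older MySQL versions cannot reference the same TEMPORARY table twice in one query. The Times1 and Times2 derived tables would then select from all_times rather than repeating the UNION:
-- every distinct start/end instant, computed once up front
DROP TABLE IF EXISTS all_times;
CREATE TABLE all_times AS
    SELECT StartTime AS `Time` FROM events
    UNION                      -- UNION already removes duplicates
    SELECT EndTime FROM events;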
Start with a table that contains a single datetime field as its primary key, and populate that table with every time value you're interested in. A leap year has 527040 minutes (31622400 seconds), so this table might get big if your events span several years.
Now join against this table doing something like
SELECT i.dt as instant, count(*) as events
FROM instant i JOIN event e ON i.dt BETWEEN e.start AND e.end
WHERE i.dt BETWEEN ? AND ?
GROUP BY i.dt
Having an index on instant.dt may let you forgo an ORDER BY.
If events are added infrequently, this may be something you want to precalculate by running the query offline, populating a separate table.
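A sketch of building that table at one-minute resolution; the table and procedure names and the chosen resolution are assumptions for illustration, not part of the original suggestion:
CREATE TABLE instant (dt DATETIME NOT NULL PRIMARY KEY);

DELIMITER //
CREATE PROCEDURE fill_instants(IN p_from DATETIME, IN p_to DATETIME)
BEGIN
    -- one row per minute between the two bounds (inclusive)
    WHILE p_from <= p_to DO
        INSERT IGNORE INTO instant (dt) VALUES (p_from);
        SET p_from = p_from + INTERVAL 1 MINUTE;
    END WHILE;
END //
DELIMITER ;

CALL fill_instants('2009-02-09 00:00:00', '2009-02-09 23:59:00');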
I would suggest an in-memory structure that has start-time, end-time, #events... (This is simplified to time in hours, but using Unix time gives up-to-the-second accuracy.)
For every event, you would insert the new event as-is if there's no overlap; otherwise, find the overlap and split the existing entry into (up to 3) parts that may be overlapping. With your example data, starting from the first event:
Event 1 starts at 3am and ends at 10am: Just add the event since no overlaps:
3,10,1
Event 2 starts at 5am and ends at 9am: it overlaps, so split the original and add the new one with an extra "#events":
3,5,1
5,9,2
9,10,1
Event 3 starts at 7am and ends at 9am: also overlaps, do the same with all periods:
3,5,1
5,7,2
7,9,3
9,10,1
So calculating the overlap hours per #events:
1 event = (5-3) + (10-9) = 3 hours
2 events = 7-5 = 2 hours
3 events = 9-7 = 2 hours
It would make sense to run this as a background process if there are many events to compare.
