Impala - Working hours between two dates in impala - datetime

I have two time stamps #starttimestamp and #endtimestamp. How to calculate number of working hours between these two
Working hours is defined below:
Mon- Thursday (9:00-17:00)
Friday (9:00-13:00)
Have to work in impala

think i found a better solution.
we will create a series of numbers using a large table. You can get a time dimension type table too. Make it doenst get truncated. I am using a large table from my db.
Use this series to generate a date range between start and end date.
date_add (t.start_date,rs.uniqueid) -- create range of dates
join (select row_number() over ( order by mycol) as uniqueid -- create range of unique ids
from largetab) rs
where end_date >=date_add (t.start_date,rs.uniqueid)
Then we will calculate total hour difference between the timestamp using unix timestamp considering date and time.
unix_timestamp(endtimestamp - starttimestamp )
Exclude non working hours like 16hours on M-T, 20hours on F, 24hours on S-S.
case when dayofweek ( dday) in (1,7) then 24
when dayofweek ( dday) =5 then 20
else 16 end as non work hours
Here is complete SQL.
select
end_date, start_date,
diff_in_hr - sum(case when dayofweek ( dday) in (1,7) then 24
when dayofweek ( dday) =5 then 20
else 16 end ) total_workhrs
from (
select (unix_timestamp(end_date)- unix_timestamp(start_date))/3600 as diff_in_hr , end_date, start_date,date_add (t.start_date,rs.uniqueid) as dDay
from tdate t
join (select row_number() over ( order by mycol) as uniqueid from largetab) rs
where end_date >=date_add (t.start_date,rs.uniqueid)
)rs2
group by 1,2,diff_in_hr

Related

How to return the previous 7 days in Teradata

SELECT
DD.DATE_DATE AS Claim_Rcv_Date
from claim claim INNER JOIN DIM_DATE DD
ON DD.DATE_DIM_CK = CLAIM.CLAIM_RCVD_DATE_DIM_CK
WHERE ???
How would I limit to the previous 7 days? Using the DD.DATE_DATE AS Claim_Rcv_Date
Assuming the date column is actually stored as a proper date type, you can use:
WHERE your_date_column BETWEEN (CURRENT_DATE - INTERVAL '7' DAY) AND CURRENT_DATE

Carry Forward values in Presto

I am using the below query to pivot my data and generate a CSV but the problem is I have a dataset in which the data points are coming in a scattered way with each timestamp.
with map_date as (
SELECT
vin,
epoch,
timestamp,
date,
map_agg(signalName, value) as map_values
from hive.vehicle_signals.vehicle_signals_flat
where date(date) = date('2020-03-12')
and date(cast(from_unixtime(epoch) as timestamp) - interval '0' hour) = current_date - interval '2' day
and vin = '000011'
and signalName in ('timestamp','epoch','msgId','usec','vlan','vin','msgName','value')
GROUP BY vin, epoch, timestamp, date
order by timestamp desc
)
SELECT
epoch
, timestamp
, CASE WHEN element_at(map_values, 'value') IS NOT NULL THEN map_values['value'] ELSE NULL END AS value
, vin
, current_date - interval '2' day AS date
from map_date
I get the following CSV as a result. Is there a way I can carry forward the value until a new value is found at a newer timestamp? Like in the image below the value '14.3' comes and the next value '16.5' comes after a few timestamps, How can I carry the value '14.3' till row 7th and repeat the logic on the entire column. How can I make my output field look like column 'G' in the image using Presto?
Thanks in advance!!
You can use a mysql #variable to store the last value, for example:
SELECT
epoch
, timestamp
, CASE WHEN element_at(map_values, 'value') IS NOT NULL THEN #last_value:= map_values['value'] ELSE #last_value END AS value
, vin
, current_date - interval '2' day AS date
from map_date, (select #last_value:=0) v
The last part, (select #last_value:=0) v is to initialize the #last_value variable.
A basic tutorial
https://www.mysqltutorial.org/mysql-variables/
More advanced tutorial with additional info
https://www.xaprb.com/blog/2006/12/15/advanced-mysql-user-variable-techniques/

Working with date ranges (Classic ASP and SQL)

I have to implement a solution where two date ranges can overlap each other. within the overlapped dates, I have to count how many days overlap each other. Once I know the overlapped days I can calculate a total figure based on the price that's attached per day.
A scenario would be that
A customer is booking a hotel
Customer booking dates - 17/02/2011 to 26/02/2011
Normal price (All year) - 01/01/2011 - 31/12/2011 (price per day :$30.00)
Special Offer 1 dates - 01/01/2011 to 19/02/2011 (price per day :$20.00)
Special Offer 2 dates - 17/02/2011 to 24/02/2011 (price per day :$10.00)
In the above scenario, the proposed algorithm should work out the cheapest offer that the date ranges overlap and work out the price for the booking. If there is no special offer available it uses the normal price.
So for the first two days the system should get the price from "special offer 1" as it's the cheapest available price. Next 5 days should be "Special offer 2 price" and for the next 2 days it'll be normal price.
I'd be grateful to see both SQL(using MS-SQL Server) or Code base answers to get the diffrenet views.
I hope the question is clear and looking foward to see the answers.
Many thanks in advance
Using the standard trick of using an auxiliary calendar table, it is simply a case of joins and grouping to get the best price each day:
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
The same query as above, including sample data using CTEs:
WITH Prices (price_start_date, price_end_date, narrative, price)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date), narrative, price
FROM (
VALUES ('2011-01-01T00:00:00', '2011-12-31T00:00:00', 'Normal price', 30),
('2011-01-01T00:00:00', '2011-02-21T00:00:00', 'Special Offer 1', 20),
('2011-02-19T00:00:00', '2011-02-24T00:00:00', 'Special Offer 2', 10)
) AS T (start_date, end_date, narrative, price)
),
CustomerBooking (booking_start_date, booking_end_date)
AS
(
SELECT CAST(start_date AS Date), CAST(end_date AS Date)
FROM (
VALUES ('2011-02-17T00:00:00', '2011-02-26T00:00:00')
) AS T (start_date, end_date)
)
SELECT C.dt, MIN(price) AS best_price
FROM Prices P
INNER JOIN Calendar C
ON C.dt >= P.price_start_date
AND C.dt < P.price_end_date
INNER JOIN CustomerBooking B
ON C.dt >= B.booking_start_date
AND C.dt < B.booking_end_date
GROUP
BY C.dt;
Let's supose that for each day you should apply lowest price.
create function price ( #fromDate date, #toDate date) returns money
as
begin
declare #iterator_day date
declare #total money
set #total = 0
set #iterator_day = #fromDate
WHILE #iterator_day < = #toDate
begin
select #total = #total + min( price )
from offers
where #iterator_day between offers.fromFay and offers.toDay
set #iterator_day = DATEADD (day , 1 , #iterator_day )
end
return #total
end
then you can call function in your query:
select
b.fromDay, b.toDay, dbo.price( b.fromDay, b.toDay )
from
booking b
I've only used ASP.net 4.0, but I can offer some SQL will give you the price for a given date:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11')
From your application you could build the query to be something like this:
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '17/2/11') AND (EndDate >= '17/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '18/2/11') AND (EndDate >= '18/2/11');
SELECT ISNULL(MIN(PricePerDay), 0) AS MinPricePerDay
FROM Offers
WHERE (StartDate <= '19/2/11') AND (EndDate >= '19/2/11');
This would return a dataset of tables containing a single value for the minimum price for that date (in the same order as your query)
Sounds like a good job for a Stored Procedure...
Your problem here is that you're got multiple overlapping time periods. You either need to constrain the problem slightly, or remodel the data slightly. (To get desirable performance.)
Option 1 - Constraints
A data set of 'normal' prices - that never overlap with each other
A data set of 'special' prices - that also never overlap with each other
Every bookable date has a 'normal' price
Every bookable date has a 'special' price (EVEN if it's NULL to mean 'no special price')
The last constraint is the strangest one. But it's needed to make the simple join work. When comparing date ranges, it's alot easier to form the query if the two sets of ranges are gapless and have no overlaps inside them.
This means that you should now be able to work it out with just a few joins...
SELECT
CASE WHEN [sp].started > [np].started THEN [sp].started ELSE [np].started END AS [started]
CASE WHEN [sp].expired < [np].expired THEN [sp].expired ELSE [np].expired END AS [expired]
CASE WHEN [sp].price < [np].price THEN [sp].price ELSE [np].price END AS [price]
FROM
normal_prices AS [np]
LEFT JOIN
special_prices AS [sp]
ON [sp].started < [np].expired
AND [sp].expired > [np].started
AND [sp].started >= (SELECT ISNULL(MAX(started),0) FROM special_prices WHERE started <= [np].started)
-- The third condition is an optimisation for large data-sets.
WHERE
[np].started < #expired
AND [np].expired > #started
-- Note: Inclusive StartDates, Exlusive EndDate
-- For example, "all of Jan" would be "2011-01-01" to "2011-02-01"
Option 2 - Re-Model
This one is often the fastest in my experience; you increase the amount of space being used, and gain a simpler faster query...
Table Of Prices, stored by DAY rather than period...
- calendar_date
- price_code
- price
SELECT
calendar_date,
MIN(price)
FROM
prices
WHERE
calendar_date >= #started
AND calendar_date < #expired
Or, if you needed the price_code as well...
WITH
ordered_prices AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY calendar_date ORDER BY price ASC, price_code) AS price_rank,
*
FROM
prices
)
SELECT
calendar_date,
price_code,
price
FROM
ordered_prices
WHERE
calendar_date >= #started
AND calendar_date < #expired

MySQL: Records inserted by hour, for the last 24 hours

I'm trying to list the number of records per hour inserted into a database for the last 24 hours. Each row displays the records inserted that hour, as well as how many hours ago it was.
Here's my query now:
SELECT COUNT(*), FLOOR( TIME_TO_SEC( TIMEDIFF( NOW(), time)) / 3600 )
FROM `records`
WHERE time > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY HOUR(time)
ORDER BY time ASC
right now it returns:
28 23
62 23
14 20
1 4
28 3
19 1
That shows two rows from 23 hours ago, when it should only show one per hour.
I think it has something to do with using NOW() instead of getting the time at the start of the hour, which I'm unsure on how to get.
There must be a simpler way of doing this.
If you grouped by HOUR(time) then you should use HOUR(time) in your select expressions, and not time. For example:
SELECT HOUR(time), COUNT(*)
FROM `records`
WHERE time > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY HOUR(time)
ORDER BY HOUR(time)
Alternatively you can group by the expression you want to return:
SELECT COUNT(*), FLOOR( TIME_TO_SEC( TIMEDIFF( NOW(), time)) / 3600 )
FROM `records`
WHERE time > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY FLOOR( TIME_TO_SEC( TIMEDIFF( NOW(), time)) / 3600 )
ORDER BY FLOOR( TIME_TO_SEC( TIMEDIFF( NOW(), time)) / 3600 )
In case you were wondering, it is safe to call NOW() multiple times in the same query like this. From the manual:
Functions that return the current date or time each are evaluated only once per query at the start of query execution. This means that multiple references to a function such as NOW() within a single query always produce the same result.

Sum amount of overlapping datetime ranges in MySQL

I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.
I'm trying to output the sum of overlapping times and the number of events that overlapped.
What is the most efficient / simple way to perform this query in MySQL?
CREATE TABLE IF NOT EXISTS `events` (
`EventID` int(10) unsigned NOT NULL auto_increment,
`StartTime` datetime NOT NULL,
`EndTime` datetime default NULL,
PRIMARY KEY (`EventID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;
INSERT INTO `events` (`EventID`, `StartTime`, `EndTime`) VALUES
(10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');
# if the query was run using the data above,
# the table below would be the desired output
# Number of Overlapped Events | Total Amount of Time those events overlapped.
1, 03:00:00
2, 02:00:00
3, 02:00:00
The purpose of these results is to generate a bill for hours used. (if you have one event running, you might pay 10 dollars per hour. But if two events are running, you only have to pay 8 dollars per hour, but only for the period of time you had two events running.)
Try this:
SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
SELECT
COUNT(*) AS `Count`,
UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
FROM (
SELECT #rownum1 := #rownum1 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT #rownum1 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times1
JOIN (
SELECT #rownum2 := #rownum2 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT #rownum2 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times2
ON Times1.rownum = Times2.rownum + 1
JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
GROUP BY Times1.rownum
) Totals
GROUP BY `Count`
Result:
1, 03:00:00
2, 02:00:00
3, 02:00:00
If this doesn't do what you want, or you want some explanation, please let me know. It could be made faster by storing the repeated subquery AllTimes in a temporary table, but hopefully it runs fast enough as it is.
Start with a table that contains a single datetime field as its primary key, and populate that table with every time value you're interested in. A leap years has 527040 minutes (31622400 seconds), so this table might get big if your events span several years.
Now join against this table doing something like
SELECT i.dt as instant, count(*) as events
FROM instant i JOIN event e ON i.dt BETWEEN e.start AND e.end
GROUP BY i.dt
WHERE i.dt BETWEEN ? AND ?
Having an index on instant.dt may let you forgo an ORDER BY.
If events are added infrequently, this may be something you want to precalculate by running the query offline, populating a separate table.
I would suggest an in-memory structure that has start-time,end-time,#events... (This is simplified as time(hours), but using unix time gives up to the second accuracy)
For every event, you would insert the new event as-is if there's no overlap, otherwise, find the overlap, and split the event to (up to 3) parts that may be overlapping, With your example data, starting from the first event:
Event 1 starts at 3am and ends at 10am: Just add the event since no overlaps:
3,10,1
Event 2 starts at 5am and ends at 9am: Overlaps,so split the original, and add the new one with extra "#events"
3,5,1
5,9,2
9,10,1
Event 3 starts at 7am and ends at 9am: also overlaps, do the same with all periods:
3,5,1
5,7,2
7,9,3
9,10,1
So calculating the overlap hours per #events:
1 event= (5-3)+(10-9)=3 hours
2 events = 7-5 = 2 hours
3 events = 9-7 = 2 hours
It would make sense to run this as a background process if there are many events to compare.

Resources