Carry Forward values in Presto - window-functions

I am using the query below to pivot my data and generate a CSV. The problem is that the data points in my dataset arrive scattered across timestamps.
with map_date as (
SELECT
vin,
epoch,
timestamp,
date,
map_agg(signalName, value) as map_values
from hive.vehicle_signals.vehicle_signals_flat
where date(date) = date('2020-03-12')
and date(cast(from_unixtime(epoch) as timestamp) - interval '0' hour) = current_date - interval '2' day
and vin = '000011'
and signalName in ('timestamp','epoch','msgId','usec','vlan','vin','msgName','value')
GROUP BY vin, epoch, timestamp, date
order by timestamp desc
)
SELECT
epoch
, timestamp
, CASE WHEN element_at(map_values, 'value') IS NOT NULL THEN map_values['value'] ELSE NULL END AS value
, vin
, current_date - interval '2' day AS date
from map_date
I get the following CSV as a result. Is there a way to carry a value forward until a new value appears at a newer timestamp? In the image below, the value '14.3' appears and the next value, '16.5', only appears a few timestamps later. How can I carry '14.3' down to the 7th row and repeat this logic over the entire column? In other words, how can I make my output field look like column 'G' in the image using Presto?
Thanks in advance!!

You can use a MySQL user variable (@last_value) to store the last value, for example:
SELECT
epoch
, timestamp
, CASE WHEN element_at(map_values, 'value') IS NOT NULL THEN @last_value := map_values['value'] ELSE @last_value END AS value
, vin
, current_date - interval '2' day AS date
from map_date, (select @last_value := 0) v
The last part, (select @last_value := 0) v, initializes the @last_value variable.
A basic tutorial
https://www.mysqltutorial.org/mysql-variables/
More advanced tutorial with additional info
https://www.xaprb.com/blog/2006/12/15/advanced-mysql-user-variable-techniques/
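The variable trick above is MySQL-specific; Presto has no user variables, so in Presto itself the usual route is a window function, as the question's title suggests. One common pattern: a running COUNT(value) ordered by time increments only on non-NULL rows, so each non-NULL value starts a new group and the NULL rows after it join that group; MAX(value) over each group then returns that group's single non-NULL value (so it works for strings too). The minimal sketch below runs this SQL against an in-memory SQLite database from Python, assuming SQLite 3.25 or newer for window function support; the readings table and sample values are hypothetical. In the question's query you would add PARTITION BY vin to both windows and order by epoch or timestamp. Newer Presto and Trino releases also document last_value(value) IGNORE NULLS OVER (...), which may be simpler if your version supports it.

```python
import sqlite3

# Hypothetical sample data: (epoch, value); None marks a timestamp with no reading.
ROWS = [
    (1, 14.3), (2, None), (3, None), (4, None),
    (5, 16.5), (6, None), (7, 18.0), (8, None),
]

# grp = running count of non-NULL values up to the current row. Each non-NULL
# value bumps the count, so it and the NULL rows that follow share one grp.
# MAX(value) within a grp is that grp's only non-NULL value.
FILL_FORWARD_SQL = """
WITH grouped AS (
    SELECT epoch,
           value,
           COUNT(value) OVER (ORDER BY epoch) AS grp
    FROM readings
)
SELECT epoch,
       value,
       MAX(value) OVER (PARTITION BY grp) AS value_filled
FROM grouped
ORDER BY epoch
"""


def fill_forward(rows):
    """Return (epoch, raw_value, carried_forward_value) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (epoch INTEGER, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        return conn.execute(FILL_FORWARD_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for epoch, raw, filled in fill_forward(ROWS):
        print(epoch, raw, filled)
```

Rows that come before the first non-NULL value fall into group 0 and stay NULL, since there is nothing to carry forward yet.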

Related

Impala - Working hours between two dates in impala

I have two timestamps, @starttimestamp and @endtimestamp. How can I calculate the number of working hours between them?
Working hours are defined as:
Monday-Thursday (9:00-17:00)
Friday (9:00-13:00)
It has to work in Impala.
I think I found a better solution.
First, create a series of numbers using a large table (a time-dimension table would also work). Make sure the table has enough rows that the series doesn't get truncated; I am using a large table from my database.
Use this series to generate the range of dates between the start and end dates:
date_add (t.start_date,rs.uniqueid) -- create range of dates
join (select row_number() over ( order by mycol) as uniqueid -- create range of unique ids
from largetab) rs
where end_date >=date_add (t.start_date,rs.uniqueid)
Then calculate the total difference in hours between the two timestamps using unix_timestamp, which takes both the date and the time into account:
(unix_timestamp(endtimestamp) - unix_timestamp(starttimestamp)) / 3600
Then subtract the non-working hours for each day: 16 hours Monday to Thursday, 20 hours on Friday, and 24 hours on Saturday and Sunday. Impala's dayofweek() returns 1 for Sunday through 7 for Saturday, so Friday is 6:
case when dayofweek(dday) in (1,7) then 24
when dayofweek(dday) = 6 then 20
else 16 end as non_work_hours
Here is the complete SQL:
select
end_date, start_date,
diff_in_hr - sum(case when dayofweek(dday) in (1,7) then 24
when dayofweek(dday) = 6 then 20
else 16 end) total_workhrs
from (
select (unix_timestamp(end_date)- unix_timestamp(start_date))/3600 as diff_in_hr , end_date, start_date,date_add (t.start_date,rs.uniqueid) as dDay
from tdate t
join (select row_number() over ( order by mycol) as uniqueid from largetab) rs
where end_date >=date_add (t.start_date,rs.uniqueid)
)rs2
group by 1,2,diff_in_hr
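To cross-check results from the SQL, here is a minimal Python sketch of the calculation under the schedule from the question (Mon-Thu 9:00-17:00, Fri 9:00-13:00, weekends off). Instead of subtracting a fixed number of non-working hours per day, it clips each day's working window to the [start, end) interval, so partial first and last days are counted exactly. The function name working_hours is hypothetical and public holidays are ignored. Note that Python's weekday() numbers Monday as 0, unlike Impala's dayofweek().

```python
from datetime import datetime, time, timedelta

# Working window per weekday (Monday=0 ... Sunday=6), from the question:
# Mon-Thu 09:00-17:00, Fri 09:00-13:00; weekends are absent (no work).
SCHEDULE = {
    0: (time(9), time(17)),
    1: (time(9), time(17)),
    2: (time(9), time(17)),
    3: (time(9), time(17)),
    4: (time(9), time(13)),
}


def working_hours(start: datetime, end: datetime) -> float:
    """Hours of overlap between [start, end) and the weekly schedule."""
    if end <= start:
        return 0.0
    total = timedelta()
    day = start.date()
    while day <= end.date():
        window = SCHEDULE.get(day.weekday())
        if window is not None:
            open_at = datetime.combine(day, window[0])
            close_at = datetime.combine(day, window[1])
            # Clip the day's working window to the requested interval.
            overlap = min(close_at, end) - max(open_at, start)
            if overlap > timedelta():
                total += overlap
        day += timedelta(days=1)
    return total.total_seconds() / 3600


if __name__ == "__main__":
    # Friday 08:00 to the following Monday 10:00: 4h on Friday + 1h on Monday.
    print(working_hours(datetime(2020, 3, 13, 8), datetime(2020, 3, 16, 10)))
```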

How to return the previous 7 days in Teradata

SELECT
DD.DATE_DATE AS Claim_Rcv_Date
from claim claim INNER JOIN DIM_DATE DD
ON DD.DATE_DIM_CK = CLAIM.CLAIM_RCVD_DATE_DIM_CK
WHERE ???
How would I limit the results to the previous 7 days, using DD.DATE_DATE (aliased as Claim_Rcv_Date)?
Assuming the date column is actually stored as a proper date type, you can use:
WHERE your_date_column BETWEEN (CURRENT_DATE - INTERVAL '7' DAY) AND CURRENT_DATE
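One detail worth knowing: BETWEEN is inclusive at both ends, so CURRENT_DATE - INTERVAL '7' DAY through CURRENT_DATE covers eight calendar days (today plus the seven before it); use INTERVAL '6' DAY if you want exactly seven days including today. The small Python/SQLite sketch below shows the boundary behaviour. SQLite's date(?, '-7 days') stands in for the Teradata interval arithmetic, a fixed "today" is passed in so the result is deterministic, and the table and column names are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def claims_in_last_7_days(receipt_dates, today):
    """Return ISO dates d with today - 7 days <= d <= today (inclusive)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE claim (claim_rcv_date TEXT)")
        conn.executemany(
            "INSERT INTO claim VALUES (?)",
            [(d.isoformat(),) for d in receipt_dates],
        )
        rows = conn.execute(
            """SELECT claim_rcv_date FROM claim
               WHERE claim_rcv_date BETWEEN date(?, '-7 days') AND ?
               ORDER BY claim_rcv_date""",
            (today.isoformat(), today.isoformat()),
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    today = date(2020, 3, 12)
    dates = [today - timedelta(days=n) for n in range(11)]
    print(claims_in_last_7_days(dates, today))  # eight dates, 03-05 .. 03-12
```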

SQL: Sum based on specified dates

Thanks again for the help everyone. I went with the script below...
SELECT beginning, end,
(SELECT SUM(sale) FROM sales_log WHERE date BETWEEN beginning AND `end` ) AS sales
FROM performance
I then added a salesperson column to both the performance and sales_log tables, but the query now crashes DB Browser. What is the issue here? New code below:
SELECT beginning, end, salesperson
(SELECT SUM(sale) FROM sales_log WHERE (date BETWEEN beginning AND end) AND sales_log.salesperson = performance.salesperson ) AS sales
FROM performance
I believe that the following may do what you wish or be the basis for what you wish.
WITH sales_log_cte AS
(
SELECT substr(date,(length(date) -3),4)||'-'||
CASE WHEN length(replace(substr(date,instr(date,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(date,instr(date,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(date,1,instr(date,'/') -1)) < 2 THEN '0' ELSE '' END||substr(date,1,instr(date,'/') -1) AS date,
CAST(sale AS REAL) AS sale
FROM sales_log
),
performance_cte AS
(
SELECT substr(beginning,(length(beginning) -3),4)||'-'||
CASE WHEN length(replace(substr(beginning,instr(beginning,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(beginning,instr(beginning,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(beginning,1,instr(beginning,'/') -1)) < 2 THEN '0' ELSE '' END||substr(beginning,1,instr(beginning,'/') -1)
AS beginning,
substr(`end`,(length(`end`) -3),4)||'-'||
CASE WHEN length(replace(substr(`end`,instr(`end`,'/')+1,2),'/','')) < 2 THEN '0' ELSE '' END
||replace(substr(`end`,instr(`end`,'/')+1,2),'/','')||'-'||
CASE WHEN length(substr(`end`,1,instr(`end`,'/') -1)) < 2 THEN '0' ELSE '' END||substr(`end`,1,instr(`end`,'/') -1)
AS `end`
FROM performance
)
SELECT beginning, `end` , (SELECT SUM(sale) FROM sales_log_cte WHERE date BETWEEN beginning AND `end` ) AS sales
FROM performance_cte
;
With your data, this results in:
As can be seen, the bulk of the code converts the dates into a format (i.e. YYYY-MM-DD) that SQLite recognises, so that the BETWEEN clause compares them correctly.
Date And Time Functions
I don't believe that you want a join between performance (performance_cte after reformatting the dates) and sales_log (sales_log_cte), as this would produce a Cartesian product and SUM would then add up all the results within the range.
Using end as a column name is also awkward, as it is a keyword and must be enclosed in quotes (grave accents ` are used above).
The above works by using two CTEs (Common Table Expressions), which are temporary tables whose lifetime is limited to the query in which they are used.
The first, sales_log_cte, is simply the sales_log table with the date reformatted. The second, likewise, is simply the performance table with the dates reformatted.
If the tables already stored dates in a suitable format, all of the above could simply be:
SELECT beginning, `end` , (SELECT SUM(sale) FROM sales_log WHERE date BETWEEN beginning AND `end` ) AS sales FROM performance;
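For the follow-up with the salesperson column: as written, the new query is also missing a comma after salesperson in the select list, and the per-salesperson filter belongs inside the correlated subquery. A minimal sketch, assuming both tables already store dates as YYYY-MM-DD text; it runs through Python's sqlite3 module, and the sample rows are hypothetical.

```python
import sqlite3

QUERY = """
SELECT p.beginning, p.`end`, p.salesperson,   -- comma needed here
       (SELECT SUM(s.sale)
          FROM sales_log AS s
         WHERE s.date BETWEEN p.beginning AND p.`end`
           AND s.salesperson = p.salesperson) AS sales
  FROM performance AS p
 ORDER BY p.salesperson, p.beginning
"""


def sales_per_period(performance_rows, sales_rows):
    """Sum each salesperson's sales inside each of their performance periods."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE performance (beginning TEXT, `end` TEXT, salesperson TEXT)"
        )
        conn.execute("CREATE TABLE sales_log (date TEXT, sale REAL, salesperson TEXT)")
        conn.executemany("INSERT INTO performance VALUES (?, ?, ?)", performance_rows)
        conn.executemany("INSERT INTO sales_log VALUES (?, ?, ?)", sales_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(sales_per_period(
        [("2020-01-01", "2020-01-31", "alice"), ("2020-01-01", "2020-01-31", "bob")],
        [("2020-01-05", 100.0, "alice"), ("2020-02-01", 999.0, "alice"),
         ("2020-01-10", 25.0, "bob")],
    ))
```

Because the subquery is correlated on both the date range and salesperson, each performance row only sums its own salesperson's sales, with no join and no Cartesian product.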

Redshift Error when adding date to a time stamp using case statement

I am trying to write a CASE expression that shifts a timestamp column by a number of days.
select cust_id,
case when type = 'a' then (created_date - INTERVAL '7 DAY')
when type = 'b' then (created_date - INTERVAL '10 DAY')
else 0 end as date_when_breach
from table
The above throws an error:
Reason:
SQL Error [42804]: ERROR: CASE types integer and timestamp without time zone cannot be matched
Sample created_date value is 2019-02-14 11:16:16
Your CASE expression is not consistent in its return types: the first two branches return a TIMESTAMP (timestamp without time zone, as the error message says) and the ELSE returns an INTEGER. Change your ELSE to return a date/time value (e.g. current_date, depending on what you want to achieve) or NULL (or just remove the ELSE, which has the same effect).
select
cust_id,
case
when type = 'a' then (created_date - INTERVAL '7 DAY')
when type = 'b' then (created_date - INTERVAL '10 DAY')
else NULL
end as date_when_breach
from table

Not able to filter records based on date filter in Informix

I want to add a date filter to an Informix query:
WHERE agentstatedetail.eventdatetime < '1753-01-01 00:00:00' - INTERVAL(3) DAY TO DAY
but it fails.
Where does it go wrong?
As noted in a comment, the solution is to ensure that the string is interpreted as a DATETIME value. The simple way to do that is to use the DATETIME literal notation:
DATETIME(1753-01-01 00:00:00) YEAR TO SECOND
To demonstrate:
CREATE TABLE agentstatedetail
(
eventdatetime DATETIME YEAR TO SECOND NOT NULL PRIMARY KEY,
eventname VARCHAR(64) NOT NULL
);
INSERT INTO agentstatedetail VALUES('1752-12-25 12:00:00', 'Christmas Day, Noon, 1752');
INSERT INTO agentstatedetail VALUES('1752-12-31 12:00:00', 'New Year''s Eve, Noon, 1752');
INSERT INTO agentstatedetail VALUES('1753-01-01 12:00:00', 'New Year''s Day, Noon, 1753');
SELECT * FROM agentstatedetail WHERE agentstatedetail.eventdatetime < '1753-01-01 00:00:00' - INTERVAL(3) DAY TO DAY;
This is your original WHERE clause embedded into a minimal SELECT statement. It yields the error:
SQL -1261: Too many digits in the first field of datetime or interval.
(NB: It would have been helpful to include the error message in the question.)
Here's an alternative version of the query, with the DATETIME literal in place:
SELECT * FROM agentstatedetail
WHERE agentstatedetail.eventdatetime < DATETIME(1753-01-01 00:00:00) YEAR TO SECOND -
INTERVAL(3) DAY TO DAY
;
Output from the sample data:
1752-12-25 12:00:00|Christmas Day, Noon, 1752
I observe that the value calculated is a constant; you could rewrite the code as:
SELECT * FROM agentstatedetail
WHERE agentstatedetail.eventdatetime < DATETIME(1752-12-29 00:00:00) YEAR TO SECOND
I suspect that the value is passed as a parameter somewhere along the line.
Alternatively, you can cast the string to a DATETIME value and you'd get the same result:
SELECT * FROM agentstatedetail
WHERE agentstatedetail.eventdatetime < CAST('1753-01-01 00:00:00' AS DATETIME YEAR TO SECOND) -
INTERVAL(3) DAY TO DAY
;
or:
SELECT * FROM agentstatedetail
WHERE agentstatedetail.eventdatetime < '1753-01-01 00:00:00'::DATETIME YEAR TO SECOND -
INTERVAL(3) DAY TO DAY
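The constant in the rewritten query can be checked with plain date arithmetic: 1753-01-01 00:00:00 minus three days is 1752-12-29 00:00:00, and of the three sample rows only the Christmas one falls before it. A tiny Python sketch of that check, using the sample rows from the INSERT statements above (the helper name rows_before_cutoff is hypothetical):

```python
from datetime import datetime, timedelta

# Same value as the Informix expression
# DATETIME(1753-01-01 00:00:00) YEAR TO SECOND - INTERVAL(3) DAY TO DAY
CUTOFF = datetime(1753, 1, 1) - timedelta(days=3)

# The three sample rows from the INSERT statements above.
SAMPLE = [
    (datetime(1752, 12, 25, 12), "Christmas Day, Noon, 1752"),
    (datetime(1752, 12, 31, 12), "New Year's Eve, Noon, 1752"),
    (datetime(1753, 1, 1, 12), "New Year's Day, Noon, 1753"),
]


def rows_before_cutoff():
    """Mimic WHERE eventdatetime < CUTOFF on the sample data."""
    return [name for ts, name in SAMPLE if ts < CUTOFF]


if __name__ == "__main__":
    print(CUTOFF)                # 1752-12-29 00:00:00
    print(rows_before_cutoff())  # ['Christmas Day, Noon, 1752']
```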
