Finding the minimum in Teradata when values can be NULL (CASE)

I want to find the minimum of six values in Teradata. Following is the logic I am using, but it fails when there is a NULL value in any of the fields. I would appreciate your help on this. Thanks in advance.
CASE WHEN event.A1 <= event.A2 and event.A1 <= event.A3 and event.A1 <= event.A4 and event.A1 <= event.A5 and event.A1 <= event.A6 THEN event.A1
WHEN event.A2 <= event.A3 and event.A2 <= event.A4 and event.A2 <= event.A5 and event.A2 <= event.A6 THEN event.A2
WHEN event.A3 <= event.A4 and event.A3 <= event.A5 and event.A3 <= event.A6 THEN event.A3
WHEN event.A4 <= event.A5 and event.A4 <= event.A6 THEN event.A4
WHEN event.A5 <= event.A6 THEN event.A5
ELSE coalesce(event.A6,event.A5,event.A4,event.A3,event.A2,event.A1) END AS ZZZ

I think I'd use a derived table to handle all the null logic one time, and then select from that. Something like:
SELECT
CASE WHEN t1.a1 <= t1.a2 and t1.a1 <= t1.a3...
FROM
(Select
coalesce(event.A1,0) AS a1, --don't know what the data type is; replace 0 accordingly (a default larger than any real value keeps NULLs from being picked as the minimum)
coalesce(event.A2,0) AS a2,
...) t1
It's still kind of ugly, but I think it will still be simpler.
EDIT:
I may be oversimplifying, but if you're on TD 14, you could look into the LEAST function:
SELECT LEAST(event.a1,event.a2,event.a3,...)
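The derived-table idea can be tried out end to end. What follows is only a sketch, run against SQLite through Python's sqlite3 module as a stand-in for Teradata: SQLite's multi-argument MIN() plays the role of the comparison chain, the sample rows are made up, and SENTINEL is an assumed value larger than anything stored in A1..A6. NULLs are replaced by the sentinel once in the derived table, and NULLIF turns an all-NULL row back into NULL.

```python
import sqlite3

# Assumption: no real value in A1..A6 ever reaches this number.
SENTINEL = 10**12

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (A1, A2, A3, A4, A5, A6)")
# Hypothetical sample rows: some NULLs, a single value, and all NULLs.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?, ?, ?)",
    [
        (5, 3, None, 9, 4, 8),
        (None, None, None, None, None, 7),
        (None, None, None, None, None, None),
    ],
)

query = """
SELECT NULLIF(MIN(t1.a1, t1.a2, t1.a3, t1.a4, t1.a5, t1.a6), :s) AS ZZZ
FROM (
    -- handle the NULL logic once, in the derived table
    SELECT rowid AS rid,
           COALESCE(A1, :s) AS a1,
           COALESCE(A2, :s) AS a2,
           COALESCE(A3, :s) AS a3,
           COALESCE(A4, :s) AS a4,
           COALESCE(A5, :s) AS a5,
           COALESCE(A6, :s) AS a6
    FROM event
) t1
ORDER BY t1.rid
"""

# One minimum per row; None where every column was NULL.
zzz = [row[0] for row in conn.execute(query, {"s": SENTINEL})]
print(zzz)
```

The sentinel has to sort above every legitimate value; with 0 as the default, a NULL column would win the comparison whenever the real values are positive.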


Creating seasonal subset

I'm having a little trouble creating seasonal subsets in R.
The datetime is already in the POSIXct format so I didn't think it necessary to add the as.POSIXct() function.
Also, the dataset is already organized by datetime.
This is what the current code looks like.
summer_subset <- subset(YTD_v5, YTD_v5$started_at >= '2021-06-21 00:00:00' & YTD_v5$ended_at <= '2021-09-21 23:59:59')
fall_subset <- subset(YTD_v5, YTD_v5$started_at >= '2021-09-22 00:00:00' & YTD_v5$ended_at <= '2021-12-20 23:59:59')
winter_subset <- subset(YTD_v5, (YTD_v5$started_at >= '2021-12-21 00:00:00' & YTD_v5$ended_at <= '2022-02-28 23:59:59') | (YTD_v5$started_at >= '2021-03-01 00:00:00' & YTD_v5$ended_at <= '2021-03-19 23:59:59'))
spring_subset <- subset(YTD_v5, YTD_v5$started_at >= '2021-03-20 00:00:00' & YTD_v5$ended_at <= '2021-06-20 23:59:59')
When I view the summer_subset, the rows start at 2021-06-21 04:00:00, not 00:00:00. The final entry is 2021-09-21 03:55:00, not 23:59:59.
In the YTD_v5 dataset, there are entries that contain start times at 00:00:00 and end times that end at 23:59:59.
Thanks in advance for any insight.

SQLITE results inconsistency

I am summarising the outputs of a survey, stored in a sqlite database file, and have a view defined as follows - this is meant to show entries in the valid response view where the respondent has indicated that EITHER:
(a) they are meeting the requirements already; OR,
(b) they aren't meeting all requirements, but the associated actions in place will be complete by the end of the year (31/12/2020):
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE
(impact_answer NOT IN ("Fully","Yes","N/A") AND
td__update_leg_doc <= "2020-12-31" AND
td__update_proc <= "2020-12-31" AND
td__update_op_proc <= "2020-12-31" AND
td__update_tech <= "2020-12-31" AND
td__training <= "2020-12-31") OR
impact_answer IN ("Fully","Yes","N/A")
The records included in the view are correct. However, when I query the rows of valid_response that are not included in the view, there are some strange results:
SELECT *
FROM valid_response
WHERE id NOT IN (SELECT id FROM complete_dec20);
e.g.
id,impact_answer,td__update_leg_doc,td__update_proc,td__update_op_proc,td__update_tech,td__training
7,Partially,2020-12-31,,,,
Based on the date of 2020-12-31 and answer of 'Partially', this should be in the complete_dec20 view.
Can you explain why it isn't / what I'm missing?
"Based on the date of 2020-12-31 and answer of 'Partially', this should be in the complete_dec20 view"
This should be in the complete_dec20 view only if all of these conditions are true:
td__update_leg_doc <= '2020-12-31' AND
td__update_proc <= '2020-12-31' AND
td__update_op_proc <= '2020-12-31' AND
td__update_tech <= '2020-12-31' AND
td__training <= '2020-12-31'
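Are they?
I don't think so: in row 7 only td__update_leg_doc has a value. If the other four columns are NULL, each of their comparisons evaluates to NULL rather than true, so the whole AND chain is not true.
If they were all true then the id would be returned by complete_dec20.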
Also, the WHERE clause of complete_dec20 can be a bit simpler because there is no need to check impact_answer NOT IN ('Fully','Yes','N/A'):
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE impact_answer IN ('Fully','Yes','N/A')
OR
(
td__update_leg_doc <= '2020-12-31' AND
td__update_proc <= '2020-12-31' AND
td__update_op_proc <= '2020-12-31' AND
td__update_tech <= '2020-12-31' AND
td__training <= '2020-12-31'
)
Or even simpler with the function MAX():
CREATE VIEW complete_dec20 AS
SELECT *
FROM valid_response
WHERE impact_answer IN ('Fully','Yes','N/A')
OR
MAX(
td__update_leg_doc,
td__update_proc,
td__update_op_proc,
td__update_tech,
td__training
) <= '2020-12-31'
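To see why row 7 is excluded, here is a small sketch using Python's sqlite3 module. It assumes the blank columns in row 7 are stored as NULL; rows 8 and 9 are invented for contrast and are not from the survey data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE valid_response (
           id INTEGER PRIMARY KEY,
           impact_answer TEXT,
           td__update_leg_doc TEXT,
           td__update_proc TEXT,
           td__update_op_proc TEXT,
           td__update_tech TEXT,
           td__training TEXT)"""
)
conn.executemany(
    "INSERT INTO valid_response VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Row 7 from the question, blanks assumed to be NULL.
        (7, "Partially", "2020-12-31", None, None, None, None),
        # Hypothetical rows: all dates filled in, and a 'Fully' answer.
        (8, "Partially", "2020-12-31", "2020-11-30", "2020-10-01", "2020-12-01", "2020-12-31"),
        (9, "Fully", None, None, None, None, None),
    ],
)

# A comparison against NULL yields NULL (None in Python), not true.
null_cmp = conn.execute("SELECT NULL <= '2020-12-31'").fetchone()[0]
# Multi-argument MAX() returns NULL as soon as one argument is NULL.
null_max = conn.execute("SELECT MAX('2020-12-31', NULL)").fetchone()[0]

included_ids = [
    row[0]
    for row in conn.execute(
        """SELECT id
           FROM valid_response
           WHERE impact_answer IN ('Fully', 'Yes', 'N/A')
              OR MAX(td__update_leg_doc, td__update_proc, td__update_op_proc,
                     td__update_tech, td__training) <= '2020-12-31'
           ORDER BY id"""
    )
]
print(null_cmp, null_max, included_ids)
```

If missing dates should count as "nothing left to do", each column would need something like COALESCE(column, '') before the comparison; whether that matches the survey's meaning is a judgment call.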

SQL - Using logical operators in a UDF case statement

So it took a while for me to figure out how to create my first UDF but after I fixed it, I figured my next one would be a piece of cake. Unfortunately, it hasn't been the case. I'm pulling a field (ORIG_CLAIM, float) and I want to categorize that number. Here's my code:
CREATE FUNCTION [dbo].[fnOC_LEVEL](@ORIG_CLAIM float)
RETURNS nvarchar(255)
AS
BEGIN
DECLARE @result as varchar(255);
SELECT @result = case @ORIG_CLAIM
when < 1000 then 'A_Under 1000'
when >= 1000 and <= 4999.99 then 'B_1000-4999'
when >= 5000 and <= 7499.99 then 'C_5000-7499'
when >= 7500 and <= 9999.99 then 'D_7500-9999'
when >= 10000 and <= 14999.99 then 'E_10000-14999'
when >= 15000 and <= 19999.99 then 'F_15000-19999'
when >= 20000 then 'G_Over 20000'
END
RETURN @result
END
GO
I'm getting the error "Incorrect syntax near '<'". Can anyone see what I might be doing wrong?
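I don't think the float literals are the problem. A simple CASE (case @ORIG_CLAIM when ...) can only test equality against each WHEN value, so a WHEN branch cannot begin with a comparison operator. Switch to a searched CASE and repeat the variable in every condition. For example:
SELECT @result = case
when @ORIG_CLAIM < 1000 then 'A_Under 1000'
when @ORIG_CLAIM >= 1000 and @ORIG_CLAIM <= 4999.99 then 'B_1000-4999'
etc.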
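The searched CASE form (CASE WHEN condition THEN ...) is standard SQL, so it can be tried outside SQL Server. Below is a minimal sketch that runs it in SQLite through Python's sqlite3 module; the helper name oc_level and the :claim parameter are made up for illustration. Because the branches are evaluated in order, each one only needs an upper bound, which also avoids the small gaps that bounds like 4999.99 leave for a float.

```python
import sqlite3

# Searched CASE: every WHEN carries a full boolean condition.
LEVEL_SQL = """
SELECT CASE
    WHEN :claim < 1000   THEN 'A_Under 1000'
    WHEN :claim < 5000   THEN 'B_1000-4999'
    WHEN :claim < 7500   THEN 'C_5000-7499'
    WHEN :claim < 10000  THEN 'D_7500-9999'
    WHEN :claim < 15000  THEN 'E_10000-14999'
    WHEN :claim < 20000  THEN 'F_15000-19999'
    WHEN :claim >= 20000 THEN 'G_Over 20000'
END
"""


def oc_level(conn, claim):
    """Return the category label for a claim amount (None for NULL input)."""
    return conn.execute(LEVEL_SQL, {"claim": claim}).fetchone()[0]


conn = sqlite3.connect(":memory:")
for amount in (999.99, 1000.0, 4999.995, 20000.0, None):
    print(amount, oc_level(conn, amount))
```

The last branch is written as an explicit condition rather than ELSE so that a NULL input still yields NULL, as it would in the original function.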

What is the equivalent to a SELECT query involving two tables (dataframes) in R?

My sqldf set up in R uses SQLite by default. I tried the following query without success:
query = "UPDATE t1
SET Actual = t2.AvgRevenue,
Total = t2.AvgRevenue
WHERE Name=t2.Name AND
Pillar= 'HW' AND
(Status <> 'Lost') AND
Revenue=0"
t1 = sqldf(c(query,"select * from pl0"))
t1 has columns Name, Pillar, Status, Revenue, Actual, Total
t2 is a lookup table with columns Name, AvgRevenue
After doing some research, I found that SQLite does not currently support UPDATE queries involving two or more tables.
My question is this: can I do the equivalent of the query above using only R?
To get an answer, I tried the following:
test <- t1[t1$Revenue == 0 & t1$Status == 'Lost' & t1$Pillar == 'HW',]
test$Actual <- test$Name
mapvalues(test$Actual,
t2$Name,
t2$AvgRevenue,
warn_missing = FALSE)
t1 <- test
but mapvalues is not updating column test$Actual as I expected. The right values of t2$AvgRevenue are output to the console, but test$Actual is not updated. By the way, I want t1 to be the same data frame as before, but with the appropriate rows in columns Actual and Total updated.
Any suggestions will be greatly appreciated!
You can use the dplyr library to select the variables:
library(dplyr)
Actual <- select(t1, Name, Pillar, Status, Revenue)
Avg_Revenue <- select(t2, Name, AvgRevenue)
complete_data = cbind(Actual, Avg_Revenue)
You can also use the filter:
filter(Actual, Revenue==0, Status =="Lost")
Hope it helps
I found an answer to my question, based on R. Here it is:
library(plyr)  # provides mapvalues()
t1 <- data.frame(Name=c("A","B","C","D"),
Pillar=c("SW","HW","HW","SW"),
Status=c("Won","Open","Won","Lost"),
Revenue=c(5,0,0,0),
Actual=c(5,0,0,0),
Total=c(5,0,0,0))
t2 <- data.frame(Name=c("A","B","C","D"),
AvgRevenue=c(5,3,7,10))
t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',]$Actual <-
as.character(t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',]$Name)
t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Actual"] <-
mapvalues(t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Actual"],
t2$Name,
t2$AvgRevenue,
warn_missing = FALSE)
t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Total"] <-
as.character(t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Name"])
t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Total"] <-
mapvalues(t1[t1$Revenue == 0 & t1$Status != 'Lost' & t1$Pillar == 'HW',"Total"],
t2$Name,
t2$AvgRevenue,
warn_missing = FALSE)
t1
The trick is to use the common key between t1 & t2 (Name) as an intermediate step to be able to use mapvalues to do the final step. This is the equivalent of the original SQL UPDATE query. Thank you very much for all your suggestions!
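For comparison, SQLite does accept an UPDATE that reads the second table through correlated subqueries, so the original update can also be expressed in SQL. The following is a sketch using Python's sqlite3 module with the same sample data as the data frames above; the EXISTS guard is an addition here, so that rows without a match in t2 are left untouched instead of being set to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (Name TEXT, Pillar TEXT, Status TEXT, "
    "Revenue REAL, Actual REAL, Total REAL)"
)
conn.execute("CREATE TABLE t2 (Name TEXT, AvgRevenue REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "SW", "Won", 5, 5, 5),
        ("B", "HW", "Open", 0, 0, 0),
        ("C", "HW", "Won", 0, 0, 0),
        ("D", "SW", "Lost", 0, 0, 0),
    ],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [("A", 5), ("B", 3), ("C", 7), ("D", 10)],
)

# Each subquery looks up the t2 row whose Name matches the t1 row being updated.
conn.execute(
    """
    UPDATE t1
    SET Actual = (SELECT AvgRevenue FROM t2 WHERE t2.Name = t1.Name),
        Total  = (SELECT AvgRevenue FROM t2 WHERE t2.Name = t1.Name)
    WHERE Pillar = 'HW'
      AND Status <> 'Lost'
      AND Revenue = 0
      AND EXISTS (SELECT 1 FROM t2 WHERE t2.Name = t1.Name)
    """
)

updated = conn.execute(
    "SELECT Name, Actual, Total FROM t1 ORDER BY Name"
).fetchall()
print(updated)
```

Only rows B and C satisfy the WHERE clause, so they are the only ones that pick up the AvgRevenue values.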

Can't get PL SQL to divide with two select statements

I need help getting this code to work. I am trying to use two sql statements to perform a division.
select ((select count(p.issued)
from permit p
where trunc(p.issued) >= trunc(TO_DATE('1/1/2011','MM/DD/YYYY'))
AND trunc(p.issued) <= trunc(TO_DATE('1/31/2011','MM/DD/YYYY')))
/
(select count(p.issued)
from permit p
where (TO_DATE(p.issued) - sysdate) <= 21
and trunc(p.issued) >= trunc(TO_DATE('1/1/2011','MM/DD/YYYY'))
AND trunc(p.issued) <= trunc(TO_DATE('1/31/2011','MM/DD/YYYY')))) as permitPercent;
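You need an outer SELECT with a FROM clause. Oracle has traditionally required FROM in every SELECT, so select the expression from DUAL: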
SELECT ( (select ....)/(select ... )) as permitPercent FROM DUAL;
UPDATE
So your query will look like:
SELECT (
(
select count(p.issued)
from permit p
where trunc(p.issued) >= trunc(TO_DATE('1/1/2011','MM/DD/YYYY'))
AND trunc(p.issued) <= trunc(TO_DATE('1/31/2011','MM/DD/YYYY'))
) /
(
select count(p.issued)
from permit p
where (TO_DATE(p.issued) - sysdate) <= 21
and trunc(p.issued) >= trunc(TO_DATE('1/1/2011','MM/DD/YYYY'))
AND trunc(p.issued) <= trunc(TO_DATE('1/31/2011','MM/DD/YYYY'))
)
) as permitPercent FROM DUAL;
You don't divide the entire SELECT statement.
You divide two values and return the result as a column in the select list.
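The same pattern of dividing two scalar subqueries inside one select list can be sketched with Python's sqlite3 module. Several things here are assumptions and differ from the Oracle query: SQLite has no DUAL and does not need a FROM clause, the permit rows are invented, and a fixed hypothetical cutoff date stands in for the sysdate-based condition. SQLite divides integers as integers, hence the multiplication by 1.0, and NULLIF guards against a zero denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permit (issued TEXT)")
# Hypothetical issue dates in ISO format so that text comparison sorts correctly.
conn.executemany(
    "INSERT INTO permit VALUES (?)",
    [("2011-01-05",), ("2011-01-20",), ("2011-01-31",), ("2011-02-02",)],
)

ratio_sql = """
SELECT (
    -- numerator: permits issued in January 2011
    (SELECT COUNT(p.issued)
     FROM permit p
     WHERE p.issued BETWEEN '2011-01-01' AND '2011-01-31') * 1.0
    /
    -- denominator: the January permits up to an assumed cutoff date
    NULLIF((SELECT COUNT(p.issued)
            FROM permit p
            WHERE p.issued BETWEEN '2011-01-01' AND '2011-01-31'
              AND p.issued <= '2011-01-21'), 0)
) AS permitPercent
"""

ratio = conn.execute(ratio_sql).fetchone()[0]
print(ratio)
```

Each parenthesised subquery returns a single value, and the division happens between those two values in the outer select list.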
