How to write this ifelse statement in R correctly

I want to return a value in a column, or NA, contingent on values in other columns.
I basically want to see if the value in the column meets the first test criteria:
df$v2.1 >= df$varx & df$v3.1 <6
if not does it meet the second:
df$v4.1 >= df$vary & df$v5.1 >5
and then if neither return NA
The code I have tried is below.
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 >5 ,df$v1.1, NA)

Your only mistake is using || rather than |. || is not vectorised: it only looks at the first element of each operand, and since R 4.3.0 it throws an error when given an operand longer than one. All your other operators (and ifelse()) are vectorised, so the following should work as expected:
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 | df$v4.1 >= df$vary & df$v5.1 > 5, df$v1.1, NA)
A good way to check when you're doing reasonably complex or multiple logical operations is to run each one of them and see if you're getting the expected output. If you run:
df$v2.1 >= df$varx & df$v3.1 <6
or
df$v4.1 >= df$vary & df$v5.1 > 5
you should get a vector of logical values. If you run:
df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 > 5
you should get a single logical value (or, from R 4.3.0 on, an error, because || no longer accepts operands longer than one). In your case, that single value gives a single result from the ifelse(), which then gets recycled to fill df$v1.1.
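As a small illustration of the vectorised version, here is a sketch on a made-up data frame. The values are hypothetical and only the column names come from the question; each test is stored in its own variable so it can be inspected before being combined with |.

```r
# Hypothetical toy data; only the column names are taken from the question.
df <- data.frame(
  v1.1 = c(10, 20, 30, 40),
  v2.1 = c(5, 1, 1, 1),
  varx = c(3, 3, 3, 3),
  v3.1 = c(2, 2, 2, 9),
  v4.1 = c(0, 8, 0, 0),
  vary = c(4, 4, 4, 4),
  v5.1 = c(9, 9, 9, 9)
)

# Each test is a logical vector with one element per row.
test1 <- df$v2.1 >= df$varx & df$v3.1 < 6
test2 <- df$v4.1 >= df$vary & df$v5.1 > 5

# The vectorised | combines the tests row by row; rows failing both become NA.
df$v1.1 <- ifelse(test1 | test2, df$v1.1, NA)
print(df$v1.1)
```

Printing test1 and test2 separately is a quick way to confirm that each one has the same length as the data frame before they are combined.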

From what I can tell df$v1.1 is already defined, so you only need to modify those rows that fail the test in your ifelse. The following might be easier:
df$v1.1[
which(
!(df$v2.1 >= df$varx & df$v3.1 <6) & !(df$v4.1 >= df$vary & df$v5.1 >5))
] <- NA

Related

Create new variable in R with assumptions from SPSS file

I've read in my SPSS file in R and want to recode a new variable if such and such assumptions are made. To be specific:
I want to turn my spssdata_sub$gest variable into a new variable if the following conditions are met:
spssdata_sub$indusert != 2 & spssdata_sub$ivf != 1 & spssdata_sub$leie != 3 & spssdata_sub$svkompl_II != 7 & spssdata_sub$svkompl_II != 2 & spssdata_sub$svkompl_II != 1
Can anyone here help me with the code?
Does one of the following codes work for you?
Either this adapted version of Renu's solution
spssdata_sub$gest <- ifelse(spssdata_sub$indusert != 2 & spssdata_sub$ivf != 1 & spssdata_sub$leie != 3 & spssdata_sub$svkompl_II != 7 & spssdata_sub$svkompl_II != 2 & spssdata_sub$svkompl_II != 1, spssdata_sub$gest, NA)
or this code for filtering observations:
library(dplyr)
spssdata_sub_new <- spssdata_sub %>%
  filter(indusert != 2 & ivf != 1 & leie != 3 & svkompl_II != 7 & svkompl_II != 2 & svkompl_II != 1)
One way is the following, if you really mean either one of the conditions
Mynewdata <- dplyr::filter(spssdata, indusert != 2, ivf != 1, leie != 3,
svkompl_II != 7 & svkompl_II != 2 & svkompl_II != 1)
This keeps only the entries that match none of the listed values. Put the other way, it excludes entries that have indusert = 2 or ivf = 1, etc.; meeting any one of the conditions is enough to exclude an entry.
Add-on: or something like this:
Mynewdata <- dplyr::filter(spssdata, indusert != 2, ivf != 1, leie != 3,
  !(svkompl_II %in% c(7,2,1)))
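One caveat when choosing between the chained != form and the %in% form: they treat missing values differently. The sketch below uses a made-up vector in base R (no dplyr needed) to show this. The chained comparison yields NA for a missing value, and dplyr::filter() drops rows whose condition is NA, whereas %in% never returns NA, so the negated form keeps those rows.

```r
# Hypothetical values for svkompl_II, including one missing value.
svkompl_II <- c(1, 2, 5, 7, NA)

# Chained comparisons: a missing value gives NA.
by_neq <- svkompl_II != 7 & svkompl_II != 2 & svkompl_II != 1

# %in% never returns NA, so the negation is TRUE for a missing value.
by_in <- !(svkompl_II %in% c(7, 2, 1))

print(by_neq)
print(by_in)
```

If rows with a missing svkompl_II should be excluded as well, adding !is.na(svkompl_II) to the %in% version makes the two forms agree.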

Exporting Summary Data to CSV in R

Hello everyone I am working on a script that I would like to export to a CSV file.
Everything is working well with the exception that I would like to add column names and headers for the below data.
For instance variable A is the summary data of fixed income trades in 2017. I would like Row 1 in the output file to read as such.
Any help would be greatly appreciated. My code is written below. Thanks in advance!!
#SENDS THE RESULTS TO A FILE CALLED OUTFILE.CSV, WHICH IS OVERWRITTEN EACH TIME THE SCRIPT IS RUN
sink("outfile.csv")
#SHORT-TERM PRE-REFUNDED TRADE DATA
A = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2017 & MSRB$Par.Traded >=500 & MSRB$Class == "PRE-REFUNDED"),]
B = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2017 & MSRB$Par.Traded >=1000 & MSRB$Class == "PRE-REFUNDED"),]
C = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2018 & MSRB$Par.Traded >=500 & MSRB$Class == "PRE-REFUNDED"),]
D = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2018 & MSRB$Par.Traded >=1000 & MSRB$Class == "PRE-REFUNDED"),]
E = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2019 & MSRB$Par.Traded >=500 & MSRB$Class == "PRE-REFUNDED"),]
F = MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == 2019 & MSRB$Par.Traded >=1000 & MSRB$Class == "PRE-REFUNDED"),]
#SUMMARY OF PRE-REFUNDED DATA
summary(A$Yield)
summary(B$Yield)
summary(C$Yield)
summary(D$Yield)
summary(E$Yield)
summary(F$Yield)
#END OF OUTPUT FILE
sink()
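One possible way to get labelled rows and a header line is to collect the summaries into a table and write it with write.csv() instead of sink(). This is only a sketch under stated assumptions: the toy MSRB data, the pick() helper, the row labels, and the output path are all hypothetical, and it assumes Yield has no missing values in any subset so that every summary has the same six entries.

```r
# Hypothetical stand-in for the MSRB data frame in the question.
MSRB <- data.frame(
  Coupon.Rate = c(2, 3, 5, 4, 2.5, 3),
  Year        = c(2017, 2017, 2017, 2018, 2018, 2018),
  Par.Traded  = c(500, 1500, 2000, 700, 1200, 1000),
  Class       = "PRE-REFUNDED",
  Yield       = c(1.1, 1.4, 1.9, 2.0, 2.3, 2.6)
)

# Illustrative helper: the rows matching one year and a minimum par size.
pick <- function(year, min_par) {
  MSRB[which(MSRB$Coupon.Rate >= 2 & MSRB$Year == year &
             MSRB$Par.Traded >= min_par & MSRB$Class == "PRE-REFUNDED"), ]
}

# One labelled subset per output row; the labels are made up for this sketch.
subsets <- list(
  "2017 par >= 500"  = pick(2017, 500),
  "2017 par >= 1000" = pick(2017, 1000),
  "2018 par >= 500"  = pick(2018, 500),
  "2018 par >= 1000" = pick(2018, 1000)
)

# summary() of a numeric vector without NAs has six named entries;
# unclass() turns each into a plain named vector and rbind() stacks them.
stats <- do.call(rbind, lapply(subsets, function(d) unclass(summary(d$Yield))))

# Put the labels in a first column, keeping names such as "1st Qu." unchanged.
out <- data.frame(Subset = rownames(stats), stats, check.names = FALSE)

out_path <- file.path(tempdir(), "outfile.csv")  # assumed output location
write.csv(out, out_path, row.names = FALSE)
```

The first line of the resulting file holds the column names, and each following line starts with the label of the subset it summarises.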

R: recommendation on how to compute new columns on multiple condition of others for every row in data.frame

For every row I need to compute two variables as new columns in a data.frame, conditional on more than 60 other columns. I would like your recommendation on how to do that elegantly (while and for, with, ifelse, foreach, by, or ddply?). I don't want to do it manually, as I did for the first cases in the example code, and I don't care about performance.
Further: I probably would not need to ask if I understood how to use functions like transform (with ddply or by) and what they do. So I hope you can recommend good tutorials on that, ideally relating to my case. I found a lot, but in different contexts, and was not able to fully comprehend them or adapt them to my case.
My case: I have three columns for each of 20 events, representing the kind and date of that event. For each row I need to compute (and save to the data.frame) the difference in time between one particular event (depending on whether a particular kind happened before or after another) and a date that is fixed for each row. Furthermore, I need to save the date of that event.
This is how I did it (it works, but it only covers the first cases):
#event.2 (1. event month), event.3 (1. event year), event.4 (1. event kind), event.5 (2. event month), event.6 (2. event year), ...
df$dit[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(df$event.4 == 3 & ((1/12*df$event.2)+df$event.3) > df$fixdate) & (df$event.7 == 1 | df$event.7 == 2)
)] = ((1/12*df$event.2)+df$event.3) - df$fixdate
df$date[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(df$event.4 == 3 & ((1/12*df$event.2)+df$event.3) > df$fixdate) & (df$event.7 == 1 | df$event.7 == 2)
)] = ((1/12*df$event.2)+df$event.3)
df$dit[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(df$event.4 == 1 & ((1/12*df$event.2)+df$event.3) > df$fixdate)
| (df$event.4 == 2 & ((1/12*df$event.2)+df$event.3) > df$fixdate)
)] = 0
df$date[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(df$event.4 == 1 & ((1/12*df$event.2)+df$event.3) > df$fixdate)
| (df$event.4 == 2 & ((1/12*df$event.2)+df$event.3) > df$fixdate)
)] = df$fixdate
df$dit[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(
(df$event.4 == 1 & ((1/12*df$event.2)+df$event.3) < df$fixdate)
& (
(df$event.7 == 1 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
| (df$event.7 == 2 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
)
)
|
(
(df$event.4 == 2 & ((1/12*df$event.2)+df$event.3) < df$fixdate)
& (
(df$event.7 == 1 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
| (df$event.7 == 2 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
)
)
)] = ((1/12*df$event.5)+df$event.6) - df$fixdate
df$date[(!is.na(df$event.2) & !is.na(df$event.3) & !is.na(df$event.4) & !is.na(df$event.5) & !is.na(df$event.6) & !is.na(df$event.7))
& (
(
(df$event.4 == 1 & ((1/12*df$event.2)+df$event.3) < df$fixdate)
& (
(df$event.7 == 1 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
| (df$event.7 == 2 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
)
)
|
(
(df$event.4 == 2 & ((1/12*df$event.2)+df$event.3) < df$fixdate)
& (
(df$event.7 == 1 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
| (df$event.7 == 2 & ((1/12*df$event.5)+df$event.6) > df$fixdate)
)
)
)] = ((1/12*df$event.5)+df$event.6)
You can define your conditions as expressions and use them within transform. The idea is to factor out your conditions as much as possible.
COND1 <- expression(!is.na(event.2) & !is.na(event.3) &
!is.na(event.4) & !is.na(event.5) &
!is.na(event.6) & !is.na(event.7))
COND2 <- expression((event.4 == 3 & ((1/12*event.2)+event.3) > fixdate) &
  (event.7 == 1 | event.7 == 2))
COND3 <- expression(event.4 == 1 & ((1/12*event.2)+event.3) > fixdate)
COND4 <- expression(event.4 == 2 & ((1/12*event.2)+event.3) > fixdate)
### you continue here with the rest of conditions....
Then using them within transform you can do something like:
df <- transform(df, date = ifelse(eval(COND1) & eval(COND2), ((1/12*event.2)+event.3), NA))
## keep the date already set by the previous step for rows that do not match here
df <- transform(df, date = ifelse(eval(COND1) & (eval(COND3)|eval(COND4)), fixdate, date))
## Note also that the second "dit" variable is deduced from "date"
df <- transform(df, dit = date-fixdate)
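To make the idea concrete, here is a minimal sketch on three made-up rows. The data values are hypothetical, and it passes the data frame to eval() explicitly rather than going through transform(), which is a small variation on the approach above. Only the first two cases from the question are covered.

```r
# Hypothetical toy data using the column layout described in the question.
df <- data.frame(
  event.2 = c(6, 3, 1),           # first event: month
  event.3 = c(2001, 1999, 2002),  # first event: year
  event.4 = c(3, 1, 1),           # first event: kind
  event.5 = c(1, 2, 3),           # second event: month
  event.6 = c(2002, 2003, 2001),  # second event: year
  event.7 = c(1, 2, 1),           # second event: kind
  fixdate = c(2000, 1998, 2005)
)

# Conditions written once as unevaluated expressions.
COND1 <- expression(!is.na(event.2) & !is.na(event.3) & !is.na(event.4) &
                    !is.na(event.5) & !is.na(event.6) & !is.na(event.7))
COND2 <- expression(event.4 == 3 & (event.2 / 12 + event.3) > fixdate &
                    (event.7 == 1 | event.7 == 2))
COND3 <- expression(event.4 == 1 & (event.2 / 12 + event.3) > fixdate)
COND4 <- expression(event.4 == 2 & (event.2 / 12 + event.3) > fixdate)

# Evaluate each expression with the data frame's columns in scope.
complete <- eval(COND1, df)
case_a   <- which(complete & eval(COND2, df))
case_b   <- which(complete & (eval(COND3, df) | eval(COND4, df)))

# Fill date case by case; rows matching no case stay NA.
df$date <- NA_real_
df$date[case_a] <- (df$event.2 / 12 + df$event.3)[case_a]
df$date[case_b] <- df$fixdate[case_b]

# dit follows directly from date.
df$dit <- df$date - df$fixdate
print(df[, c("date", "dit")])
```

Because every condition is named once, covering the remaining cases (or the remaining events) is a matter of adding further expressions rather than repeating the long index conditions.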

For/While Loop on variables that satisfy a certain condition [R]

I'm trying to write an if-else statement (ultimately) in R, but only for variables that satisfy certain criteria. I'm sure there is an easy way to do this, but I can't seem to find anything specific when searching...
Below is an example of a while loop (not sure whether I can use this for this purpose):
while(gene[c(36)] >=30 & gene[c(37)] >=30 & gene[c(38)] >=30)
{
gene$Category <- ifelse((gene[c(49)] == './.' & gene[c(48)] == './.'), 'N/A', ifelse(((gene[c(50)] == './.') & (gene[c(36)] >=30 & gene[c(37)] >=30)),'denovo deletion',''))
}
I technically want to run the if else statement on a variable(s) only if certain other conditions are met. Am I overly complicating this?
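Assuming that your ifelse construct is OK, you can "subset" the frame based on the condition that is now expressed in your while loop. Use [[ ]] to pull each column out as a vector, and subset the right-hand side by the same condition so that both sides of the assignment have the same length:
condition <- gene[[36]] >= 30 & gene[[37]] >= 30 & gene[[38]] >= 30
gene$Category[condition] <- ifelse(gene[[49]] == './.' & gene[[48]] == './.', 'N/A', ifelse(gene[[50]] == './.' & gene[[36]] >= 30 & gene[[37]] >= 30, 'denovo deletion', ''))[condition]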
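The same pattern on a small made-up data frame might look like the sketch below. The column names and values are hypothetical stand-ins for the numbered columns in the question, and the inner depth check is dropped because the outer condition already covers it.

```r
# Hypothetical toy data; named columns stand in for positions 36-38 and 48-50.
gene <- data.frame(
  depth_a  = c(35, 40, 10),
  depth_b  = c(31, 50, 45),
  depth_c  = c(30, 33, 60),
  gt_p1    = c("0/0", "./.", "./."),
  gt_p2    = c("0/1", "./.", "./."),
  gt_child = c("./.", "0/1", "./."),
  stringsAsFactors = FALSE
)

# Rows that satisfy the outer condition (the former while() test).
condition <- gene$depth_a >= 30 & gene$depth_b >= 30 & gene$depth_c >= 30

# Compute the category for every row, then assign only where condition holds.
category <- ifelse(gene$gt_p1 == "./." & gene$gt_p2 == "./.", "N/A",
                   ifelse(gene$gt_child == "./.", "denovo deletion", ""))
gene$Category <- NA_character_
gene$Category[condition] <- category[condition]
print(gene$Category)
```

Rows that fail the outer condition keep NA in Category, so they are easy to tell apart from rows that were classified.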

Create new data set that meets all of 4 conditions

I would like to create a new dataset where the following four conditions are all met.
rowSums(is.na(UNCA[,11:23]))<12
rowSums(is.na(UNCA[,27:39]))<12
rowSums(is.na(UNCA[,40:52]))<12
rowSums(is.na(UNCA[,53:65]))<12
Thanks!
Then use the & operator:
UNCA.new <- UNCA[rowSums(is.na(UNCA[,11:23])) < 12 &
rowSums(is.na(UNCA[,27:39])) < 12 &
rowSums(is.na(UNCA[,40:52])) < 12 &
rowSums(is.na(UNCA[,53:65])) < 12, ]
A single & is vectorized and compares element by element, while a double && works on single (length-one) values only (it is typically used in an if statement, for instance).
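A scaled-down sketch of the same filter follows, with two blocks of two columns instead of four blocks of thirteen. The data and the threshold of 2 are hypothetical; the structure mirrors the answer above.

```r
# Hypothetical toy data: columns 1-2 form one block, columns 3-4 another.
UNCA <- data.frame(
  a1 = c(1, NA, NA),
  a2 = c(2, NA, 5),
  b1 = c(NA, NA, 1),
  b2 = c(3, 4, NA)
)

# Each rowSums() call counts the missing values per row within one block.
keep <- rowSums(is.na(UNCA[, 1:2])) < 2 &
        rowSums(is.na(UNCA[, 3:4])) < 2

# Keep only the rows where every block has at least one non-missing value.
UNCA.new <- UNCA[keep, ]
print(UNCA.new)
```

Storing the combined condition in keep first makes it easy to check with table(keep) how many rows will survive before subsetting.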
