Error in If else loop in R - r

I am writing some code in R. First I create an empty column in the data set, and then I want to assign 0 or 1 to that column according to some conditions. Here is my code:
#Creating a empty column in the data file
Mydata$final <- "";
#To assign 0,1 value in final variable
if(Mydata$Default_Config == "No" & is.na(Mydata$Best_Config)=="TRUE" & (Mydata$AlmostDefaultConfig!=1 | Mydata$AlmostDefaultConfig!=3)){
Mydata$final <- 1
}else{
Mydata$final <- 0
}
And I am getting this error
Warning message:
In if (Mydata$Default_Config == "No" & is.na(Mydata$Best_Config) == :
the condition has length > 1 and only the first element will be used
How can I fix this? Please help me out. Thanks in advance.

Your problem is one of vectorisation. if is not vectorised. You are testing multiple values in each comparison in your if statement and R is telling you it will only use the first because if is not vectorised. You need ifelse which is vectorised:
Mydata$final <- ifelse( Mydata$Default_Config == "No" & is.na(Mydata$Best_Config) & (Mydata$AlmostDefaultConfig != 1 | Mydata$AlmostDefaultConfig != 3) , 1 , 0 )
A reproducible example is below. If x is > 5 and y is even then return 1 otherwise return 0:
x <- 1:10
# [1] 1 2 3 4 5 6 7 8 9 10
y <- seq(1,30,3)
# [1] 1 4 7 10 13 16 19 22 25 28
x > 5
# [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
y %% 2 == 0
# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
ifelse( x > 5 & y %% 2 == 0 , 1 , 0 )
# [1] 0 0 0 0 0 1 0 1 0 1
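To connect this back to the question, here is a minimal sketch on a made-up data frame. The values in Mydata are invented for illustration; only the column names come from the question. It also shows that `x != 1 | x != 3` is TRUE for every non-missing x, so `&` may be what was meant; that is a guess about the intent, not something the question states.

```r
# Hypothetical data; only the column names come from the question.
Mydata <- data.frame(
  Default_Config      = c("No", "No", "Yes", "No"),
  Best_Config         = c(NA, "A", NA, NA),
  AlmostDefaultConfig = c(2, 2, 2, 1)
)

# x != 1 | x != 3 holds for any non-missing x, so it filters nothing.
always_true <- Mydata$AlmostDefaultConfig != 1 | Mydata$AlmostDefaultConfig != 3

# Assumed intent: the value is neither 1 nor 3.
neither <- Mydata$AlmostDefaultConfig != 1 & Mydata$AlmostDefaultConfig != 3

# ifelse() evaluates the condition element by element.
Mydata$final <- ifelse(
  Mydata$Default_Config == "No" & is.na(Mydata$Best_Config) & neither,
  1, 0
)
Mydata$final
```

Only the first row meets all three conditions here, so only that row gets a 1.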

An alternative approach is to take advantage of R's coercion. You have a set of conditionals which are vectorizable, and R is happy to convert TRUE/FALSE to 1 / 0, so you can write it like:
Mydata$final <- (Mydata$Default_Config == "No") * is.na(Mydata$Best_Config) * ((Mydata$AlmostDefaultConfig != 1) + (Mydata$AlmostDefaultConfig != 3))
(extra parentheses added for clarity).
Apologies if I fouled up the logic there.
Edit: My code for the OR won't quite work, since if both sides are TRUE you'd get a big number ("2" :-) ). Change it to as.logical((Mydata$AlmostDefaultConfig != 1) + (Mydata$AlmostDefaultConfig != 3)). Note that each comparison needs its own parentheses, because + binds more tightly than !=.
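A small sketch of the coercion idea, using invented vectors (cond_a, cond_b and x are placeholders, not columns from the question). It shows why the sum can reach 2 and how as.logical() collapses it again, alongside the equivalent form written with logical operators.

```r
# Hypothetical inputs standing in for the three conditions.
cond_a <- c(TRUE, TRUE, FALSE, TRUE)
cond_b <- c(TRUE, FALSE, TRUE, TRUE)
x      <- c(1, 2, 3, 4)

# Adding logicals counts how many are TRUE, so the sum can reach 2.
either_sum <- (x != 1) + (x != 3)

# Multiplying acts like AND; as.logical() turns any non-zero count into TRUE.
final <- cond_a * cond_b * as.logical(either_sum)

# The same result using logical operators and a single conversion.
final2 <- as.integer(cond_a & cond_b & (x != 1 | x != 3))
```

Both final and final2 hold the same 0/1 values; the second form is usually easier to read.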

Related

Using | and & simultaneously in R (either values can be true, but neither can be false)

I have two variables in a dataset: x and y
I want to subset the dataframe in a way where both can be true, but neither can be false
I've tried:
x == 1 | y == 1 & x != 0 & y !=0
I can grasp there is a problem when phrasing the logic like that, but I can't figure why, exactly
How should I be doing it?
x & y - both must be true;
x | y - at least 1 is true (or both);
xor(x, y) - exactly one is true (not both);
!x & !y - neither is true. (Same as !(x | y).)
That should cover it pretty well.
If your data is binary (1s and 0s), 1 will be treated as TRUE and 0 will be treated as FALSE, so you don't need to bother with a bunch of ==.
If you combine multiple logical operators, I'd strongly suggest using parentheses to make sure the order of operations/grouping is what you think it is.
m = expand.grid(x = c(0:1,NA), y = c(0:1,NA) )
## Truth Table
#     x  y x & y x | y xor(x, y) !x & !y
# 1   0  0 FALSE FALSE     FALSE    TRUE
# 2   1  0 FALSE  TRUE      TRUE   FALSE
# 3  NA  0 FALSE    NA        NA      NA
# 4   0  1 FALSE  TRUE      TRUE   FALSE
# 5   1  1  TRUE  TRUE     FALSE   FALSE
# 6  NA  1    NA  TRUE        NA   FALSE
# 7   0 NA FALSE    NA        NA      NA
# 8   1 NA    NA  TRUE        NA   FALSE
# 9  NA NA    NA    NA        NA      NA
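The truth table can be rebuilt from the grid with a few lines. This is a sketch; the column names and, or, xor and neither are chosen here for illustration and do not appear in the answer.

```r
# All combinations of 0, 1 and NA for x and y.
m <- expand.grid(x = c(0:1, NA), y = c(0:1, NA))

# Each operator is applied element-wise; 0 is treated as FALSE, 1 as TRUE.
m$and     <- m$x & m$y
m$or      <- m$x | m$y
m$xor     <- xor(m$x, m$y)
m$neither <- !m$x & !m$y
m
```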
R gives & higher precedence than |, so your expression is grouped as x == 1 | (y == 1 & x != 0 & y != 0), which is probably not what you intended. The logical OR (|) already allows both to be true; it is not an XOR (the so-called "exclusive OR"). You should also realize that if both x and y are NA you will get NA, so you might want to test for that first and decide what should happen in that case.
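A short check of how R groups the expression from the question, using invented 0/1 vectors. In R, & has higher precedence than |, so the version without parentheses matches the one with the & terms grouped together.

```r
# Hypothetical binary data covering all four combinations.
x <- c(1, 1, 0, 0)
y <- c(1, 0, 1, 0)

# Without parentheses, & is applied before |.
unparenthesised <- x == 1 | y == 1 & x != 0 & y != 0
explicit        <- x == 1 | (y == 1 & x != 0 & y != 0)
same_grouping   <- identical(unparenthesised, explicit)

# Grouping the OR first gives a different answer for x = 1, y = 0.
grouped_or <- (x == 1 | y == 1) & x != 0 & y != 0
```

Here same_grouping is TRUE, while grouped_or differs from the other two in the second position, which is why explicit parentheses are worth the extra typing.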

R - use if statement to regroup variable

I want to regroup a variable into a new one.
If value is 0, new one should be 0 too.
If value is 999, then make it missing, NA.
Everything else 1
This is my try:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
df$variable2 <-
if (df$variable == 0) {
df$variable2 = 0
} else if (df$variable == 999){
df$variable2 = NA
} else {
df$variable2 = 1
}
And this the error message:
In if (df$variable == 0) { : the condition has length > 1 and only
the first element will be used
A pretty basic question but I'm a basic user. Thanks in advance!
Try ifelse
df$variable2 <- ifelse(df$variable == 999, NA, ifelse(df$variable > 0, 1, 0))
df
# id variable variable2
#1 1 0 0
#2 2 0 0
#3 3 0 0
#4 4 1 1
#5 5 2 1
#6 6 3 1
#7 7 4 1
#8 8 5 1
#9 9 999 NA
#10 10 999 NA
When you do df$variable == 0 the output / condition is
#[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
where it should be a length-one logical vector that is not NA in if(condition), see ?"if".
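A brief sketch of what if() expects, using the vector from the question. The condition must have length one; from R 4.2.0 onward a longer condition is an error rather than a warning. The variable first is introduced here only for illustration.

```r
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)

# The comparison is vectorised, so it yields one TRUE/FALSE per element.
cond <- variable == 0
length(cond)

# Collapse the vector when a single overall decision is wanted.
any_zero <- any(cond)   # is at least one value 0?
all_zero <- all(cond)   # is every value 0?

# Or test a single element, which gives a length-one condition.
if (variable[1] == 0) first <- "zero" else first <- "non-zero"
```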
You can avoid ifelse, for example, like so
df$variable2 <- df$variable
df$variable2[df$variable2 == 999] <- NA
df$variable2[df$variable2 > 0] <- 1
It might be easier to avoid the if/else statement altogether by using conditional statements within subset notation:
when df$variable is equal to zero, change it to zero
df$variable[df$variable==0] <- 0
when df$variable is equal to 999, change it to NA
df$variable[df$variable==999] <- NA
when df$variable is greater than 0 and is not equal to NA, change it to 1
df$variable[df$variable > 0 & !is.na(df$variable)] <- 1
Looks like you want to recode your variable. You can do this (and other data/variable transformations) with the sjmisc-package, in your case with the rec()-command:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
library(sjmisc)
rec(df, variable, rec = c("0=0;999=NA;else=1"))
#> id variable variable_r
#> 1 1 0 0
#> 2 2 0 0
#> 3 3 0 0
#> 4 4 1 1
#> 5 5 2 1
#> 6 6 3 1
#> 7 7 4 1
#> 8 8 5 1
#> 9 9 999 NA
#> 10 10 999 NA
# or a single vector as input
rec(df$variable, rec = c("0=0;999=NA;else=1"))
#> [1] 0 0 0 1 1 1 1 1 NA NA
There are many examples, including in the help file, and you can find an sjmisc cheatsheet in the RStudio cheatsheet collection.
df$variable2 <- sapply(df$variable,
function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})
This one-liner reflects your:
If value is 0, new one should be 0 too. If value is 999, then make it
missing, NA. Everything else 1
Admittedly, it is slightly slower than @markus's second solution or @SPJ's solution, which are the most R-ish solutions.
Why one should keep one's hands off ifelse
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4
ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
# totally perfect and as expected!
df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = TRUE) # TRUE was the default before R 4.0.0
df$d <- ifelse(df$c, df$a, df$b)
## > df
## a b c d
## 1 a 1 TRUE 1
## 2 b 2 FALSE 2
## 3 c 3 TRUE 3
## 4 d 4 FALSE 4
######### This is wrong!! ##########################
## df$d is not [1] "a" "2" "c" "4"
## the problem is that
## ifelse(df$c, df$a, df$b)
## returns for each TRUE or FALSE the entire
## df$a or df$b instead of treating it like a vector.
## Since the last df$c is FALSE, df$b is returned
## Thus we get df$b for df$d.
## Quite an unintuitive behaviour.
##
## If one uses purely vectors, ifelse is fine.
## But actually df$c, df$a, df$b should be treated each like a vector.
## However, `ifelse` does not.
## No warnings that using `ifelse` with them will lead to a
## totally different behaviour.
## In my view, this is a design mistake of `ifelse`.
## Thus I decided myself to abandon `ifelse` from my set of R commands.
## To avoid that such kind of mistakes can ever happen.
#####################################################
As @Parfait correctly pointed out, this was a misinterpretation.
The problem was that df$a was stored in the data frame as a factor.
df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = FALSE)
df$d <- ifelse(df$c, df$a, df$b)
df
Gives the correct result.
a b c d
1 a 1 TRUE a
2 b 2 FALSE 2
3 c 3 TRUE c
4 d 4 FALSE 4
Thank you @Parfait for pointing that out!
Strange that I didn't recognize that in my initial trials.
But yeah, you are absolutely right!
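The factor issue can be reproduced without a data frame. This sketch (the names f, codes and kept are made up here) shows that ifelse() drops the factor attributes and returns the underlying integer codes, while converting to character first keeps the labels.

```r
tt <- c(TRUE, FALSE, TRUE, FALSE)
f  <- factor(c("a", "b", "c", "d"))
b  <- 1:4

# ifelse() keeps only the integer codes of the factor, not its labels.
codes <- ifelse(tt, f, b)

# Converting to character first preserves the labels.
kept <- ifelse(tt, as.character(f), b)
```

With these inputs codes is 1 2 3 4, matching the surprising column d above, and kept is "a" "2" "c" "4".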

Conditional statement in sum() function of R

I've started learning R and got a piece of code in which a statement is:
if(sum(C == C[i]) == 1)# C is simply a vector and i is index of a value in this vector which the user specifies in an argument.
How can you pass a conditional statement as an argument of a function? Also explain the meaning of this statement.
Thank you.
Let's take an example to understand
Consider C as a numeric vector from 1 to 10 and let's take i as 3
C <- 1:10
i <- 3
So when we do
C == C[i]
#[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
it compares every element of C with C[i] which is 3 and returns a corresponding logical vector which is only TRUE at 3rd index.
When we sum this logical vector it returns count of all TRUE (as it considers FALSE as 0 and TRUE as 1) values which in this case is 1
sum(C == C[i])
#[1] 1
which is then compared to 1 again to make sure that there is only one C[i] in C
sum(C == C[i]) == 1
#[1] TRUE
The condition will be FALSE if the value C[i] is repeated in C. For example,
C <- c(1:10, 3) #Adding an extra 3 in the end
C
#[1] 1 2 3 4 5 6 7 8 9 10 3
i <- 3
sum(C == C[i]) == 1
#[1] FALSE
The bottom line is the condition is TRUE if C[i] occurs only once in C.
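On the first part of the question: the comparison is not passed to sum() as a statement. R evaluates `C == C[i]` first, and sum() receives the resulting logical vector. A minimal sketch, with a hypothetical helper name occurs_once that is not part of the original code:

```r
# Hypothetical helper: TRUE when the value at position i occurs exactly once in v.
occurs_once <- function(v, i) {
  matches <- v == v[i]   # logical vector, evaluated before sum() sees it
  sum(matches) == 1      # TRUE counts as 1, FALSE as 0
}

unique_case   <- occurs_once(1:10, 3)
repeated_case <- occurs_once(c(1:10, 3), 3)

# The result has length one, so it is safe to use inside if().
if (unique_case) message("value occurs once")
```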

conditional assignment of values

I have a dataset with two variables:
ACCURACY Feedback
141 0 3
156 0 1
167 1 2
185 1 1
191 1 NA
193 1 1
I have created a new column called X, where I would like to assign 3 potential values (correct, incorrect, unknown) based on combinations between the previous two values (i.e. accuracy ~ Feedback).
I have tried the following:
df$X=NA
df[!is.na((df$ACC==1)&(df$Feedback==1)),]$X <- "correct"
df[!is.na((df$ACC==1)&(df$Feedback==2)),]$X <- "unknown"
df[!is.na((df$ACC==1)&(df$Feedback==3)),]$X <- "incorrect"
df[!is.na((df$ACC==0)&(df$Feedback==1)),]$X <- "correct"
df[!is.na((df$ACC==0)&(df$Feedback==2)),]$X <- "unknown"
df[!is.na((df$ACC==0)&(df$Feedback==3)),]$X <- "incorrect"
However, this doesn't assign a value in X based on both ACC and Feedback; instead, each line of code overrides the values assigned by the previous one.
I would appreciate any guidance/suggestions.
This can be done with nested ifelse functions. Although, based on the example posted, it looks like X depends only on Feedback, never ACCURACY.
ACCURACY Feedback
1 0 3
2 0 1
3 1 2
4 1 1
5 1 NA
6 1 1
df$X <- ifelse(df$ACCURACY == 1, ifelse(df$Feedback == 1, "correct", ifelse(df$Feedback == 2, "unknown", "incorrect")), ifelse(df$Feedback == 1, "correct", ifelse(df$Feedback == 2, "unknown", "incorrect")))
ACCURACY Feedback X
1 0 3 incorrect
2 0 1 correct
3 1 2 unknown
4 1 1 correct
5 1 NA <NA>
6 1 1 correct
If the values of X indeed do not depend on ACCURACY, you could just recode Feedback as a factor
df$X <- factor(df$Feedback,
levels = c(1, 2, 3),
labels = c("correct", "unknown", "incorrect"))
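If a plain character column is preferred over a factor, a lookup vector indexed by Feedback is another option. This is a sketch that assumes Feedback only takes the values 1, 2, 3 or NA, as in the posted sample; lookup is a name introduced here.

```r
# Sample data from the question (row names omitted).
df <- data.frame(
  ACCURACY = c(0, 0, 1, 1, 1, 1),
  Feedback = c(3, 1, 2, 1, NA, 1)
)

# Position k of the lookup vector holds the label for Feedback == k.
lookup <- c("correct", "unknown", "incorrect")

# Indexing with NA yields NA, so the missing Feedback stays missing.
df$X <- lookup[df$Feedback]
df$X
```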
The issue is that you've wrapped all the assignment conditions with !is.na. These vectors all evaluate to the same thing. For example:
> !is.na((df$ACC==1)&(df$Feedback==2))
[1] TRUE TRUE TRUE TRUE FALSE TRUE
> !is.na((df$ACC==1)&(df$Feedback==3))
[1] TRUE TRUE TRUE TRUE FALSE TRUE
A possible solution would be to write a little function to do the assignments you want, and then use apply.
recoder <- function(row) {
accuracy <- row[['ACCURACY']]
feedback <- row[['Feedback']]
if(is.na(accuracy) || is.na(feedback)) {
ret_val <- NA
}
else if((accuracy==1 && feedback==1) || (accuracy==0 && feedback==1)) {
ret_val <- "correct"
}
else if((accuracy==1 && feedback==2) || (accuracy==0 && feedback==2)) {
ret_val <- "unknown"
}
else {
ret_val <- "incorrect"
}
return(ret_val)
}
df$X <- apply(df, 1, recoder)
df
> df
ACCURACY Feedback X
141 0 3 incorrect
156 0 1 correct
167 1 2 unknown
185 1 1 correct
191 1 NA <NA>
193 1 1 correct

Setting parameters for a factor

I am having trouble setting the parameters for a factor and would appreciate some help. I want to make a dummy variable where it is equal to 0 when a variable equals five different points and 1 for all others. So far I have tried the following:
htd$CBSA = factor(with(data = htd, ifelse(( cbsa==41460|16980|35620|37980|14460),0,1)))
and
htd$CBSA = as.numeric(htd$cbsa == 41460|16980|35620|37980|14460)
and tried any combination of , and "" in place of the | and don't know where to go.
Thanks for any help
Notice:
> 1 == 3
[1] FALSE
> 1 == 3 | 1 == 2
[1] FALSE
> 1 == 3 | 2
[1] TRUE
You want %in%:
> 1 %in% c(3, 2)
[1] FALSE
> 1 %in% c(3, 1)
[1] TRUE
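Applied to the question, this might look like the sketch below. The htd values are invented for illustration; the five codes are the ones listed in the question, and special is a name introduced here.

```r
# Hypothetical data; the five codes are the ones listed in the question.
htd <- data.frame(cbsa = c(41460, 10000, 16980, 99999, 14460))
special <- c(41460, 16980, 35620, 37980, 14460)

# 0 when cbsa is one of the five codes, 1 otherwise.
htd$CBSA <- as.integer(!(htd$cbsa %in% special))
htd$CBSA

# Why the original attempts give TRUE everywhere: a bare non-zero number
# on one side of | is itself treated as TRUE.
bare_number <- (10000 == 41460) | 16980
```

Wrapping the result in factor() gives a factor instead of an integer column, if that is what the model needs.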
