everyone.
Hopefully an easy syntax question. I'm trying to create a new variable in a table in R which would say "1" if my patient was in the age range I was looking at, or "0" for no. The age range I'm interested in is 2 to 155. The code runs without any errors, but it is not working: when I look in my table, the new variable says 1 even though age4 is 158. Here is what I have:
table$newvar <- if (table$age4>=2 && table$age4 <=155) {table$newvar=1} else {table$newvar=0}
Any help is appreciated! Thanks in advance!
Two changes should be made:
Use the vectorized ifelse() function to generate the new column data.
Use the vectorized & logical-AND operator when combining the results of the comparisons.
table <- data.frame(age4=seq(1,200,10));
table$newvar <- ifelse(table$age4>=2 & table$age4<=155,1,0);
table;
## age4 newvar
## 1 1 0
## 2 11 1
## 3 21 1
## 4 31 1
## 5 41 1
## 6 51 1
## 7 61 1
## 8 71 1
## 9 81 1
## 10 91 1
## 11 101 1
## 12 111 1
## 13 121 1
## 14 131 1
## 15 141 1
## 16 151 1
## 17 161 0
## 18 171 0
## 19 181 0
## 20 191 0
The reason your code is not working is because the if statement and the && operator are not vectorized. The && operator only examines the first element of each operand vector, and only returns a one-element vector representing the result of the logical-AND on those two input values. The if statement always expects a one-element vector for its conditional, and executes the if-branch if that element is true, or the else-branch if false.
If you use a multiple-element vector as the conditional in the if statement, you get a warning:
if (c(T,F)) 1 else 0;
## [1] 1
## Warning message:
## In if (c(T, F)) 1 else 0 :
## the condition has length > 1 and only the first element will be used
But for some odd reason, you don't get a warning if you use a multiple-element vector as an operand to && (or ||):
c(T,F) && c(T,F);
## [1] TRUE
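That's why your code appeared to succeed (by which I mean it didn't print any warning message), but it didn't actually do what was intended. Note that this describes older versions of R: since R 4.2.0 a condition of length greater than one in if is an error rather than a warning, and since R 4.3.0 a multiple-element operand to && or || is an error as well.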
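As a minimal sketch of the distinction described above, the following uses a made-up age4 vector (the values are assumptions chosen for illustration). It shows that & yields one result per element, and that any() or all() can collapse that vector when a single TRUE/FALSE is genuinely needed for if:

```r
# Hypothetical ages, chosen only for illustration
age4 <- c(1, 50, 158)

# `&` compares element by element and returns one result per row
in_range <- age4 >= 2 & age4 <= 155
print(in_range)

# `if` needs a single TRUE/FALSE, so collapse the vector explicitly
if (any(in_range)) {
  message("at least one age is in range")
}
if (!all(in_range)) {
  message("not every age is in range")
}
```

Collapsing with any() or all() makes the intent explicit instead of relying on how a particular R version treats a multi-element condition.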
When used in arithmetic TRUE and FALSE become 1 and 0 so:
transform(table, newvar = (age4 >= 2) * (age4 <= 155) )
These also work:
transform(table, newvar = as.numeric( (age4 >= 2) & (age4 <= 155) ) )
transform(table, newvar = ( (age4 >= 2) & (age4 <= 155) ) + 0 )
transform(table, newvar = ifelse( (age4 >= 2) & (age4 <= 155), 1, 0) )
transform(table, newvar = (age4 %in% 2:155) + 0) # assuming no fractional ages
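The "assuming no fractional ages" caveat on the %in% variant can be illustrated with a small sketch. The ages vector below is an assumption made up for this example; it includes fractional values to show where the two approaches diverge:

```r
# Hypothetical ages, including fractional values
ages <- c(1, 2, 2.5, 155, 155.5, 158)

# Range comparison: treats 2.5 as inside the range
by_compare <- (ages >= 2 & ages <= 155) + 0

# Set membership: only whole numbers 2, 3, ..., 155 match
by_in <- (ages %in% 2:155) + 0

print(rbind(ages, by_compare, by_in))
```

The two results differ only for 2.5, which lies inside the range but is not a member of 2:155.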
To fill an empty column of a dataframe based on a condition that takes another column into account, I have found the following solution, which works fine but is a little bit ugly. Does anybody know a more elegant way to solve this?
base::set.seed(123)
test_df <- base::data.frame(vec1 = base::sample(base::seq(1, 100, 1), 50), vec2 = base::seq(1, 50, 1), vec3 = NA)
for (a in 1:base::nrow(test_df)){
  spc_test_df <- test_df[a, ]
  # select the specific row of the dataframe
  if(spc_test_df$vec1 <= 25 | spc_test_df$vec1 >= 75){
    # evaluate whether the deviation is below/above the threshold
    spc_test_df$vec3 <- 1
    # if so, write TRUE
  } else {
    spc_test_df$vec3 <- 0
    # if not so, write FALSE
  }
  test_df[a, ] <- spc_test_df
  # write the specific row back to the dataframe
}
There is no need for a for-loop as you can use vectorized solutions in this case. Three options on how to solve this problem:
# option 1
test_df$vec3 <- +(test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 2
test_df$vec3 <- as.integer(test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 3
test_df$vec3 <- ifelse(test_df$vec1 <= 25 | test_df$vec1 >= 75, 1, 0)
which in all cases gives:
vec1 vec2 vec3
1 5 1 1
2 6 2 1
3 61 3 0
4 20 4 1
....
47 3 47 1
48 55 48 0
49 44 49 0
50 97 50 1
(only the first and last four rows are shown)
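The three options print the same values, but they do not return the same storage type. Here is a small sketch of the difference, with the condition held in a helper variable cond (a name introduced only for this illustration):

```r
set.seed(123)
vec1 <- sample(1:100, 50)

# The shared condition, stored once for readability
cond <- vec1 <= 25 | vec1 >= 75

opt1 <- +cond               # unary plus coerces logical to integer
opt2 <- as.integer(cond)    # explicit integer coercion
opt3 <- ifelse(cond, 1, 0)  # returns double, because 1 and 0 are doubles

print(typeof(opt1))
print(typeof(opt2))
print(typeof(opt3))
```

This rarely matters in practice, but it can surprise when the column is later compared with identical() or written to a typed database column.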
I want to regroup a variable into a new one.
If value is 0, new one should be 0 too.
If value is 999, then make it missing, NA.
Everything else 1
This is my try:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
df$variable2 <-
if (df$variable == 0) {
df$variable2 = 0
} else if (df$variable == 999){
df$variable2 = NA
} else {
df$variable2 = 1
}
And this is the warning message:
In if (df$variable == 0) { : the condition has length > 1 and only
the first element will be used
A pretty basic question but I'm a basic user. Thanks in advance!
Try ifelse
df$variable2 <- ifelse(df$variable == 999, NA, ifelse(df$variable > 0, 1, 0))
df
# id variable variable2
#1 1 0 0
#2 2 0 0
#3 3 0 0
#4 4 1 1
#5 5 2 1
#6 6 3 1
#7 7 4 1
#8 8 5 1
#9 9 999 NA
#10 10 999 NA
When you do df$variable == 0 the output / condition is
#[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
whereas the condition in if (condition) must be a length-one logical vector that is not NA; see ?"if".
You can avoid ifelse, for example, like so
df$variable2 <- df$variable
df$variable2[df$variable2 == 999] <- NA
df$variable2[df$variable2 > 0] <- 1
It might be easier to avoid the if/else statement altogether by using conditions inside subset notation:
When df$variable is equal to zero, leave it at zero (this assignment changes nothing and is shown only for completeness):
df$variable[df$variable==0] <- 0
When df$variable is equal to 999, change it to NA:
df$variable[df$variable==999] <- NA
When df$variable is greater than 0 and is not NA, change it to 1:
df$variable[df$variable>0 & !is.na(df$variable)] <- 1
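A variant of the subset approach, sketched here under the assumption that the original column should stay untouched, writes into a new column variable2 instead. It uses which() so that NA values are simply skipped by the index:

```r
df <- data.frame(id = 1:10,
                 variable = c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999))

# Start from a copy so the original column is preserved
df$variable2 <- df$variable

# Recode 999 to NA first, then everything positive to 1
df$variable2[df$variable == 999] <- NA
df$variable2[which(df$variable2 > 0)] <- 1  # which() drops the NA positions

print(df)
```

The order matters: if the "> 0" step ran first, the 999 values would already have become 1 and could no longer be told apart.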
Looks like you want to recode your variable. You can do this (and other data/variable transformations) with the sjmisc-package, in your case with the rec()-command:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
library(sjmisc)
rec(df, variable, rec = c("0=0;999=NA;else=1"))
#> id variable variable_r
#> 1 1 0 0
#> 2 2 0 0
#> 3 3 0 0
#> 4 4 1 1
#> 5 5 2 1
#> 6 6 3 1
#> 7 7 4 1
#> 8 8 5 1
#> 9 9 999 NA
#> 10 10 999 NA
# or a single vector as input
rec(df$variable, rec = c("0=0;999=NA;else=1"))
#> [1] 0 0 0 1 1 1 1 1 NA NA
There are many examples, also in the help-file, and you can find a sjmisc-cheatsheet at the RStudio-Cheatsheet collection (or direct PDF-download here).
df$variable2 <- sapply(df$variable,
function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})
This one-liner reflects your:
If value is 0, new one should be 0 too. If value is 999, then make it
missing, NA. Everything else 1
Admittedly, it is slightly slower than @markus's second solution or @SPJ's, which are the most R-ish solutions.
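As a hedged variation on the element-wise idea, vapply() can replace sapply() so that the return type is declared up front. The helper name recode_one is an assumption introduced for this sketch:

```r
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)

# Hypothetical helper: recodes a single value
recode_one <- function(el) {
  if (el == 0) 0 else if (el == 999) NA_real_ else 1
}

# numeric(1) declares that every call must return one number
variable2 <- vapply(variable, recode_one, numeric(1))
print(variable2)
```

Inside the helper, if is safe because each call receives exactly one element.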
Why one should keep one's hands off ifelse
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4
ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
# totally perfect and as expected!
df <- data.frame(a=a, b=b, c=tt)
df$d <- ifelse(df$c, df$a, df$b)
## > df
## a b c d
## 1 a 1 TRUE 1
## 2 b 2 FALSE 2
## 3 c 3 TRUE 3
## 4 d 4 FALSE 4
######### This is wrong!! ##########################
## df$d is not [1] "a" "2" "c" "4"
## the problem is that
## ifelse(df$c, df$a, df$b)
## returns for each TRUE or FALSE the entire
## df$a or df$b instead of treating it like a vector.
## Since the last df$c is FALSE, df$b is returned
## Thus we get df$b for df$d.
## Quite an unintuitive behaviour.
##
## If one uses purely vectors, ifelse is fine.
## But actually df$c, df$a, df$b should be treated each like a vector.
## However, `ifelse` does not.
## No warnings that using `ifelse` with them will lead to a
## totally different behaviour.
## In my view, this is a design mistake of `ifelse`.
## Thus I decided myself to abandon `ifelse` from my set of R commands.
## To avoid that such kind of mistakes can ever happen.
#####################################################
As @Parfait correctly pointed out, this was a misinterpretation.
The problem was that df$a was stored in the data frame as a factor (the default for strings before R 4.0.0).
df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = FALSE)
df$d <- ifelse(df$c, df$a, df$b)
df
Gives the correct result.
a b c d
1 a 1 TRUE a
2 b 2 FALSE 2
3 c 3 TRUE c
4 d 4 FALSE 4
Thank you, @Parfait, for pointing that out!
Strange that I didn't recognize that in my initial trials.
But yeah, you are absolutely right!
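The factor pitfall can be reproduced on any R version by forcing the conversion explicitly. In this sketch, stringsAsFactors = TRUE is set on purpose to mimic the old default, and as.character() is one possible way to avoid the problem when a column is already a factor:

```r
tt <- c(TRUE, FALSE, TRUE, FALSE)

# Force the factor conversion to mimic the pre-4.0.0 default
df <- data.frame(a = c("a", "b", "c", "d"), b = 1:4, c = tt,
                 stringsAsFactors = TRUE)

# ifelse() sees the factor's integer codes, not its labels
d_codes <- ifelse(df$c, df$a, df$b)

# Converting the factor to character first gives the labels
d_chars <- ifelse(df$c, as.character(df$a), df$b)

print(d_codes)
print(d_chars)
```

The first result reproduces the surprising 1 2 3 4 column; the second matches the plain-vector result.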
I have a dataframe called barometre2013 with a column called q0qc that contains these numbers:
[1] 15 1 9 15 9 3 6 3 3 6 6 10 15 6 15 10
I want to add +1 to the numbers that are >= 10, so the result should be this:
[1] 16 1 9 16 9 3 6 3 3 6 6 11 16 6 16 11
I have tried this code:
if (barometre2013$q0qc > 9) {
barometre2013$q0qc <- barometre2013$q0qc + 1
}
But this adds 1 to all the numbers, ignoring the condition:
[1] 16 2 10 16 10 4 7 4 4 7 7 11 16 7 16 11
How can I do what I want?
Thanks a lot.
When you executed:
if (barometre2013$q0qc > 9) {
barometre2013$q0qc <- barometre2013$q0qc + 1
}
... you should have seen a warning that only the first element would be used. The first value in barometre2013$q0qc is 15, and since 15 > 9 is TRUE, the assignment was applied to the entire vector. ifelse and Boolean arithmetic are the approaches suggested in the comments for conditional evaluation and/or assignment. The first:
barometre2013$q0qc <- barometre2013$q0qc + (barometre2013$q0qc >= 10)
... adds a vector of 1s and 0s to the starting vector: 1 where the logical expression is satisfied and 0 where it is not. If you wanted to add something other than one (which is the numeric value of TRUE), you could multiply that second term by the desired increment or decrement.
Another approach is to use ifelse, which tests its first argument and returns either the second or third argument on an item-by-item basis:
barometre2013$q0qc <- barometre2013$q0qc + ifelse(barometre2013$q0qc >= 10, 1, 0)
The third approach, suggested by dash2, is to modify only those values that meet the condition. Note that this method requires having the test vector on both sides of the assignment (which is why dash2 was correcting the earlier comment):
barometre2013$q0qc[barometre2013$q0qc>=10] <-
barometre2013$q0qc[barometre2013$q0qc>=10]+ 1
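As a quick check, the three approaches can be run side by side on the sample numbers from the question. This sketch works on a plain vector q0qc rather than the data frame column, and the variable names are assumptions introduced for illustration:

```r
q0qc <- c(15, 1, 9, 15, 9, 3, 6, 3, 3, 6, 6, 10, 15, 6, 15, 10)

# 1. Boolean arithmetic: TRUE counts as 1, FALSE as 0
by_logical <- q0qc + (q0qc >= 10)

# 2. ifelse: choose the increment element by element
by_ifelse <- q0qc + ifelse(q0qc >= 10, 1, 0)

# 3. Subset assignment: touch only the elements that qualify
by_subset <- q0qc
idx <- by_subset >= 10
by_subset[idx] <- by_subset[idx] + 1

# A different increment: multiply the logical term
plus_five <- q0qc + 5 * (q0qc >= 10)

print(by_logical)
```

All three produce the vector the question asks for; the last line shows the "multiply by the desired increment" idea with an increment of 5.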
data <- c(15,1,9,15,9,3,6,3,3,6,6,10,15,6,15,10)
# a for loop returns NULL, so collect the per-element results with vapply()
data2 <- vapply(data, function(i) if (i >= 10) i + 1 else i, numeric(1))
data2
class(data)
class(data2)
I have a dataset with two variables:
ACCURACY Feedback
141 0 3
156 0 1
167 1 2
185 1 1
191 1 NA
193 1 1
I have created a new column called X, where I would like to assign 3 potential values (correct, incorrect, unknown) based on combinations between the previous two values (i.e. accuracy ~ Feedback).
I have tried the following:
df$X=NA
df[!is.na((df$ACC==1)&(df$Feedback==1)),]$X <- "correct"
df[!is.na((df$ACC==1)&(df$Feedback==2)),]$X <- "unknown"
df[!is.na((df$ACC==1)&(df$Feedback==3)),]$X <- "incorrect"
df[!is.na((df$ACC==0)&(df$Feedback==1)),]$X <- "correct"
df[!is.na((df$ACC==0)&(df$Feedback==2)),]$X <- "unknown"
df[!is.na((df$ACC==0)&(df$Feedback==3)),]$X <- "incorrect"
But it doesn't assign a value in X based on both ACC and Feedback; instead, each line of code overrides the values assigned by the previous one.
I would appreciate any guidance/suggestions.
This can be done with nested ifelse functions. Although, based on the example posted, it looks like X depends only on Feedback, never ACCURACY.
ACCURACY Feedback
1 0 3
2 0 1
3 1 2
4 1 1
5 1 NA
6 1 1
df$X <- ifelse(df$ACCURACY == 1, ifelse(df$Feedback == 1, "correct", ifelse(df$Feedback == 2, "unknown", "incorrect")), ifelse(df$Feedback == 1, "correct", ifelse(df$Feedback == 2, "unknown", "incorrect")))
ACCURACY Feedback X
1 0 3 incorrect
2 0 1 correct
3 1 2 unknown
4 1 1 correct
5 1 NA <NA>
6 1 1 correct
If the values of X indeed do not depend on ACCURACY, you could just recode Feedback as a factor
df$X <- factor(df$Feedback,
levels = c(1, 2, 3),
labels = c("correct", "unknown", "incorrect"))
The issue is that you've wrapped all the assignment conditions with !is.na. These vectors all evaluate to the same thing. For example:
> !is.na((df$ACC==1)&(df$Feedback==2))
[1] TRUE TRUE TRUE TRUE FALSE TRUE
> !is.na((df$ACC==1)&(df$Feedback==3))
[1] TRUE TRUE TRUE TRUE FALSE TRUE
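To make the difference concrete, here is a small sketch on a reconstruction of the sample data (default row names are an assumption; the question's rows are numbered 141 to 193). It compares the !is.na() wrapper with which(), which keeps only the positions where the condition is actually TRUE:

```r
df <- data.frame(ACCURACY = c(0, 0, 1, 1, 1, 1),
                 Feedback = c(3, 1, 2, 1, NA, 1))

cond <- df$ACCURACY == 1 & df$Feedback == 2

# !is.na() asks "is the condition known?", not "is it TRUE?"
print(!is.na(cond))

# which() returns only the rows where the condition is TRUE
print(which(cond))
```

Indexing with which(cond) would therefore assign to the third row only, rather than to every row with a non-missing condition.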
A possible solution would be to write a little function to do the assignments you want, and then use apply.
recoder <- function(row) {
  accuracy <- row[['ACCURACY']]
  feedback <- row[['Feedback']]
  # both values are scalars here, so the scalar operators && and || are appropriate
  if(is.na(accuracy) || is.na(feedback)) {
    ret_val <- NA
  }
  else if((accuracy==1 && feedback==1) || (accuracy==0 && feedback==1)) {
    ret_val <- "correct"
  }
  else if((accuracy==1 && feedback==2) || (accuracy==0 && feedback==2)) {
    ret_val <- "unknown"
  }
  else {
    ret_val <- "incorrect"
  }
  return(ret_val)
}
df$X <- apply(df, 1, recoder)
df
> df
ACCURACY Feedback X
141 0 3 incorrect
156 0 1 correct
167 1 2 unknown
185 1 1 correct
191 1 NA <NA>
193 1 1 correct
I am writing some code in R. First I create a blank column in the data set, and then I want to assign 0 or 1 in that column according to some conditions. Here is my code:
#Creating a empty column in the data file
Mydata$final <- "";
#To assign 0,1 value in final variable
if(Mydata$Default_Config == "No" & is.na(Mydata$Best_Config)=="TRUE" & (Mydata$AlmostDefaultConfig!=1 | Mydata$AlmostDefaultConfig!=3)){
Mydata$final <- 1
}else{
Mydata$final <- 0
}
And I am getting this warning:
Warning message:
In if (Mydata$Default_Config == "No" & is.na(Mydata$Best_Config) == :
the condition has length > 1 and only the first element will be used
How can I fix this? Please help me out. Thanks in advance.
Your problem is one of vectorisation. if is not vectorised. You are testing multiple values in each comparison in your if statement and R is telling you it will only use the first because if is not vectorised. You need ifelse which is vectorised:
ifelse( Mydata$Default_Config == "No" & is.na(Mydata$Best_Config)=="TRUE" & (Mydata$AlmostDefaultConfig!=1 | Mydata$AlmostDefaultConfig!=3) , 1 , 0 )
A reproducible example is below. If x is > 5 and y is even then return 1 otherwise return 0:
x <- 1:10
# [1] 1 2 3 4 5 6 7 8 9 10
y <- seq(1,30,3)
# [1] 1 4 7 10 13 16 19 22 25 28
x > 5
# [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
y %% 2 == 0
# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
ifelse( x > 5 & y %% 2 == 0 , 1 , 0 )
# [1] 0 0 0 0 0 1 0 1 0 1
An alternative approach is to take advantage of R's coercion. You have a set of conditionals which are vectorizable, and R is happy to convert TRUE/FALSE to 1 / 0, so you can write it like:
Mydata$final <- (Mydata$Default_Config == "No") * is.na(Mydata$Best_Config) * ((Mydata$AlmostDefaultConfig != 1) + (Mydata$AlmostDefaultConfig != 3))
(Each comparison is parenthesized because R does not allow chaining comparison operators such as != without parentheses.)
Apologies if I fouled up the logic there.
Edit: My code for the OR won't quite work, since if both sides are TRUE you'd get a big number ("2" :-) ). Change the last factor to as.logical((Mydata$AlmostDefaultConfig != 1) + (Mydata$AlmostDefaultConfig != 3))
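The arithmetic view of AND and OR can be checked on a few made-up logical vectors (a, b, c1 and c2 are assumptions standing in for the individual comparisons). Multiplication acts as AND, while addition needs as.logical() to act as OR:

```r
# Hypothetical stand-ins for the individual comparisons
a  <- c(TRUE, TRUE, FALSE, TRUE)
b  <- c(TRUE, FALSE, TRUE, TRUE)
c1 <- c(TRUE, TRUE, TRUE, FALSE)
c2 <- c(TRUE, FALSE, TRUE, FALSE)

# Without as.logical(), TRUE + TRUE leaks through as 2
naive <- a * b * (c1 + c2)

# as.logical() collapses any nonzero sum back to TRUE
by_product <- a * b * as.logical(c1 + c2)

# The same logic written with vectorized operators
by_logic <- as.integer(a & b & (c1 | c2))

print(naive)
print(by_product)
print(by_logic)
```

The first element of naive is 2, which is the problem the edit describes; the other two results agree.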