Using ifelse in R

I am trying to code the following statements in R with if and ifelse. The sample data is trial, and x, y, and z are columns of trial.
Statements to be coded:
if (x>0) {
  if (y>0) {
    l=2
  }else{
    l=5
  }
  if (z>0) {
    m=l+2
  }else{
    m=5
  }
}
The R code using ifelse
trial$l<-with(trial, ifelse((x>0 &y>0),2,ifelse((x>0 &y<=0),5,???)))
trial$m<-with (trial,ifelse((x>0 &z>0),l+2,ifelse((x>0 &z<=0),5,???)))
where ??? marks the cases for which the statements above define no value. In other words, when x is not greater than 0, neither l nor m gets a value.
Next, I tried a combination of if and ifelse to see whether that works:
if(trial$z>0){
trial$l<-with(trial, ifelse(y>0,2,5))
trial$m<-with(trial, ifelse(z>0,l+2,5))
}
This code runs, but there is a warning message (since z is a column vector):
In if (trial$z>0){
the condition has length>1 and only the first element will be used
I want to use only ifelse, since I am dealing only with vectors, but I have had no luck so far. Any ideas?

If you want to use ifelse and nest things, you could do something like this:
test <- data.frame(x = 2, y = 5, z = 3)
with(test, ifelse(z > 0 & x > 0 | y > 3, "yes", "no"))
In this case you're using logical operators to guard the output. You'll still get "no" rather than a missing value when z <= 0 and y <= 3, but you can deal with that pretty easily:
with(test, ifelse(z > 0, ifelse(x > 0 | y > 3, "yes", "no"), NA))
Nested ifelse statements can get hard to follow in any language, so consider matching or switch statements if you end up with more than 3 of them.
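One detail worth checking in the first call: in R, & binds more tightly than |, so z > 0 & x > 0 | y > 3 is read as (z > 0 & x > 0) | y > 3. A minimal sketch, using a made-up row with z <= 0 (the names guard_demo, flat, grouped and nested are only for illustration), shows how the three variants differ:

```r
# Hypothetical one-row data frame with z <= 0 and y > 3
guard_demo <- data.frame(x = 2, y = 5, z = -1)

# No parentheses: parsed as (z > 0 & x > 0) | y > 3, so y > 3 alone gives "yes"
flat <- with(guard_demo, ifelse(z > 0 & x > 0 | y > 3, "yes", "no"))

# Explicit grouping: z > 0 must hold, so this gives "no"
grouped <- with(guard_demo, ifelse(z > 0 & (x > 0 | y > 3), "yes", "no"))

# Nested form: rows with z <= 0 get NA instead of "no"
nested <- with(guard_demo, ifelse(z > 0, ifelse(x > 0 | y > 3, "yes", "no"), NA))
```

If the guard on z is meant to apply to the whole condition, the grouped or nested form is probably the one you want.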

I would use transform twice, for example:
trial <- data.frame(x=c(-1,1,2),y=c(1,-2,3),z=c(1,-5,5))
trial <- transform(trial,l = ifelse(x>0,ifelse(y > 0,2,5),NA))
transform(trial,m = ifelse(x>0,ifelse(z>0,l+2,5),NA))
#    x  y  z  l  m
# 1 -1  1  1 NA NA
# 2  1 -2 -5  5  5
# 3  2  3  5  2  4
Note that I assign NA for the case where x is not greater than 0. You can also use a single transform, for example:
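trial <- data.frame(x=c(-1,1,2),y=c(1,-2,3),z=c(1,-5,5))
# l is assigned with <- (not =) so that it is visible when m is computed;
# the side effect is that its column gets an auto-generated name
transform(trial,l <- ifelse(x>0,ifelse(y > 0,2,5),NA),
          m = ifelse(x>0,ifelse(z>0,l+2,5),NA))
#    x  y  z c.NA..5..2.  m
# 1 -1  1  1          NA NA
# 2  1 -2 -5           5  5
# 3  2  3  5           2  4
But personally I would prefer the first one for readability; besides, with the second one you may need to rename the columns afterwards.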
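If you want both columns from a single call while keeping their names, one option is within(), where statements run in order and later ones can see earlier results. This is a sketch of an alternative, not what the answer above uses; trial2 is a name chosen only for illustration:

```r
trial <- data.frame(x = c(-1, 1, 2), y = c(1, -2, 3), z = c(1, -5, 5))

# Statements inside within() are evaluated in order, so m can refer to l
trial2 <- within(trial, {
  l <- ifelse(x > 0, ifelse(y > 0, 2, 5), NA)
  m <- ifelse(x > 0, ifelse(z > 0, l + 2, 5), NA)
})

# Select the columns explicitly, since the order of newly added
# columns from within() should not be relied on
trial2[, c("x", "y", "z", "l", "m")]
```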

Related

Multiple conditional statements within lapply function to avoid looping

I would like to write a function with multiple conditions within lapply. I know how to define multiple conditions using a for loop. However I would like to avoid looping this time.
For instance, let's assume there is a vector (vctr) with numbers from -3 to 5:
set.seed(10)
vctr <- c(sample(-3:5), rep(0,3), sample(-3:5), 0)
and let's define two conditions:
condition_1 <- if the number is equal to 0 -> add 1 to the initial value
condition_2 <- if the number is not equal to 0 -> leave it be
And this works perfectly fine:
test_list <- lapply(1:length(vctr), function(x) if(vctr[x]==0) vctr[x] +1 else vctr[x])
However, what about a situation with multiple conditions? For instance:
condition_1 <- if the number is equal to 0 -> add 1
condition_2 <- if the number is negative -> replace it with its absolute value
condition_3 <- if the number is greater than 0 but lower than 3 -> add 2 to the initial value
condition_4 <- if the number is equal to or greater than 3 -> leave it be
I tried the following with conditions 1, 2 and "leave it be"; however, this syntax does not work:
test_list2 <- lapply(1:length(vctr), function(x) if(vctr[x]==0) vctr[x] +1 if else(vctr[x]<0) abs(vctr[x]) else vctr[x])
EDIT: I would like to ask for non-dplyr solutions.

You can replace sapply with lapply if you want a list output:
sapply(vctr, function(x) {
  if (x == 0) y <- x + 1
  if (x < 0) y <- abs(x)
  if (x > 0 && x < 3) y <- x + 2
  if (x >= 3) y <- x
  return(y)
})
[1] 5 2 1 4 1 3 3 3 4 1 1 1 3 2 4 3 1 5 1 3 4 1
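For completeness, here is a sketch of two other base R ways to write the four conditions. The first chains the branches with else if, which is the syntax the attempt in the question was reaching for; the second is fully vectorised with nested ifelse and needs no lapply at all. The names recode_one and recoded are made up for this example:

```r
set.seed(10)
vctr <- c(sample(-3:5), rep(0, 3), sample(-3:5), 0)

# Scalar version: the branches are chained with `else if`
recode_one <- function(x) {
  if (x == 0) {
    x + 1        # condition_1
  } else if (x < 0) {
    abs(x)       # condition_2
  } else if (x < 3) {
    x + 2        # condition_3 (x is already known to be > 0 here)
  } else {
    x            # condition_4
  }
}
test_list2 <- lapply(vctr, recode_one)

# Vectorised version: one nested ifelse over the whole vector
recoded <- ifelse(vctr == 0, vctr + 1,
           ifelse(vctr < 0, abs(vctr),
           ifelse(vctr < 3, vctr + 2, vctr)))
```

Both versions should give the same values; the lapply one returns a list, the ifelse one a plain vector.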

R: How to loop over variable names using number indexes (like in Stata)

I've been trying unsuccessfully to replicate in R the following Stata loop:
forvalues i=1/10 {
replace var`i'= a if other_var`i'==b
}
So far I've got this as the closest attempt:
for(i in 1:10) {
df <- df %>%
mutate(get(paste("var",i,sep="")) =
ifelse(get(paste("other_var",i,sep=""))==b
,a
,get(paste("var",i,sep=""))))
}
But I get the following error:
Error: unexpected '=' in:
"survey_data <- survey_data %>%
mutate(paste("offer",i,"_accepted",sep="") ="
If I change the variable to be mutated to a simple variable name, it works, so I'm guessing my code is OK for the "right-hand side of the mutation", but for some reason it's not OK for the "left-hand side".

This solution is very inelegant, but I think it does exactly what you want.
var1 <- "x"
var2 <- "y"
var3 <- "z"
other_var1 <- 1
other_var2 <- 0
other_var3 <- 1
df <- data.frame(var1, other_var1, var2, other_var2, var3, other_var3)
for(i in 1:3){
var_name <- paste("df$var", i, sep = "")
other_var_name <- paste("df$other_var", i, sep = "")
if (eval(parse(text = other_var_name)) == 1){
assign(var_name, "a")
}
}
There are three key ingredients here. First the paste() function to create the names of the variables in the current iteration of the loop. Second, the eval(parse(foo)) combo to reference the actual variable whose name is stored as string in foo. Third, using assign() to assign values to a variable (as opposed to using <-).
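One caveat with the loop above: assign() treats its first argument as a plain object name, so assign("df$var1", "a") creates a new object literally called df$var1 and leaves the column in df untouched. A minimal sketch of the same loop that writes into the data frame through [[ ]] indexing instead (the helper name hit is only for illustration):

```r
df <- data.frame(var1 = "x", other_var1 = 1,
                 var2 = "y", other_var2 = 0,
                 var3 = "z", other_var3 = 1,
                 stringsAsFactors = FALSE)

for (i in 1:3) {
  var_name <- paste0("var", i)              # e.g. "var1"
  other_var_name <- paste0("other_var", i)  # e.g. "other_var1"
  hit <- df[[other_var_name]] == 1          # rows where the condition holds
  df[[var_name]][hit] <- "a"                # replace only those rows
}
df
```

Because the column is looked up by its name as a string, no eval(parse()) is needed, and the same loop also works for data frames with more than one row.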

This looks like FAQ 7.21.
The most important part of that answer is at the end, where it says to use a list instead.
Trying to work on a group of global variables in R leads to complicated code that is hard to read and even harder to debug.
If you instead put those variables into a single list, then you can access them by name or position and use tools like lapply or the purrr package (part of tidyverse) to process everything in the list (or some of the things in the list using map_at or map_if from purrr).
If you tell us more about what you are trying to accomplish, we may be able to give a much simpler example of how to do it.
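As a rough illustration of the list idea, here is a sketch with made-up data: the var and other_var vectors live in two lists, and Map() walks over both lists in parallel. The values of a and b and the list contents are assumptions for the example only:

```r
# Hypothetical data: two "var" vectors and their matching "other_var" vectors
vars <- list(var1 = c(1, 2, 3), var2 = c(4, 5, 6))
other_vars <- list(other_var1 = c(1, 0, 1), other_var2 = c(0, 1, 0))
a <- 777
b <- 1

# For each pair, replace the elements of v where o equals b by a
vars <- Map(function(v, o) ifelse(o == b, a, v), vars, other_vars)
vars$var1
vars$var2
```

The elements can then be accessed by name (vars$var1) or by position (vars[[1]]), with no need to build variable names as strings.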

You can do something like the following:
df <- structure(list(var1 = c(1, 2, 3, 4),
var2 = c(1, 2, 3, 4),
var3 = c(1,2, 3, 4),
var4 = c(1, 2, 3, 4),
other_var1 = c(1, 0, 1, 0),
other_var2 = c(0,1, 0, 1),
other_var3 = c(1, 1, 0, 0),
other_var4 = c(0, 0, 1,1)),
class = "data.frame",
row.names = c(NA, -4L))
# var1 var2 var3 var4 other_var1 other_var2 other_var3 other_var4
# 1 1 1 1 1 1 0 1 0
# 2 2 2 2 2 0 1 1 0
# 3 3 3 3 3 1 0 0 1
# 4 4 4 4 4 0 1 0 1
## Values to replace based on OP original question
a <- 777
b <- 1
## Iterate over all four variables available in df
for (i in 1:4) {
  df <- within(df, {
    assign(paste0("var",i), ifelse(get(paste0("other_var",i)) %in% c(b), ## Condition
                                   a,                      ## Value if the condition is TRUE
                                   get(paste0("var",i))))  ## Value if the condition is FALSE
  })
}
which results in the following output:
# var1 var2 var3 var4 other_var1 other_var2 other_var3 other_var4
# 1 777 1 777 1 1 0 1 0
# 2 2 777 777 2 0 1 1 0
# 3 777 3 3 777 1 0 0 1
# 4 4 777 4 777 0 1 0 1
The solution doesn't look like a one-line solution, but it actually is one, albeit a quite dense one; so let's see how it works by going through its components.
within(): I don't want to repeat what other people have explained excellently, so for the usage of within() I refer you to its help page (?within).
The assign(paste0("var",i), X) part:
Here I am following what @han-tyumi did in his answer, that is, building the names of the variables with paste0() and assigning them the value of X (explained below) with the assign() function.
Let's talk about X.
Above I referred to assign(paste0("var",i), X), where X is ifelse(get(paste0("other_var",i)) %in% c(b), a, get(paste0("var",i))).
Inside the ifelse():
The condition:
First, I retrieve the values of the variable other_var(i) (with i = 1,2,3,4) by combining get() with paste0() inside the loop. Then I use the %in% operator to check, element by element, whether each value of other_var(i) equals the value assigned to b (in my example, the number 1); this yields TRUE or FALSE for each element.
The TRUE part of the ifelse() function:
This is the simplest part: if the condition is met, assign a (which in my example is equal to 777).
The FALSE part of the ifelse() function:
get(paste0("var",i)), which is the value of the variable itself (meaning that if the condition is not met, the variable is kept unaltered).
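To make the pieces easier to inspect, here is a sketch of a single iteration (i = 1) written out step by step on a cut-down data frame. It uses [[ ]] indexing rather than get() and assign() inside within(), purely so that each intermediate value can be looked at; step_df, cond and new_values are names made up for this example:

```r
step_df <- data.frame(var1 = c(1, 2, 3, 4), other_var1 = c(1, 0, 1, 0))
a <- 777
b <- 1
i <- 1

# The condition: one TRUE/FALSE per row
cond <- step_df[[paste0("other_var", i)]] %in% c(b)

# TRUE part gives a, FALSE part keeps the current value of var1
new_values <- ifelse(cond, a, step_df[[paste0("var", i)]])

# Write the result back into the column
step_df[[paste0("var", i)]] <- new_values
step_df
```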

Subsetting data conditional on first instance in R

data:
row A B
1 1 1
2 1 1
3 1 2
4 1 3
5 1 1
6 1 2
7 1 3
Hi all! What I'm trying to do (example above) is to sum those values in column A, but only when column B = 1 (so starting with a simple subset line - below).
sum(data$A[data$B==1])
However, I only want to do this the first time that condition occurs until the values switch. If that condition re-occurs later in the column (row 5 in the example), I'm not interested in it!
I'd really appreciate your help in this (I suspect simple) problem!

Using data.table for syntax elegance, you can use rle to get this done
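library(data.table)
DT <- data.table(data)
DT[ , B1 := {
  bb <- rle(B == 1)          # runs of TRUE/FALSE for B == 1
  r <- bb$values
  r[r] <- seq_len(sum(r))    # number the TRUE runs 1, 2, ...; FALSE runs become 0
  bb$values <- r
  inverse.rle(bb)            # expand back to one value per row
} ]
DT[B1 == 1, sum(A)]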
# [1] 2
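The same rle idea can also be sketched in base R without data.table. This assumes the data from the question and that B contains at least one 1; the variable names (runs, first_true, run_start, run_end) are made up for this example:

```r
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

runs <- rle(data$B == 1)                  # runs of TRUE/FALSE for B == 1
first_true <- which(runs$values)[1]       # position of the first TRUE run
run_end <- cumsum(runs$lengths)[first_true]           # last row of that run
run_start <- run_end - runs$lengths[first_true] + 1   # first row of that run

first_run_sum <- sum(data$A[run_start:run_end])
first_run_sum
```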

Here's a rather elaborate way of doing that:
data$counter = cumsum(data$B == 1)
sum(data$A[(data$counter >= 1:nrow(data) - sum(data$counter == 0)) &
(data$counter != 0)])

Another way:
idx <- which(data$B == 1)
sum(data$A[idx[idx == (seq_along(idx) + idx[1] - 1)]])
# [1] 2
# or alternatively
sum(data$A[idx[idx == seq(idx[1], length.out = length(idx))]])
# [1] 2
The idea: first get all indices where B is 1. Here that is c(1,2,5). From the start index 1, you want to keep all the indices that are consecutive (that is, c(1,2,3,...)). So, starting at the first index, generate as many consecutive numbers as there are indices and compare the two vectors element by element. They stop being equal at the first gap, and once there is a mismatch, all the following numbers mismatch as well. So the leading indices for which the comparison is TRUE are exactly the ones that are consecutive, which is what you want.
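Spelled out on the data from the question, the intermediate vectors look like this (a small sketch; consecutive and keep are names chosen only for illustration):

```r
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

idx <- which(data$B == 1)                             # rows where B is 1
consecutive <- seq(idx[1], length.out = length(idx))  # what idx would be without gaps
keep <- idx == consecutive                            # TRUE until the first gap

idx[keep]                  # rows belonging to the first run
sum(data$A[idx[keep]])
```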

R set value in dependency of another value

For each row of my dataframe, I want to calculate a value from numbers taken from columns of this dataframe. If the calculated value is above 2, I want to set another column's value to 0, else to 1.
x=(df$firstnumber+df$secondnumber)/2
if(x>2){
df$binaryValue=0
}else{
df$binaryValue=1
}
This throws the error
the condition has length > 1 and only the first element will be used
because x is a vector.
How can I solve this? One way would be to write this as a function and to apply it to the dataframe. Are there any other options?
Also, how could I write this to work with apply()?
Thanks in advance

You could simply do...
df$BinaryValue <- ifelse( x > 2 , 0 , 1 )
So you get...
df <- data.frame( x = 1:5 , y = -2:2 )
x <- df$x + df$y
df$BinaryValue <- ifelse( x > 2 , 0 , 1 )
df
# x y BinaryValue
# 1 1 -2 1
# 2 2 -1 1
# 3 3 0 0
# 4 4 1 0
# 5 5 2 0

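transform(df, BinaryValue = as.numeric(firstnumber + secondnumber <= 4))
There's no need to divide by two in the first place: you can check the sum against four instead. Since the value should be 0 when the mean is above 2 (that is, when the sum is above 4) and 1 otherwise, the comparison is sum <= 4. The function as.numeric is used to convert logical values to numeric ones (FALSE to 0, TRUE to 1).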
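Since the question also asks about apply(), here is a minimal sketch of a row-wise version next to a vectorised one, on made-up data. The column names follow the question; the values and the name vectorised are assumptions for the example:

```r
df <- data.frame(firstnumber = c(1, 3, 5), secondnumber = c(1, 3, 1))

# Row-wise: each row arrives as a numeric vector, so a plain if/else is fine here
df$binaryValue <- apply(df[, c("firstnumber", "secondnumber")], 1,
                        function(row) if (mean(row) > 2) 0 else 1)

# Vectorised equivalent, which avoids the row-by-row function calls
vectorised <- ifelse((df$firstnumber + df$secondnumber) / 2 > 2, 0, 1)
df
```

On larger data frames the vectorised form is usually the better choice, since apply() calls the function once per row.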