Recode variables with NAs with ifelse() in R

I am recoding two binary variables into a new variable such that any 1s in the first variable take a 0 in the new one and all digits in the second variable are preserved. The code below shows the logic that I would like to produce. However, when I run this code, the recoding with ifelse() just recreates x2 without incorporating the first ifelse() line, which turns x1's 1s into 0s. Thoughts?
set.seed(123)
x1 <- sample(c(0,1,NA), 20, replace = TRUE)
x2 <- sample(c(0,1,NA), 20, replace = TRUE)
recode <- ifelse(x1 == 1, 0, NA)
recode <- ifelse(x2 == 1, 1, recode)
recode <- ifelse(x2 == 0, 0, recode)
table(recode); table(x2)
Thanks

Sorry, but it does what you wanted to do. The problem you may have overlooked is that comparing NA with anything also gives NA, so ifelse(x2 == 0, yes, no) returns NA (instead of no) wherever x2 is NA.
Better try
recode <- rep( NA, length( x1 ) )
recode[ x1 == 1 ] <- 0
recode[ ! is.na( x2 ) ] <- x2[ ! is.na( x2 ) ]
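To make the overwriting visible, here is a minimal sketch on three hand-picked values. The vectors and the names step1 and step2 are made up for illustration and are not the question's data.

```r
x1 <- c(1, 1, 0)
x2 <- c(NA, 0, 1)

# First line of the question: 1s in x1 become 0, everything else NA
step1 <- ifelse(x1 == 1, 0, NA)

# Second line: where x2 is NA the test is NA, so the result is NA
# and the 0 stored in step1 is lost at that position
step2 <- ifelse(x2 == 1, 1, step1)

print(step1)
print(step2)
```

The first element of step1 is 0, but after the second ifelse() it is NA again, because the test x2 == 1 is NA there.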

Maybe you want this?
ifelse(is.na(x2), ifelse(x1 == 1, 0, NA), x2)

You overwrote those results. The relevant line from the Details section of the help('ifelse') page is:
Missing values in test give missing values in the result.
recode <- ifelse(x1 == 1, 0, NA)
recode[ !is.na(x2)] <- x2[!is.na(x2)]

I am posting this just to figure out if there is some reason this one liner was not suggested:
recode <- ifelse(x1 %in% 1 & is.na(x2), 0, x2)
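As a rough check, the suggestions above can be compared on all nine combinations of 0, 1 and NA. This is only a sketch: the input vectors and the names nested, one_liner and by_index are invented for the comparison.

```r
# All nine combinations of 0, 1 and NA
x1 <- c(1, 0, NA, 1, 0, NA, 1, 0, NA)
x2 <- c(0, 0, 0, 1, 1, 1, NA, NA, NA)

# Nested ifelse(): fall back to the x1 rule only where x2 is missing
nested <- ifelse(is.na(x2), ifelse(x1 == 1, 0, NA), x2)

# One-liner: %in% and is.na() never return NA, so the test is always TRUE or FALSE
one_liner <- ifelse(x1 %in% 1 & is.na(x2), 0, x2)

# Subsetting: start from the x1 rule, then overwrite with the non-missing x2 values
by_index <- ifelse(x1 == 1, 0, NA)
by_index[!is.na(x2)] <- x2[!is.na(x2)]

print(rbind(x1, x2, nested, one_liner, by_index))
```

On this grid the three versions agree, which suggests the one-liner is a reasonable alternative as long as x2 should win whenever it is not missing.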

Related

Using ifelse for changing columns in R

I would like to apply the ifelse condition to the following data frame according to the schema. I can do it repeatedly, but I have a lot of data.
My code:
d <- data.frame(x_1 = sample(1:100, 10), x_2 = sample(1:100, 10), y_1 = sample(1:100, 10), y_2 = sample(1:100, 10),
                y_3 = sample(1:100, 10), y_4 = sample(1:100, 10))
ifelse(d$x_1>d$y_1, 0, d$x_1-d$y_1)
ifelse(d$x_2>d$y_2, 0, d$x_2-d$y_2)
ifelse(d$x_1>d$y_3, 0, d$x_1-d$y_3)
ifelse(d$x_2>d$y_4, 0, d$x_2-d$y_4) # x_1>y_5..., x_2>y_6,...
Edit:
My x_.* are days of the week so I have x_1...x_7. But my y_.* are many. Code should work as follows:
x_1-y_1
x_2-y_2
x_3-y_3
x_4-y_4
x_5-y_5
x_6-y_6
x_7-y_7
x_1-y_8
x_2-y_9
.
.
.
If you want to compare every x_.* column with every y_.* column you can use outer.
First find out "x" and "y" columns.
x_col <- grep('x', names(d), value = TRUE)
y_col <- grep('y', names(d), value = TRUE)
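We can create an index that recycles x_col along y_col. The ifelse logic can be simplified to pmin, because ifelse(x > y, 0, x - y) gives the same result as pmin(0, x - y).
inds <- seq_along(y_col) %% length(x_col)
inds[inds == 0] <- length(x_col)
We can use mapply to subtract columns. Note that inds indexes x_col, not the columns of d, so the x columns are selected by name.
mapply(function(x, y) pmin(0, x - y), d[x_col[inds]], d[y_col])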
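Here is a small worked sketch of the recycling idea with fixed numbers instead of sample(), so the output can be checked by hand. The values, the anchored patterns "^x_" and "^y_", and the name res are assumptions made for this example.

```r
d <- data.frame(x_1 = c(5, 1), x_2 = c(2, 8),
                y_1 = c(3, 4), y_2 = c(6, 1),
                y_3 = c(9, 0), y_4 = c(2, 2))

x_col <- grep("^x_", names(d), value = TRUE)
y_col <- grep("^y_", names(d), value = TRUE)

# Position in x_col for every y column: 1 2 1 2
inds <- (seq_along(y_col) - 1) %% length(x_col) + 1

# Pair each y column with its recycled x column; pmin(0, .) caps positive differences at 0
res <- mapply(function(x, y) pmin(0, x - y), d[x_col[inds]], d[y_col])
colnames(res) <- paste(x_col[inds], y_col, sep = "-")
print(res)
```

The first column matches ifelse(d$x_1 > d$y_1, 0, d$x_1 - d$y_1) from the question, and the third column pairs x_1 with y_3 as requested.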

Creating a variable to count number of zero values across variables occurring in each observation- R

I am trying to figure out a way to do this in R and for the life of me can't figure it out. Let's say I have a df consisting of the following.
v1 <- c(0, 0, 2, 0, 1)
v2 <- c(1, 0, 8, 1, 0)
v3 <- c(0, 1, 3, 0, 0)
v4 <- c(0, 0, 0, 0, 0)
df <- data.frame(v1, v2, v3, v4)
I want to create a new variable, say num_zeros, that counts the number of 0s for each observation in v1 to v3. Is there a quick way to do this? Any help would be greatly appreciated!
We can use rowSums on a logical matrix to get the count of 0 values and assign it to 'num_zeros' column
df$num_zeros <- rowSums(df[c('v1', 'v2', 'v3')] == 0)
Or another option is
df$num_zeros <- (df$v1 == 0) + (df$v2 == 0) + (df$v3 == 0)
NOTE: Both methods are efficient and are vectorized
We can use apply row-wise:
cols <- paste0('v', 1:3)
df$num_zeros <- apply(df[cols] == 0, 1, sum)
Or with lapply:
df$num_zeros <- Reduce(`+`, lapply(df[cols], `==`, 0))
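One thing the answers above do not cover is missing values. The sketch below assumes five values per column and adds one NA purely for illustration; the name with_na is made up.

```r
df <- data.frame(
  v1 = c(0, 0, 2, 0, 1),
  v2 = c(1, NA, 8, 1, 0),  # NA added for illustration
  v3 = c(0, 1, 3, 0, 0)
)
cols <- c("v1", "v2", "v3")

# Adding the comparisons propagates NA to the row total
with_na <- (df$v1 == 0) + (df$v2 == 0) + (df$v3 == 0)

# rowSums() can skip the missing comparisons instead
df$num_zeros <- rowSums(df[cols] == 0, na.rm = TRUE)

print(with_na)
print(df)
```

Whether a row with a missing value should get NA or a count of the known zeros depends on the analysis, so pick the version that matches your intent.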

Conditional if/else statement in R

I am learning to improve my coding in R. I have this code:
data$score[testA == 1] <- testA_score
data$score[testB==1] <- testB_score
So basically I have four columns that I want to combine into one: testA=1 indicates if the student took version A of the test and testA_score is their score; testB=1 indicates if the student took version B of the test and testB_score is their score. I want to combine this information into new column score.
Also, suppose I had testA, testB through testH. All values are 0 or 1. How can I make a new column test_complete which is 1 if any of the tests is 1?
Basically, as a former Stata user, I am looking for the R equivalents of egen rowtotal and egen rowfirst. Thanks so much.
You can take the maximum across all tests: since the values are only 0 or 1, the maximum equals 1 whenever at least one test was completed.
testA <- c(1,0, 0, 1,0,0,0)
testB <- c(0, 1,0, 0, 1,0,1)
testC <- c(0, 0, 0,1, 0, 1, 0)
df <- as.data.frame(cbind(testA, testB, testC))
df$completed <- apply(df[, 1:3], 1, max)
So if I understand correctly, taking the maximum value by row should give what you need:
binary <- c(0,1)
df <- data.frame(
score1 = sample(binary, 20, replace = TRUE),
score2 = sample(binary, 20, replace = TRUE),
score3 = sample(binary, 20, replace = TRUE)
)
df$passed <- apply(df, 1, max)
head(df)
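Neither answer covers the first part of the question, combining the score columns. A minimal sketch, assuming the four columns live in one data frame and using made-up values (only the column names come from the question):

```r
data <- data.frame(
  testA       = c(1, 0, 0),
  testB       = c(0, 1, 0),
  testA_score = c(80, NA, NA),
  testB_score = c(NA, 90, NA)
)

# Take the score from whichever version was taken, NA if neither
data$score <- ifelse(data$testA == 1, data$testA_score,
              ifelse(data$testB == 1, data$testB_score, NA))

# 1 if any test indicator equals 1, otherwise 0
data$test_complete <- as.integer(rowSums(data[c("testA", "testB")] == 1) > 0)

print(data)
```

With testA through testH, the vector of column names passed to rowSums() would simply be longer.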

Writing if / ifelse function in R

I am attempting to write a function in order to create a variable (BBDR) based on the conditions of another variable (Site0) using the if function. I have the following code using the if function.
x1 <- (africanaDamRate$BB6-africanaDamRate$BB0)/29
x2 <- (africanaDamRate$BB6-africanaDamRate$BB0)/22
x3 <- (africanaDamRate$BB6-africanaDamRate$BB0)/34
x4 <- (africanaDamRate$BB6-africanaDamRate$BB0)/30
F1 <- function(y){
if(africanaDamRate$Site0==1){africanaDamRate$BBDR<-x1}
if(africanaDamRate$Site0==2){africanaDamRate$BBDR<-x2}
if(africanaDamRate$Site0==3){africanaDamRate$BBDR<-x3}
if(africanaDamRate$Site0==4){africanaDamRate$BBDR<-x4}
}
africanaDamRate$BBDR<-F1(y)
But when I attempt this code I receive "The condition has length greater than 1..."
I have also attempted using the ifelse function with the following code:
africanaDamRate$BBDR<-ifelse(c(africanaDamRate$Site0==1, x1, NA), c(africanaDamRate$Site0==2, x2, NA), c(africanaDamRate$Site0==3, x3, NA), c(africanaDamRate$Site0==4, x4, NA))
But get the "unused argument" error.
Does anyone have any ideas of how I can do this (without subsetting)? Thanks so much!
Ryan
Your ifelse statement is wrong. It could be written like this:
africanaDamRate$BBDR <- ifelse(africanaDamRate$Site0 == 1, x1,
                        ifelse(africanaDamRate$Site0 == 2, x2,
                        ifelse(africanaDamRate$Site0 == 3, x3,
                        ifelse(africanaDamRate$Site0 == 4, x4, NA))))
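If Site0 only ever holds the codes 1 to 4, a lookup vector may be a shorter alternative to the nested ifelse(). This is a sketch under that assumption; the data values and the name divisors are hypothetical, and only the column names come from the question.

```r
# Hypothetical data; only the column names come from the question
africanaDamRate <- data.frame(
  Site0 = c(1, 2, 3, 4, 2),
  BB0   = c(0, 0, 0, 0, 0),
  BB6   = c(29, 44, 34, 60, 22)
)

# Divisor for each site, in site order 1 to 4
divisors <- c(29, 22, 34, 30)

# Indexing by Site0 picks the matching divisor for every row
africanaDamRate$BBDR <- (africanaDamRate$BB6 - africanaDamRate$BB0) /
  divisors[africanaDamRate$Site0]

print(africanaDamRate)
```

This also avoids the "condition has length greater than 1" problem, because no if() is applied to a whole column.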

How to set a column value based on values in another column in R

I am trying to add a new column based on values in another column. (Basically, if the other column is missing or 0, set the new value to 0; otherwise set it to 1.)
What's wrong with this code below?
times=nrow(eachfile)
for(i in 1:times)
{eachfile$SalesCycleN0[i] <- ifelse(eachfile$R[i]==NA | eachfile$R[i]==0,0,1 ) }
table(eachfile$SalesCycleN0)
As long as you have tested that the column only contains 0, 1 and NA I would do:
eachfile$SalesCycleN0 <- 1
eachfile$SalesCycleN0[is.na(eachfile$R) | eachfile$R==0] <- 0
Nothing is ever "==" to NA. Just do this (no loop):
eachfile$SalesCycleN0 <- ifelse( is.na(eachfile$R) | eachfile$R==0, 0,1 )
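To see what the == NA test actually produces, here is a small sketch on a made-up vector R (standing in for eachfile$R); the names broken, fixed and compact are invented for the example.

```r
R <- c(1, 2, 0, NA, 5)

# R == NA is NA everywhere, so the test is TRUE only where R == 0
# and NA otherwise: every non-zero value ends up as NA
broken <- ifelse(R == NA | R == 0, 0, 1)

# is.na() returns TRUE or FALSE, so the test is never NA
fixed <- ifelse(is.na(R) | R == 0, 0, 1)

# The same result without ifelse(): FALSE & NA is FALSE
compact <- as.numeric(!is.na(R) & R != 0)

print(broken)
print(fixed)
print(compact)
```

This would explain why table() on the original result shows only 0s: table() drops NA by default.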
If you were looking for a little more economy in code this might also work:
eachfile$SalesCycleN0 <- as.numeric( !grepl("^0$", eachfile$R) )
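grepl returns FALSE for NAs, so note that this version codes missing values as 1 rather than 0; use it only if that is acceptable.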
Another way of doing this is the sapply function, rather than an explicit for loop, although the vectorized versions above are likely to be faster on a huge dataset. Here is an example:
df <- data.frame(x = c(1, 2, 0, NA, 5))
# TRUE when x is neither missing nor 0; || stops at is.na(), so NA == 0 is never evaluated
fun <- function(i) !(is.na(df$x[i]) || df$x[i] == 0)
bin <- sapply(seq_len(nrow(df)), FUN = fun) * 1  # multiplying by 1 converts the logical vector to 0/1
df <- cbind(df, bin)
In your case, test the source column R and store the result in the new column:
fun <- function(i) !(is.na(eachfile$R[i]) || eachfile$R[i] == 0)
eachfile$SalesCycleN0 <- sapply(seq_len(nrow(eachfile)), FUN = fun) * 1
