Creating vectors with ifelse or if...else in R
I still get tripped up using ifelse and if...else when I want to create a vector or new data.frame variable. The title of this question seems closely related, but does not address my issue: Why can't R's ifelse statements return vectors?
The code below shows my attempts to create the variables my.data2$v1b and my.data2$v2b. I failed with ifelse and if...else then succeeded with a for-loop and with apply.
Is there a way to create my.data2$v1b and my.data2$v2b with ifelse or if...else? I assume not based on my attempts and other Stack Overflow questions. So, what is the canonical way of creating these variables in R? Using apply works, but seems rather complex. Using a for-loop works but I get the impression for-loops are to be avoided.
There are many questions about ifelse, but I did not locate one that addressed this specific question: given that ifelse and if...else do not seem to work, what is the best solution? Sorry if this is a duplicate.
Here is my data set:
my.data2 <- read.table(text = '
refno v1 v2 state1 state2 xday first last
111 41 47 1 2 42 1 2
111 41 47 1 2 42 2 1
222 45 49 1 4 47 1 2
222 45 49 1 4 47 2 1
333 59 65 1 2 65 1 2
333 59 65 1 2 65 2 1
444 45 49 1 2 48 1 2
444 45 49 1 2 48 2 1
555 66 80 1 2 75 1 2
555 66 80 1 2 75 2 1
666 103 109 1 2 108 1 2
666 103 109 1 2 108 2 1
777 43 46 1 2 45 1 2
777 43 46 1 2 45 2 1
', header = TRUE, stringsAsFactors = FALSE)
Here are the desired vectors:
desired.data.v1b <- c(41,42, 45,47, 59,65, 45,48, 66,75, 103,108, 43,45)
desired.data.v2b <- c(42,47, 47,49, 65,65, 48,49, 75,80, 108,109, 45,46)
Here is where I start attempting to create these vectors:
v1b <- my.data2$v1
v2b <- my.data2$v2
# this ifelse does not work (as typed, `<` compares instead of assigning; `<-` was intended)
my.data2$v1b < ifelse(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$last == 1, my.data2$xday, my.data2$v1)
my.data2$v2b < ifelse(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$first == 1, my.data2$xday, my.data2$v2)
# this if...else does not work
if(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$last == 1) {v1b = my.data2$xday} else {v1b = my.data2$v1}
if(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$first == 1) {v2b = my.data2$xday} else {v2b = my.data2$v2}
# this for-loop works
for(i in 1:nrow(my.data2)) {
  if(my.data2$state1[i] == 1 & my.data2$state2[i] %in% c(2,4) & my.data2$last[i] == 1) {v1b[i] = my.data2$xday[i]}
  if(my.data2$state1[i] == 1 & my.data2$state2[i] %in% c(2,4) & !(my.data2$last[i] == 1)) {v1b[i] = my.data2$v1[i] }
  if(my.data2$state1[i] == 1 & my.data2$state2[i] %in% c(2,4) & my.data2$first[i] == 1) {v2b[i] = my.data2$xday[i]}
  if(my.data2$state1[i] == 1 & my.data2$state2[i] %in% c(2,4) & !(my.data2$first[i] == 1)) {v2b[i] = my.data2$v2[i] }
}
all.equal(desired.data.v1b, v1b)
all.equal(desired.data.v2b, v2b)
my.data2$v1b <- v1b
my.data2$v2b <- v2b
# this apply works
my.v1 <- apply(my.data2, 1, function(x) {if (x['state1'] == 1 & x['state2'] %in% c(2,4) & x['last'] == 1) {x['v1b'] = x['xday']} else {x['v1b'] = x['v1']}})
my.v2 <- apply(my.data2, 1, function(x) {if (x['state1'] == 1 & x['state2'] %in% c(2,4) & x['first'] == 1) {x['v2b'] = x['xday']} else {x['v2b'] = x['v2']}})
names(my.v1) <- NULL
names(my.v2) <- NULL
all.equal(desired.data.v1b, my.v1)
all.equal(desired.data.v2b, my.v2)
EDIT
Maybe this is the canonical solution?
my.data2$v1b <- rep(-99, nrow(my.data2))
my.data2$v2b <- rep(-99, nrow(my.data2))
my.data2$v1b[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$last == 1) ] <- my.data2$xday[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$last == 1) ]
my.data2$v1b[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & !(my.data2$last == 1))] <- my.data2$v1[ (my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & !(my.data2$last == 1))]
my.data2$v2b[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$first == 1) ] <- my.data2$xday[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & my.data2$first == 1) ]
my.data2$v2b[(my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & !(my.data2$first == 1))] <- my.data2$v2[ (my.data2$state1 == 1 & my.data2$state2 %in% c(2,4) & !(my.data2$first == 1))]
all.equal(desired.data.v1b, my.data2$v1b)
all.equal(desired.data.v2b, my.data2$v2b)
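For what it is worth, ifelse does seem able to build both columns directly. The attempt above uses `<` (a comparison) where `<-` (assignment) was presumably intended, so nothing is stored. The if...else attempt has a different problem: `if` expects a single TRUE/FALSE condition, so it cannot branch row by row over a whole column. Below is a minimal sketch on a cut-down copy of the data (first three refno values only); the names `d` and `in_scope` are made up for illustration.

```r
# Cut-down copy of the question's data (first three refno values only)
d <- data.frame(
  v1     = rep(c(41, 45, 59), each = 2),
  v2     = rep(c(47, 49, 65), each = 2),
  state1 = 1,
  state2 = rep(c(2, 4, 2), each = 2),
  xday   = rep(c(42, 47, 65), each = 2),
  first  = rep(c(1, 2), times = 3),
  last   = rep(c(2, 1), times = 3)
)

# Shared part of the condition, evaluated once for every row
in_scope <- d$state1 == 1 & d$state2 %in% c(2, 4)

# ifelse() is vectorised: it picks xday or v1/v2 row by row
d$v1b <- ifelse(in_scope & d$last  == 1, d$xday, d$v1)
d$v2b <- ifelse(in_scope & d$first == 1, d$xday, d$v2)
```

The same two ifelse lines with my.data2 in place of `d` should reproduce desired.data.v1b and desired.data.v2b, which can be checked with all.equal as above.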
Related
How to fulfil two conditions in ifelse function in R
I have two columns: one is gender and the other is a measure, as shown below. I want to apply a cutoff to the measure column that depends on gender. If it is male (gender = 1) and measure is less than 23, the status is 1, otherwise 0; if it is female (gender = 2) and measure is less than 15, the status is 1, otherwise 0. I tried the code below, but it is not working. I appreciate your help.
d$measure_status = ifelse(d$gender ==2 & d$measure<15, 1, ifelse( d$gender ==1 & d$measure<23, 1), 0)
gender measure measure_status
2      14
2      17
1      25
1      26
You can use with and make it a single ifelse condition:
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df$measure_status <- with(df, ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
df
Output:
  gender measure measure_status
1      2      14              1
2      2      17              0
3      1      25              0
4      1      26              0
You can also use transform:
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df <- transform(df, measure_status = ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
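The nested attempt in the question fails because of a misplaced parenthesis: the inner ifelse is closed before its "no" value is supplied. As a rough sketch, here is the nested form with the parenthesis moved, plus a lookup-based variant that selects each row's cutoff by gender. The column names `status_nested` and `status_lookup` and the vector `cutoff` are illustrative only.

```r
d <- data.frame(gender = c(2, 2, 1, 1), measure = c(14, 17, 25, 26))

# Nested form: the inner ifelse() supplies the "no" value of the outer one
d$status_nested <- ifelse(d$gender == 2 & d$measure < 15, 1,
                   ifelse(d$gender == 1 & d$measure < 23, 1, 0))

# Lookup form: pick each row's cutoff by gender code, then compare once
cutoff <- c("1" = 23, "2" = 15)  # names are the gender codes
d$status_lookup <- as.numeric(d$measure < cutoff[as.character(d$gender)])
```

The lookup form may be easier to extend if more groups with their own cutoffs are added later, since only the `cutoff` vector changes.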
R - How to use sum and group_by inside apply?
I'm fairly new to R and I have the following issue. I have a dataframe like this:
A | B  | C   | E   | F   | G
1   02   XXX   XXX   XXX   1
1   02   XXX   XXX   XXX   1
2   02   XXX   XXX   XXX   NA
2   02   XXX   XXX   XXX   NA
3   02   XXX   XXX   XXX   1
3   Z1   XXX   XXX   XXX   1
4   02   XXX   XXX   XXX   2
....
M   02   XXX   XXX   XXX   1
The thing is that the dataframe possibly has 150k rows or more, and I need to generate another dataframe grouping by A (which is an ID) and count the following occurrences:
When B is 02 and G has 1 <- V
When B is 02 and G is NA <- W
When B is Z1 and G has 1 <- X
When B is Z1 and G is NA <- Y
Any other kind of occurrence <- Z
For this simple example, the result should look something like this:
A | V | W | X | Y | Z
1   2   0   0   0   0
2   0   2   0   0   0
3   1   1   0   0   0
4   0   0   0   0   1
...
M   1   0   0   0   0
At this point I managed to get the results using a for loop:
get_counters <- function(df){
  counters <- data.frame(matrix(ncol = 6, nrow = length(unique(df$A))))
  colnames(counters) <- c("A", "V", "W", "X", "Y", "Z")
  counters$A<- unique(df$A)
  for (i in 1:nrow(counters)) {
    counters$V[i] <- sum(df$A == counters$A[i] & df$B == "02" & df$G == 1, na.rm = TRUE)
    counters$W[i] <- sum(df$A == counters$A[i] & df$B == "02" & is.na(df$G), na.rm = TRUE)
    counters$X[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & df$G== 1, na.rm = TRUE)
    counters$Y[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & is.na(df$G), na.rm = TRUE)
    counters$Z[i] <- sum(df$A == counters$A[i] & (df$B == "Z1" | df$B == "02") & df$G!= 1, na.rm = TRUE)
  }
  return(counters)
}
Trying that on a small test dataframe returns all the correct results, but with the real data it is extremely slow. I'm not sure how to use the apply functions. It seems like a simple problem, but I have not found an answer. So far I've assumed that if I could use apply with the sum statement in my for loop (maybe using group_by(A)) I could do it, but I receive all kinds of errors.
counters$V <- df%>% group_by(A)%>% sum(df$A == counters$A& df$B == "02" &df$G == 1, na.rm = TRUE)
Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables
In addition: Warning message:
In df$A== counters$A: longer object length is not a multiple of shorter object length
If I change the function to not use a for loop and not use $ (I get an error referring to "$ operator is invalid for atomic vectors"), I either get more errors or weird unreadable results (large lists that contain more values than the original dataframe, huge empty matrices, etc.). Is there a simple (maybe not simple, but fast and efficient) way to solve this problem? Thanks in advance.
You can do this very quickly using data.table.
Creating dummy data:
set.seed(123)
counters <- data.frame(A = rep(1:100000, each = 3), B = sample(c("02","Z1"), size = 300000, replace = TRUE), G = sample(c(1,NA), size = 300000, replace = TRUE))
All I am doing is counting the instances of each combination, then reshaping the data into the format you need:
library(data.table)
setDT(counters)
counters[,comb := paste0(B,"_",G)]
dcast(counters, A ~ comb, fun.aggregate = length, value.var = "A")
             A 02_1 02_NA Z1_1 Z1_NA
     1:      1    0     2    1     0
     2:      2    1     0    1     1
     3:      3    0     0    2     1
     4:      4    1     1    0     1
     5:      5    0     1    2     0
    ---
 99996:  99996    0     1    1     1
 99997:  99997    0     2    1     0
 99998:  99998    2     0    1     0
 99999:  99999    1     0    1     1
100000: 100000    0     2    0     1
I adopted a naming convention that is a bit more extensible (the new columns indicate which combination you are counting), but if you want to use the labels from the question instead, replace the comb := line with four lines like the following:
counters[B == "02" & G == 1, comb := "V"]
counters[B == "02" & is.na(G), comb := "W"]
....
But I think the above is a bit more flexible.
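If adding a package is not an option, something similar might be done in base R by labelling each row with nested ifelse calls and cross-tabulating with table. This is only a sketch on a toy data frame modelled on the question, and it has not been benchmarked on 150k rows. The names `g_is_1`, `category` and `counts` are illustrative.

```r
# Small example modelled on the question's data
df <- data.frame(
  A = c(1, 1, 2, 2, 3, 3, 4),
  B = c("02", "02", "02", "02", "02", "Z1", "02"),
  G = c(1, 1, NA, NA, 1, 1, 2),
  stringsAsFactors = FALSE
)

# TRUE where G is exactly 1; an NA in G counts as "not 1" instead of propagating
g_is_1 <- !is.na(df$G) & df$G == 1

# Label every row with its category (Z is the catch-all)
category <- ifelse(df$B == "02" & g_is_1,      "V",
            ifelse(df$B == "02" & is.na(df$G), "W",
            ifelse(df$B == "Z1" & g_is_1,      "X",
            ifelse(df$B == "Z1" & is.na(df$G), "Y", "Z"))))

# Fix the column order and keep categories that never occur
category <- factor(category, levels = c("V", "W", "X", "Y", "Z"))

# Cross-tabulate ID against category, then convert to a data frame
counts <- as.data.frame.matrix(table(df$A, category))
counts <- cbind(A = as.numeric(rownames(counts)), counts)
```

Under the definitions as stated in the question, A = 3 contributes one V and one X here, because its second row has B equal to Z1 and G equal to 1.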
programming R ifelse conditions loop
Hello, I need help with R programming. I have a data.frame B with four columns:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2,.......etc.)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50,...etc.)
z<- c(2018-11-08,2018-11-08,2018-11-09,2018-11-09,2018-11-11,2018-11-11,2018-11-20,2018-11-20,2018-11-29,2018-11-29,2018-11-30,2018-11-30,.......etc.)
m<-c(0,1,1,0,1,1,0,1,0,1,0,1,...etc.)
It has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,0,0,0,1,0,1,0,1,....)
The code, written as a loop, looks like this:
B$t[1]=ifelse(B$y[i]==B$y[i+1] & B$z[i]==B$z[i+1] & B$x[i]==2 & B$m[1]==1,1,0)
for (i in 2:length(B$z)) {
  B$t[i]<-ifelse(B$y[i]==B$y[i-1] & B$z[i]==B$z[i-1] & B$x[i]==2 & B$m[i]==1 & B$m[i]!=B$m[i-1],1,0)
}
I do not want to use a loop. I use only the base packages in R.
I also have a second question. I have a data.frame E:
x<- c(1,2,3,1,2,3,1,2,3,1,2,3,.......etc.)
y<-c(5,5,5,8,8,8,12,12,12,19,19,19,30,30,30,50,50,50,...etc.)
z<- c(2018-11-08,2018-11-08,2018-11-08,2018-11-09,2018-11-09,2018-11-09,2018-11-11,2018-11-11,2018-11-11,2018-11-20,2018-11-20,2018-11-20,2018-11-29,2018-11-29,2018-11-29,2018-11-30,2018-11-30,2018-11-30,.......etc.)
m<-c(0,1,1,0,0,1,0,1,0,1,0,1,0,0,1...etc.)
It also has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,1,....)
The code, written as a loop, looks like this:
E$t[1]=ifelse(E$y[i]==E$y[i+1] & E$z[i]==E$z[i+1] & E$x[1]==2 & E$m[1]==1,1,0)
E$t[2]=ifelse(E$y[i]==E$y[i+1] & E$z[i]==E$z[i+1] & E$x[2]==3 & E$m[2]==1,1,0)
for (i in 3:length(E$y)) {
  E$t[i]<-ifelse(E$y[i]==E$y[i-2] & E$z[i]==E$z[i-2] & E$x[i]==3 & E$m[i]==1 & E$m[i-1]==0 & E$m[i-2]==0,1,0)
}
Again, I do not want to use a loop, and I use only the base packages in R.
Here is a solution with base R:
N <- nrow(B)
B$t <- ifelse(B$y==c(NA, B$y[-N]) & B$z==c(NA, B$z[-N]) & B$x==2 & B$m==1 & B$m!=c(NA, B$m[-N]), 1, 0)
Here is a solution with data.table:
library("data.table")
B <- data.table(
  x= c(1,2,1,2,1,2,1,2,1,2,1,2),
  y= c(5,5,8,8,12,12,19,19,30,30,50,50),
  z= c("2018-11-08", "2018-11-08", "2018-11-09", "2018-11-09", "2018-11-11", "2018-11-11", "2018-11-20", "2018-11-20", "2018-11-29", "2018-11-29", "2018-11-30", "2018-11-30"),
  m= c(0,1,1,0,1,1,0,1,0,1,0,1)
)
B[, t := ifelse(y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]), 1, 0)]
or (if logical is acceptable)
B[, t := (y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]))]
or using shift()
B[, t := (y==shift(y) & z==shift(z) & x==2 & m==1 & m!=shift(m))]
With dplyr you can use if_else and lag:
library(dplyr)
# mutate lets you create a new variable in dat (named t here)
dat %>%
  mutate(t = if_else(
    y == lag(y) & z == lag(z) & x == 2 & m == 1 & m != lag(m),
    1, 0)
  )
#    x  y          z m t
# 1  1  5 2018-11-08 0 0
# 2  2  5 2018-11-08 1 1
# 3  1  8 2018-11-09 1 0
# 4  2  8 2018-11-09 0 0
# 5  1 12 2018-11-11 1 0
# 6  2 12 2018-11-11 1 0
# 7  1 19 2018-11-20 0 0
# 8  2 19 2018-11-20 1 1
# 9  1 30 2018-11-29 0 0
# 10 2 30 2018-11-29 1 1
# 11 1 50 2018-11-30 0 0
# 12 2 50 2018-11-30 1 1
Data:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30")
m<-c(0,1,1,0,1,1,0,1,0,1,0,1)
dat <- data.frame(x, y, z, m)
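The second part of the question (data.frame E, which looks two rows back) can probably be handled the same way in base R. The sketch below assumes the condition inside the posted loop is what is wanted, and that the first two rows should simply be 0 when there is nothing to look back to. `lag_by` is a hypothetical helper, not a base R function, and the data are the first nine rows from the question with the dates quoted as strings.

```r
# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_by <- function(v, k) c(rep(NA, k), head(v, -k))

E <- data.frame(
  x = rep(1:3, times = 3),
  y = rep(c(5, 8, 12), each = 3),
  z = rep(c("2018-11-08", "2018-11-09", "2018-11-11"), each = 3),
  m = c(0, 1, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Same test as the loop body, applied to whole columns at once
hit <- E$y == lag_by(E$y, 2) & E$z == lag_by(E$z, 2) &
  E$x == 3 & E$m == 1 &
  lag_by(E$m, 1) == 0 & lag_by(E$m, 2) == 0

# %in% TRUE maps any NA coming from the padded rows to FALSE
E$t <- as.numeric(hit %in% TRUE)
```

Because every comparison works on full columns, there is no explicit loop; the cost is a few temporary vectors of the same length as the data.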
if else with multiple conditions combined with AND and OR
I am looking for a way to create a new variable (1,0) that is 1 when multiple conditions combined with AND and OR are met, i.e.
if a > 3 AND b > 5 OR c > 3 AND d > 5 OR e > 3 AND f > 5 then 1, if not 0
I've tried coding it as:
df$newvar <- ifelse(df$a > 3 & df$b > 5 | df$c > 3 & df$d > 5 | df$e > 3 & df$f > 5,"1","0")
But in my output many values are coded as NA and the numbers do not seem to add up. Does anyone have advice on a proper way to code this?
We can subset the columns to evaluate for values greater than 3 and get a list of logical vectors ('l1'), do the same for values greater than 5 ('l2'), then compare the corresponding elements of the lists using Map and Reduce the result to a single vector. With as.integer, we coerce the logical vector to binary.
l1 <- lapply(df[c('a', 'c', 'e')] , function(x) x > 3 & !is.na(x))
l2 <- lapply(df[c('b', 'd', 'f')], function(x) x > 5 & !is.na(x))
df$newvar <- as.integer(Reduce(`|`, Map(`&`, l1, l2)))
df$newvar
#[1] 0 0 1 1 0 1 0 0 1 0
Or using the OP's method:
with(df, as.integer((a >3 & !is.na(a) & b > 5 & !is.na(b)) | (c > 3 & !is.na(c) & d > 5 & !is.na(d)) | (e > 3 & !is.na(e) & f > 5 & !is.na(f))))
#[1] 0 0 1 1 0 1 0 0 1 0
data
set.seed(24)
df <- as.data.frame(matrix(sample(c(NA, 1:8), 6 * 10, replace = TRUE), ncol = 6, dimnames = list(NULL, letters[1:6])))
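The NA values in the original attempt come from how R treats missing values in logical expressions: a comparison with NA gives NA, `NA & TRUE` stays NA, and only `NA & FALSE` or `NA | TRUE` resolve to a definite value. Here is a small sketch on made-up data (not the data from the question) that shows this. It also shows one compact way to treat "unknown" as "condition not met", by using `%in% TRUE`. Whether counting NA as 0 is appropriate depends on the analysis.

```r
# Made-up data with a few missing values
df <- data.frame(
  a = c(4, NA, 1, 4), b = c(6, 6, 6, NA),
  c = c(1, 1, NA, 4), d = c(1, 1, 6, 6),
  e = c(1, 1, 1, 1),  f = c(1, 1, 1, 1)
)

# Raw condition: & binds tighter than |, so this is (a&b) | (c&d) | (e&f)
raw <- with(df, a > 3 & b > 5 | c > 3 & d > 5 | e > 3 & f > 5)

# NA survives wherever no pair is TRUE but at least one pair is unknown
raw

# %in% TRUE turns those NAs into FALSE and leaves the other values alone
df$newvar <- as.integer(raw %in% TRUE)
```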
Multiple subsets
Could you suggest a more elegant solution to the following problem? Remove rows containing more than one 0 in columns x,z,y or a,b,c.
df <- data.frame(x = 0, y = 1:5, z = 0:4, a = 4:0, b = 1:5, c=0)
My solution (row 1 and row 5 should get removed):
df_new <- subset(df, ((((x != 0 & y != 0) | (x != 0 & z != 0) | (y != 0 & z != 0)) & ((a != 0 & b != 0) | (a != 0 & c != 0) | (b != 0 & c != 0)))))
# 1:3 is the same as columns 'x', 'y', 'z'; similarly for 4:6.
# You can also specify the colnames explicitly.
# Add na.rm = TRUE inside rowSums() in case you also have missing data.
(rowSums(df[, 1:3]==0)>1)|(rowSums(df[, 4:6]==0)>1)
# Did you mean this?
df[!((rowSums(df[, 1:3]==0)>1)|(rowSums(df[, 4:6]==0)>1)),]
#  x y z a b c
#2 0 2 1 3 2 0
#3 0 3 2 2 3 0
#4 0 4 3 1 4 0
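If there are more than two column groups, the same rowSums test could be written once and applied to a list of column names. This is a sketch only; `groups`, `flags` and `drop_row` are illustrative names.

```r
df <- data.frame(x = 0, y = 1:5, z = 0:4, a = 4:0, b = 1:5, c = 0)

# Column groups to check; add more character vectors to the list as needed
groups <- list(c("x", "y", "z"), c("a", "b", "c"))

# For each group: TRUE where a row holds more than one zero
flags <- lapply(groups, function(cols) rowSums(df[cols] == 0) > 1)

# A row is dropped if any group flags it
drop_row <- Reduce(`|`, flags)
df_new <- df[!drop_row, ]
```

Selecting columns by name instead of position keeps the test valid if the column order of df changes.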