if else with multiple conditions combined with AND and OR - r

I am looking for a way to create a new variable (1,0) with 1 for multiple conditions combined with AND and OR.
i.e. if
a > 3 AND b > 5
OR
c > 3 AND d > 5
OR
e > 3 AND f > 5
1
if not
0
I've tried coding it as;
df$newvar <- ifelse(df$a > 3 & df$b > 5 | df$c > 3 & df$d > 5 | df$e > 3 & df$f > 5,"1","0")
But in my output many variables are coded as NA and the numbers do not seem to add up.
Does anyone have advice on a proper way to code this?

We can subset the columns and test for values greater than 3 to get a list of logical vectors ('l1'), do the same for values greater than 5 ('l2'), then combine the corresponding elements of the two lists with Map and Reduce the result to a single vector. The !is.na(x) term turns every comparison involving NA into FALSE, which is what keeps NA out of the result. With as.integer, we coerce the logical vector to binary (0/1).
l1 <- lapply(df[c('a', 'c', 'e')] , function(x) x > 3 & !is.na(x))
l2 <- lapply(df[c('b', 'd', 'f')], function(x) x > 5 & !is.na(x))
df$newvar <- as.integer(Reduce(`|`, Map(`&`, l1, l2)))
df$newvar
#[1] 0 0 1 1 0 1 0 0 1 0
Or using the OP's method
with(df, as.integer((a > 3 & !is.na(a) & b > 5 & !is.na(b)) |
                    (c > 3 & !is.na(c) & d > 5 & !is.na(d)) |
                    (e > 3 & !is.na(e) & f > 5 & !is.na(f))))
#[1] 0 0 1 1 0 1 0 0 1 0
data
set.seed(24)
df <- as.data.frame(matrix(sample(c(NA, 1:8), 6 * 10, replace = TRUE),
ncol = 6, dimnames = list(NULL, letters[1:6])))
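To see where the NA values in the question come from, here is a minimal sketch on a made-up three-row data frame (the name toy and its values are assumptions for illustration, not the OP's data). In R, NA & FALSE is FALSE and NA | TRUE is TRUE, but NA & TRUE and NA | FALSE stay NA, so ifelse returns NA for those rows. Wrapping the condition in %in% TRUE is one possible alternative to the !is.na() terms used above:

```r
# Hypothetical toy data: two of the three condition pairs are enough to show the effect
toy <- data.frame(a = c(4, NA, NA), b = c(6, 7, 7),
                  c = c(NA, 4, 1),  d = c(1, 6, 2))

# Row 1: (TRUE & TRUE) | (NA & FALSE)   -> TRUE | FALSE -> TRUE
# Row 2: (NA & TRUE)   | (TRUE & TRUE)  -> NA | TRUE    -> TRUE
# Row 3: (NA & TRUE)   | (FALSE & FALSE) -> NA | FALSE  -> NA
cond <- with(toy, (a > 3 & b > 5) | (c > 3 & d > 5))

# The OP's approach keeps the NA
op_style <- ifelse(cond, "1", "0")

# %in% TRUE maps NA to FALSE, so every row gets a 0 or a 1
newvar <- as.integer(cond %in% TRUE)
newvar
```

Whether an undecidable row should become 0 or stay NA depends on the analysis; the sketch only shows how to force it to 0.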

Related

How to ignore NA when selecting values from a vector in R

Say I have
a <- c(0:3, NA)
and I wish to replace 0 with 1 and replace 1 with 0. Namely, I want a <- c(1, 0, 2, 3, NA). The following code does not work because of the NA
> a[a<2] <- 1- a[a<2]
Error in a[a < 2] <- 1 - a[a < 2] :
NAs are not allowed in subscripted assignments
I know we can use na.rm = T if we are using a function. How to add such argument in my case?
Just subset the values 0 and 1 using %in% and subtract them from 1. %in% returns FALSE for NA values and TRUE only for the values that match the rhs, so the NA is never selected. Subtracting the selected values from 1 then gives 1 - 1 = 0 and 1 - 0 = 1.
a[a %in% 0:1] <- 1 - a[a %in% 0:1]
-output
> a
[1] 1 0 2 3 NA
Or if we want to use the OP's code
a[a < 2 & !is.na(a)] <- 1 - a[a < 2 & !is.na(a)]
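Another option, not used in the answer above, is which(), which returns only the positions where the condition is TRUE and therefore drops NA. A minimal sketch (the name idx is just for illustration):

```r
a <- c(0:3, NA)

# which() ignores NA, so idx holds only the positions of 0 and 1
idx <- which(a < 2)

# Flip 0 <-> 1 at those positions; the NA is left untouched
a[idx] <- 1 - a[idx]
a
```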

R - How to use sum and group_by inside apply?

I'm fairly new to R and I have the following issue.
I have a dataframe like this:
A | B | C | E | F | G
1 02 XXX XXX XXX 1
1 02 XXX XXX XXX 1
2 02 XXX XXX XXX NA
2 02 XXX XXX XXX NA
3 02 XXX XXX XXX 1
3 Z1 XXX XXX XXX 1
4 02 XXX XXX XXX 2
....
M 02 XXX XXX XXX 1
The thing is that the dataframe possibly has 150k rows or more, and I need to generate another dataframe grouping by A (which is an ID) and count the following occurrences:
When B is 02 and G has 1 <- V
When B is 02 and G is NA <- W
When B is Z1 and G has 1 <- X
When B is Z1 and G is NA <- Y
Any other kind of occurrence <- Z
For this simple example, the result should look something like this
A | V | W | X | Y | Z
1 2 0 0 0 0
2 0 2 0 0 0
3 1 0 1 0 0
4 0 0 0 0 1
...
M 1 0 0 0 0
At this point I managed to get the results using a for loop:
get_counters <- function(df){
counters <- data.frame(matrix(ncol = 6, nrow = length(unique(df$A))))
colnames(counters) <- c("A", "V", "W", "X", "Y", "Z")
counters$A<- unique(df$A)
for (i in 1:nrow(counters)) {
counters$V[i] <- sum(df$A == counters$A[i] & df$B == "02" & df$G == 1, na.rm = TRUE)
counters$W[i] <- sum(df$A == counters$A[i] & df$B == "02" & is.na(df$G), na.rm = TRUE)
counters$X[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & df$G== 1, na.rm = TRUE)
counters$Y[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & is.na(df$G), na.rm = TRUE)
counters$Z[i] <- sum(df$A == counters$A[i] & (df$B == "Z1" | df$B == "02") & df$G!= 1, na.rm = TRUE)
}
return(counters)
}
Trying that on a small test dataframe returns all the correct results, but with the real data it is extremely slow. I'm not sure how to use the apply functions. It seems like a simple problem, but I have not found an answer. So far I've assumed that if I could use apply with the sum statement from my for loop (maybe using group_by(A)) I could do it, but I receive all kinds of errors.
counters$V <- df%>%
group_by(A)%>%
sum(df$A == counters$A& df$B == "02" &df$G == 1, na.rm = TRUE)
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
In addition: Warning message:
In df$A== counters$A:
longer object length is not a multiple of shorter object length
If I change the function to not use a for loop and not use $ (I get an error referring to "$ operator is invalid for atomic vectors"), I either get more errors or weird unreadable results (large lists that contain more values than the original dataframe, huge empty matrices, etc.).
Is there a simple (maybe not simple but fast and efficient) way to solve this problem? Thanks in advance.
You can do this very quickly using data.table.
Creating Dummy Data:
set.seed(123)
counters <- data.frame(A = rep(1:100000, each = 3), B = sample(c("02","Z1"), size = 300000, replace = T), G = sample(c(1,NA), size = 300000, replace = T))
All I am doing is counting the instances of the combination, then reshaping the data in the format you need:
library(data.table)
setDT(counters)
counters[,comb := paste0(B,"_",G)]
dcast(counters, A ~ comb, fun.aggregate = length, value.var = "A")
A 02_1 02_NA Z1_1 Z1_NA
1: 1 0 2 1 0
2: 2 1 0 1 1
3: 3 0 0 2 1
4: 4 1 1 0 1
5: 5 0 1 2 0
---
99996: 99996 0 1 1 1
99997: 99997 0 2 1 0
99998: 99998 2 0 1 0
99999: 99999 1 0 1 1
100000: 100000 0 2 0 1
I adopted a naming convention that is a bit more extensible (the new columns indicate what combination you are counting), but if you want to override, replace the comb := line with four lines like the following:
counters[B == "02" & !is.na(G), comb := "V"]
counters[B == "02" & is.na(G), comb := "W"]
....
But I think the above is a bit more flexible.
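For comparison, here is a base R sketch of the same idea (classify each row once, then count per ID) that also produces the catch-all Z column from the question. This is a swapped-in technique, not the data.table code above; the small df below is retyped from the question's sample rows, and the name kind is made up for illustration:

```r
# Sample rows from the question (only the columns that matter)
df <- data.frame(
  A = c(1, 1, 2, 2, 3, 3, 4),
  B = c("02", "02", "02", "02", "02", "Z1", "02"),
  G = c(1, 1, NA, NA, 1, 1, 2)
)

# Label every row once; G %in% 1 is FALSE for NA, so no NA leaks into the test
kind <- with(df, ifelse(B == "02" & G %in% 1,  "V",
                 ifelse(B == "02" & is.na(G),  "W",
                 ifelse(B == "Z1" & G %in% 1,  "X",
                 ifelse(B == "Z1" & is.na(G),  "Y", "Z")))))

# Fixing the levels keeps all five columns even when a label never occurs
kind <- factor(kind, levels = c("V", "W", "X", "Y", "Z"))

# Cross-tabulate ID by label; row names of the result are the values of A
counts <- table(A = df$A, kind)
result <- as.data.frame.matrix(counts)
result
```

On this sample, ID 3 gets one V and one X, and ID 4 gets one Z. table() is vectorised, so it should scale far better than the row-by-row loop, though no timing on 150k rows is claimed here.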

programming R ifelse conditions loop

Hello, I need help with programming in R. I have a data.frame B with four columns:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2,.......etc.)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50,...etc.)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30",.......etc.)
m<-c(0,1,1,0,1,1,0,1,0,1,0,1,...etc.)
The data has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,0,0,0,1,0,1,0,1,....)
The code with a loop looks like this:
B$t[1]=ifelse(B$y[1]==B$y[2] & B$z[1]==B$z[2] & B$x[1]==2 & B$m[1]==1,1,0)
for (i in 2:length(B$z))
{
B$t[i]<-ifelse(B$y[i]==B$y[i-1] & B$z[i]==B$z[i-1] & B$x[i]==2 & B$m[i]==1 & B$m[i]!=B$m[i-1],1,0)
}
I do not want to use a loop.
I am using only base R.
I also have a second question, for a data.frame E:
x<- c(1,2,3,1,2,3,1,2,3,1,2,3,.......etc.)
y<-c(5,5,5,8,8,8,12,12,12,19,19,19,30,30,30,50,50,50,...etc.)
z<- c("2018-11-08","2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-29","2018-11-30","2018-11-30","2018-11-30",.......etc.)
m<-c(0,1,1,0,0,1,0,1,0,1,0,1,0,0,1...etc.)
The data has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,1,....)
The code with a loop looks like this:
E$t[1]=ifelse(E$y[1]==E$y[2] & E$z[1]==E$z[2] & E$x[1]==2 & E$m[1]==1,1,0)
E$t[2]=ifelse(E$y[2]==E$y[3] & E$z[2]==E$z[3] & E$x[2]==3 & E$m[2]==1,1,0)
for (i in 3:length(E$y))
{
E$t[i]<-ifelse(E$y[i]==E$y[i-2] & E$z[i]==E$z[i-2] & E$x[i]==3 & E$m[i]==1 &
E$m[i-1]==0 & E$m[i-2]==0,1,0)
}
I do not want to use a loop.
I am using only base R.
Here is a solution with base R:
N <- nrow(B)
B$t <- ifelse(B$y==c(NA, B$y[-N]) & B$z==c(NA, B$z[-N]) & B$x==2 & B$m==1 & B$m!=c(NA, B$m[-N]), 1, 0)
Here is a solution with data.table:
library("data.table")
B <- data.table(
x= c(1,2,1,2,1,2,1,2,1,2,1,2), y= c(5,5,8,8,12,12,19,19,30,30,50,50),
z= c("2018-11-08", "2018-11-08", "2018-11-09", "2018-11-09", "2018-11-11", "2018-11-11", "2018-11-20",
"2018-11-20", "2018-11-29", "2018-11-29", "2018-11-30", "2018-11-30"),
m= c(0,1,1,0,1,1,0,1,0,1,0,1)
)
B[, t := ifelse(y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]), 1, 0)]
or (if logical is acceptable)
B[, t := (y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]))]
or using shift()
B[, t := (y==shift(y) & z==shift(z) & x==2 & m==1 & m!=shift(m))]
With dplyr you can use if_else and lag:
library(dplyr)
dat %>%
mutate(t = if_else(
y == lag(y) & z == lag(z) & x == 2 & m == 1 & m != lag(m), 1, 0)
) # mutate lets you create a new variable in dat (named t here)
# x y z m t
# 1 1 5 2018-11-08 0 0
# 2 2 5 2018-11-08 1 1
# 3 1 8 2018-11-09 1 0
# 4 2 8 2018-11-09 0 0
# 5 1 12 2018-11-11 1 0
# 6 2 12 2018-11-11 1 0
# 7 1 19 2018-11-20 0 0
# 8 2 19 2018-11-20 1 1
# 9 1 30 2018-11-29 0 0
# 10 2 30 2018-11-29 1 1
# 11 1 50 2018-11-30 0 0
# 12 2 50 2018-11-30 1 1
Data:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30")
m<-c(0,1,1,0,1,1,0,1,0,1,0,1)
dat <- data.frame(x, y, z, m)
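The answers above all rely on comparing each row with the previous one. A minimal base R sketch of that idea with a small helper for an arbitrary lag follows; lag_by is a made-up name, not a base R function, and the choice to turn undecidable (NA) comparisons into 0 is an assumption. The same helper with k = 2 could be used for the i-2 comparisons in the second question (data.frame E), which is not worked out here.

```r
# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_by <- function(v, k = 1) c(rep(NA, k), head(v, -k))

B <- data.frame(
  x = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  y = c(5, 5, 8, 8, 12, 12, 19, 19, 30, 30, 50, 50),
  z = c("2018-11-08", "2018-11-08", "2018-11-09", "2018-11-09",
        "2018-11-11", "2018-11-11", "2018-11-20", "2018-11-20",
        "2018-11-29", "2018-11-29", "2018-11-30", "2018-11-30"),
  m = c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1)
)

# Same condition as in the loop, evaluated for all rows at once
cond <- with(B, y == lag_by(y) & z == lag_by(z) & x == 2 & m == 1 & m != lag_by(m))

# %in% TRUE turns any NA produced by the padded first row into 0
B$t <- as.integer(cond %in% TRUE)
B$t
```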

How to compute in a binary matrix in R

Here's a problem I couldn't solve at all.
Suppose that we have the following code:
## A data frame named a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
## 1st function calculates all the combinations of colnames of a and the output is a character vector named items2
items2 <- c()
countI <- 1
while(countI <= ncol(a)){
for(i in countI){
countJ <- countI + 1
while(countJ <= ncol(a)){
for(j in countJ){
items2 <- c(items2, paste(colnames(a[i]), colnames(a[j]), collapse = '', sep = ""))
}
countJ <- countJ + 1
}
countI <- countI + 1
}
}
And here's the code I'm trying to fix (the output is a numeric vector called count_1):
## 2nd function
colnames(a) <- NULL ## just for facilitating the calculation
count_1 <- numeric(ncol(a)*2)
countI <- 1
while(countI <= ncol(a)){
for(i in countI){
countJ <- countI + 1
while(countJ <= ncol(a)){
for(j in countJ){
s <- a[, i]
p <- a[, j]
count_1[i*2] <- as.integer(s[i] == p[j] & s[i] == 1)
}
countJ <- countJ + 1
}
countI <- countI + 1
}
}
But when I execute this code in the RStudio console, an unexpected result is returned:
count_1
[1] 0 0 0 0 0 1 0 1 0 0
However, I am expecting the following result:
count_1
[1] 1 2 2 2 1 1 1 1 2 1
You can visit the following URL, where you can find an image on Dropbox with a detailed explanation.
https://www.dropbox.com/s/5ylt8h8wx3zrvy7/IMAG1074.jpg?dl=0
I'll try to explain a little more,
I posted the 1st function (code) just to show you what I'm looking for exactly that is an example that's all.
What I'm trying to get from the second function (code) is, for each pair of columns, the number of rows in which both columns equal 1. We start with counter = 0 and, for a pair such as AB, do counter = counter + 1 for every row where both A and B are 1. We then continue by combining each column with all the other columns (AC, AD, AE, BC, BD, BE, CD, CE, and then DE). The number of combinations is n!/(2!(n-2)!). For example, if I have the following data frame:
a =
A B C D E
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1
Then, the number of occurrences of the number 1 for each row by combining the two first columns is as follows: (Note that I put colnames(a) <- NULL just to facilitate the work and be more clear)
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1
### Example 1: #####################################################
so from here I put (for columns A and B (AB))
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 1 0 1 0 0
Then I count the rows where both vectors are 1 at the same position, i.e. a[, i] == 1 & a[, j] == 1 (which implies a[, i] == a[, j]), and for this example the result is [1] 1
### Example 2: #####################################################
From here I put (for columns A and D (AD))
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 0 0 1 1 0
Then I count the rows where both vectors are 1 at the same position, i.e. a[, i] == 1 & a[, j] == 1 (which implies a[, i] == a[, j]), and for this example the result is [1] 2
And so on,
I'll have a numeric vector named count_1 equal to:
[1] 1 2 2 2 1 1 1 1 2 1
where each element of count_1 corresponds to one pair of columns (without the names of the data frame)
AB AC AD AE BC BD BE CD CE DE
1 2 2 2 1 1 1 1 2 1
Not clear what you're after at all.
As to the first code chunk, that is some ugly R coding involving a whole bunch of unnecessary while/for loops.
You can get the same result items2 in one single line.
items2 <- sort(toupper(unlist(sapply(1:4, function(i)
sapply(5:(i+1), function(j)
paste(letters[i], letters[j], sep = ""))))));
items2;
# [1] "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
As to the second code chunk, please explain what you're trying to calculate. It's likely that these while/for loops are as unnecessary as in the first case.
Update
Note that this is based on a as defined at the beginning of your post. Your expected output is based on a different a, that you changed further down the post.
There is no need for a for/while loop; each of the two "functions" can be written as a one-liner.
# Your sample dataframe a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
# Function 1
items2 <- toupper(unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
paste(letters[i], letters[j], sep = "")))));
# Function 2
count_1 <- unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
sum(a[, i] + a[, j] == 2))));
# Add names and sort
names(count_1) <- items2;
count_1 <- count_1[order(names(count_1))];
# Output
count_1;
#AB AC AD AE BC BD BE CD CE DE
# 1 2 2 2 1 1 1 2 1 1
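As an alternative sketch, combn() from base R enumerates the column pairs directly and already returns them in AB, AC, ..., DE order, so no sorting step is needed. This uses the data frame a from the beginning of the question; the variable name pairs is made up for illustration.

```r
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# 2 x 10 character matrix: one column per pair of column names
pairs <- combn(names(a), 2)

# For each pair, count the rows where both columns are 1
count_1 <- apply(pairs, 2, function(p) sum(a[[p[1]]] == 1 & a[[p[2]]] == 1))

# Label each count with its pair, e.g. "AB"
names(count_1) <- apply(pairs, 2, paste, collapse = "")
count_1
#AB AC AD AE BC BD BE CD CE DE
# 1  2  2  2  1  1  1  2  1  1
```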

How to use as.numeric to transform a vector of random numbers into a vector of 0,1,2?

Let's say I have a vector $(0,1,2,3,4,5)$.
I want to transform it into the following: if the value in the original vector is:
$=0 \rightarrow 0$
$> 0$ but $<5 \rightarrow 1$
$=5 \rightarrow 2$
I tried:
v <- c(0,1,2,3,4,5)
v <- as.numeric(v=0, v>0 & v<5, v=5)
You can use two logical operations and add the results:
v2 <- (v > 0) + (v >= 5)
# [1] 0 1 1 1 1 2
Alternatively, replace the values in place using logical indexing:
v <- c(0,1,2,3,4,5)
v[v>0 & v<5] <- 1
v
#[1] 0 1 1 1 1 5
v[v == 5] <- 2
v
#[1] 0 1 1 1 1 2
You could also try:
> vs <- as.numeric(ifelse(v==0,0,ifelse(v>0 & v<5,1,2)))
> vs
[1] 0 1 1 1 1 2

Resources