How to fulfil two conditions in ifelse function in R

I have two columns: one is gender and the other is a measure, as below. I want to apply a gender-specific cutoff to the measure column (male is gender = 1). If it is male and measure is less than 23, the status is 1, otherwise 0; if it is female and measure is less than 15, the status is 1, otherwise 0.
I tried the code below, but it is not working. I appreciate your help.
d$measure_status = ifelse(d$gender ==2 & d$measure<15, 1, ifelse( d$gender ==1 & d$measure<23, 1), 0)
gender measure measure_status
2 14
2 17
1 25
1 26

You can use with() and make it a single ifelse() condition. Inside with(), the columns can be referenced by name:
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df$measure_status <- with(df, ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
df
Output:
gender measure measure_status
1 2 14 1
2 2 17 0
3 1 25 0
4 1 26 0
You can also use transform(), which likewise evaluates column names inside the data frame:
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df <- transform(df, measure_status = ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
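For reference, the original attempt fails because a parenthesis closes the inner ifelse() before its "no" value, so the inner call is missing an argument and the outer call receives one too many. A minimal sketch of the corrected nested form follows, together with an alternative that looks up the cutoff per gender. The cutoff vector and the column names status_nested and status_lookup are illustrative assumptions, not part of the original post.

```r
d <- data.frame(gender = c(2, 2, 1, 1), measure = c(14, 17, 25, 26))

# Corrected nesting: the final 0 belongs inside the inner ifelse()
d$status_nested <- ifelse(d$gender == 2 & d$measure < 15, 1,
                   ifelse(d$gender == 1 & d$measure < 23, 1, 0))

# Alternative: a named vector of cutoffs (assumed coding: 1 = male, 2 = female)
cutoff <- c("1" = 23, "2" = 15)
d$status_lookup <- as.numeric(d$measure < cutoff[as.character(d$gender)])

d
```

The lookup version may be easier to extend if more groups or different cutoffs are added later, since only the cutoff vector changes.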

Related

Counting number of rows if certain conditions are met

I'm sure someone has a smart solution for this problem:
I have a dataframe like so:
A <- c("name1", "name2", "name3", "name4", "name5", "name6")
B <- c(10, 8, 7, 3, -1, -2)
C <- c(8, 3, -1, -10, -2, -2)
df <- data.frame(A, B, C)
df
A B C
1 name1 10 8
2 name2 8 3
3 name3 7 -1
4 name4 3 -10
5 name5 -1 -2
6 name6 -2 -2
I want to obtain four values, by counting the rows if certain conditions are met:
I want to count the number of rows in this dataframe where both B and C are negative (< 0) -- for this example that would be "2"
I want to count the number of rows in this dataframe where both B and C are positive (> 0) -- for this example that would be "2"
I want to count the number of rows in this dataframe where B is negative (< 0) and C is positive -- for this example that would be "0"
I want to count the number of rows in this dataframe where B is positive and C is negative -- for this example that would be "2"
I suspect that this can be achieved with some sort of if/else statement, combined with the "table(sign..." command?
Try this:
library(dplyr)
df_count <- df %>% summarise(con1 = sum(B < 0 & C < 0),
con2 = sum(B > 0 & C > 0),
con3 = sum(B < 0 & C > 0),
con4 = sum(B > 0 & C < 0))
df_count
con1 con2 con3 con4
2 2 0 2
We can use count after creating a column with interaction on the sign
library(dplyr)
df %>%
transmute(con = factor(interaction(sign(B), sign(C), sep=" "),
levels = c('1 1', '1 -1', '-1 1', '-1 -1'))) %>%
count(con, .drop = FALSE)
# con n
#1 1 1 2
#2 1 -1 2
#3 -1 1 0
#4 -1 -1 2
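The table(sign(...)) idea from the question also works in base R. A small sketch, assuming no value is exactly zero (a zero would add a "0" level to the table); the name tab is just for illustration:

```r
df <- data.frame(
  A = paste0("name", 1:6),
  B = c(10, 8, 7, 3, -1, -2),
  C = c(8, 3, -1, -10, -2, -2)
)

# Cross-tabulate the sign of B against the sign of C
tab <- table(B = sign(df$B), C = sign(df$C))
tab

# Individual counts can be read off by sign label
tab["-1", "-1"]  # both negative
tab["1", "-1"]   # B positive, C negative
```

Because all four combinations appear in one two-way table, no explicit conditions need to be written out.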

Changing values of column based on whether another column satisfy a criteria

I want to subtract 1 from the values of column A if column B is <= 20.
A = c(1,2,3,4,5)
B = c(10,20,30,40,50)
df = data.frame(A,B)
Desired output:
A B
1 0 10
2 1 20
3 3 30
4 4 40
5 5 50
My data set is very large, so I would prefer not to use a loop. Is there a computationally efficient method in R?
You can do
df$A[df$B <= 20] <- df$A[df$B <= 20] - 1
# A B
#1 0 10
#2 1 20
#3 3 30
#4 4 40
#5 5 50
We can break this down step-by-step to understand how this works.
First we check which numbers in B are less than or equal to 20, which gives us a logical vector
df$B <= 20
#[1] TRUE TRUE FALSE FALSE FALSE
Using that logical vector we can select the numbers in A
df$A[df$B <= 20]
#[1] 1 2
Subtract 1 from those numbers
df$A[df$B <= 20] - 1
#[1] 0 1
and replace these values for the same indices in A.
With dplyr we can also use case_when
library(dplyr)
df %>%
mutate(A = case_when(B <= 20 ~ A - 1,
TRUE ~ A))
Another possibility:
df$A <- ifelse(df$B <= 20, df$A - 1, df$A)
And here is a data.table solution:
library(data.table)
setDT(df)
df[B <= 20, A := A - 1]
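Since a logical vector is coerced to 0/1 in arithmetic, the condition can also be subtracted directly. This is only a sketch of that shortcut on the example data; it works here because the adjustment is exactly 1.

```r
df <- data.frame(A = c(1, 2, 3, 4, 5), B = c(10, 20, 30, 40, 50))

# TRUE counts as 1 and FALSE as 0, so rows with B <= 20 lose 1
df$A <- df$A - (df$B <= 20)
df
```

For an adjustment other than 1, the logical vector could be multiplied by the amount before subtracting.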

R - How to use sum and group_by inside apply?

I'm fairly new to R and I have the following issue.
I have a dataframe like this:
A | B | C | E | F |G
1 02 XXX XXX XXX 1
1 02 XXX XXX XXX 1
2 02 XXX XXX XXX NA
2 02 XXX XXX XXX NA
3 02 XXX XXX XXX 1
3 Z1 XXX XXX XXX 1
4 02 XXX XXX XXX 2
....
M 02 XXX XXX XXX 1
The thing is that the dataframe possibly has 150k rows or more, and I need to generate another dataframe grouping by A (which is an ID) and count the following occurrences:
When B is 02 and G has 1 <- V
When B is 02 and G is NA <- W
When B is Z1 and G has 1 <- X
When B is Z1 and G is NA <- Y
Any other kind of occurrence <- Z
For this simple example, the result should look something like this
A | V | W | X | Y | Z
1 2 0 0 0 0
2 0 2 0 0 0
3 1 0 1 0 0
4 0 0 0 0 1
...
M 1 0 0 0 0
At this point I managed to get the results using a for loop:
get_counters <- function(df){
counters <- data.frame(matrix(ncol = 6, nrow = length(unique(df$A))))
colnames(counters) <- c("A", "V", "W", "X", "Y", "Z")
counters$A<- unique(df$A)
for (i in 1:nrow(counters)) {
counters$V[i] <- sum(df$A == counters$A[i] & df$B == "02" & df$G == 1, na.rm = TRUE)
counters$W[i] <- sum(df$A == counters$A[i] & df$B == "02" & is.na(df$G), na.rm = TRUE)
counters$X[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & df$G== 1, na.rm = TRUE)
counters$Y[i] <- sum(df$A == counters$A[i] & df$B == "Z1" & is.na(df$G), na.rm = TRUE)
counters$Z[i] <- sum(df$A == counters$A[i] & (df$B == "Z1" | df$B == "02") & df$G!= 1, na.rm = TRUE)
}
return(counters)
}
Trying that on a small test dataframe returns all the correct results, but with the real data it is extremely slow. I'm not sure how to use the apply functions; this seems like a simple problem, but I have not found an answer. So far I've assumed that if I could use apply with the sum statement from my for loop (maybe using group_by(A)) I could do it, but I receive all kinds of errors.
counters$V <- df%>%
group_by(A)%>%
sum(df$A == counters$A& df$B == "02" &df$G == 1, na.rm = TRUE)
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
In addition: Warning message:
In df$A== counters$A:
longer object length is not a multiple of shorter object length
If I change the function to not use a for loop and not use $ (I get an error referring to "$ operator is invalid for atomic vectors") I either get more errors or weird unreadable results (large lists that contain more values than the original dataframe, huge empty matrices, etc...)
Is there a simple (maybe not simple but fast and efficient) way to solve this problem? Thanks in advance.
You can do this very quickly using data.table.
Creating Dummy Data:
set.seed(123)
counters <- data.frame(A = rep(1:100000, each = 3),
                       B = sample(c("02","Z1"), size = 300000, replace = TRUE),
                       G = sample(c(1,NA), size = 300000, replace = TRUE))
All I am doing is counting the instances of the combination, then reshaping the data in the format you need:
library(data.table)
setDT(counters)
counters[,comb := paste0(B,"_",G)]
dcast(counters, A ~ comb, fun.aggregate = length, value.var = "A")
A 02_1 02_NA Z1_1 Z1_NA
1: 1 0 2 1 0
2: 2 1 0 1 1
3: 3 0 0 2 1
4: 4 1 1 0 1
5: 5 0 1 2 0
---
99996: 99996 0 1 1 1
99997: 99997 0 2 1 0
99998: 99998 2 0 1 0
99999: 99999 1 0 1 1
100000: 100000 0 2 0 1
I adopted a naming convention that is a bit more extensible (the new columns indicate which combination you are counting), but if you want to override it, replace the comb := line with four lines like the following:
counters[B == "02" & G == 1, comb := "V"]
counters[B == "02" & is.na(G), comb := "W"]
....
But I think the above is a bit more flexible.
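If you prefer to stay with dplyr, the sum() calls from the loop can go inside summarise() after group_by(A). The sketch below uses a small made-up data frame mirroring the example rows (only columns A, B and G), and assumes that "any other kind of occurrence" means every row not counted in V, W, X or Y.

```r
library(dplyr)

# Made-up data mirroring the example in the question
df <- data.frame(
  A = c(1, 1, 2, 2, 3, 3, 4),
  B = c("02", "02", "02", "02", "02", "Z1", "02"),
  G = c(1, 1, NA, NA, 1, 1, 2)
)

res <- df %>%
  group_by(A) %>%
  summarise(
    V = sum(B == "02" & G == 1, na.rm = TRUE),
    W = sum(B == "02" & is.na(G)),
    X = sum(B == "Z1" & G == 1, na.rm = TRUE),
    Y = sum(B == "Z1" & is.na(G)),
    Z = n() - V - W - X - Y   # assumption: everything not counted above
  )
res
```

Inside summarise(), the column names refer to the rows of the current group, so there is no need for df$ or for comparing against counters$A.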

Changing values in one column based on another in R

So I am using R and trying to change values in a data frame in one column by comparing two columns together. I have something like
Median MyPrice
10 0
20 18
20 20
30 35
15 NA
And I would like to say something like
if(MyPrice == 0 & MyPrice < Median){MyPrice <- 1
}else if (MyPrice == Median){MyPrice <- 2
}else if (MyPrice > Median){MyPrice <- 3
}else {MyPrice <- 4}
To come up with
Median MyPrice
10 1
20 1
20 2
30 3
15 4
But there is always an error. I have also tried something like
for(i in MyPrice){if(MyPrice == 0 & MyPrice < Median){MyPrice <- 1
}else if (MyPrice == Median){MyPrice <- 2
}else if (MyPrice > Median){MyPrice <- 3
}else {MyPrice <- 4}
}
The for loop runs but it changes all values in MyPrice to 4. I also tried the ifelse() function but it seemed to have an issue taking that many arguments at once.
I would also not be opposed to a new column being added to the end of the data frame if a solution like that is easier.
You don't necessarily have to use a for loop. Start by setting every comparison to 4.
x$Comp <- 4
x$Comp[x$Median > x$MyPrice] <- 1   # if Median is higher, comparison = 1
x$Comp[x$Median == x$MyPrice] <- 2  # if Median is equal to MyPrice, comparison = 2
x$Comp[x$Median < x$MyPrice] <- 3   # if Median is lower, comparison = 3
x
Median MyPrice Comp
1 10 0 1
2 20 18 1
3 20 20 2
4 30 35 3
5 15 NA 4
Given your first condition, MyPrice == 0 & MyPrice < Median, your 2nd row (Median: 20, MyPrice: 18) should be 4 rather than 1. Here is a working nested ifelse statement, followed by an NA handler.
df <- as.data.frame(matrix(c(10,0,20,18,20,20,30,35,15,NA), byrow = T, ncol = 2))
colnames(df) <- c("Median","MyPrice")
df$NewPrice <- ifelse(df$MyPrice == 0 & df$MyPrice < df$Median, 1,
ifelse(df$MyPrice == df$Median, 2,
ifelse(df$MyPrice > df$Median, 3, 4)))
df$NewPrice[is.na(df$MyPrice)] <- 4
df
# Median MyPrice NewPrice
#1 10 0 1
#2 20 18 4
#3 20 20 2
#4 30 35 3
#5 15 NA 4
What about creating a new variable with every value set to 4 and then replacing the cases where your conditions apply?
Simple, straightforward and easy to read :-)
# (Following the example from @Evans Friedland)
library(dplyr) # needed for mutate()
df <- as.data.frame(matrix(c(10,0,20,18,20,20,30,35,15,NA), byrow = TRUE, ncol = 2))
colnames(df) <- c("Median","MyPrice")
df <- mutate(df, myNewPrice = 4) # set the new price to 4, then overwrite it according to your conditions
df$myNewPrice <- replace(df$myNewPrice, df$MyPrice == 0 & df$MyPrice < df$Median, 1)
df$myNewPrice <- replace(df$myNewPrice, df$MyPrice == df$Median, 2)
df$myNewPrice <- replace(df$myNewPrice, df$MyPrice > df$Median, 3)
df$myNewPrice <- as.numeric(df$myNewPrice) # may or may not be needed
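The same rules can be written with dplyr's case_when(), which evaluates the conditions in order and treats an NA condition as not matched. A sketch on the example data, following the literal first condition from the question (so the second row gets 4); the column name NewPrice is only illustrative.

```r
library(dplyr)

df <- data.frame(Median  = c(10, 20, 20, 30, 15),
                 MyPrice = c(0, 18, 20, 35, NA))

df <- df %>%
  mutate(NewPrice = case_when(
    MyPrice == 0 & MyPrice < Median ~ 1,
    MyPrice == Median               ~ 2,
    MyPrice > Median                ~ 3,
    TRUE                            ~ 4   # everything else, including NA
  ))
df
```

This also shows why the if/else attempts fail: if() expects a single TRUE or FALSE, whereas case_when() and ifelse() work on whole columns at once.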

R handling NA values while doing a comparison ifelse [duplicate]

How can I tell the > and < operators to ignore NA values? The code below returns NA on the 1st row. I want it to return 0, as both conditions fail on that row.
##sum by values
df <- data.frame(sex=c('M','F','M'),occupation=c('Student','Analyst','Analyst'),age=c(NA,6,9), marks=c(34,65,21))
df
#df$counting <- ifelse(df$age > 5 & df$age < 8, 1, 0)
df$counting <- ifelse(df$age > 5 & df$age < 8, 1, 0)+ifelse(df$marks > 60 & df$marks < 70, 1, 0)
df
Please see the following SO post: How to ignore NA in ifelse statement
With respect to your question:
df$counting <- ifelse(df$age > 5 & df$age < 8 & !is.na(df$age), 1, 0) + ifelse(df$marks > 60 & df$marks < 70, 1, 0)
> df
sex occupation age marks counting
1 M Student NA 34 0
2 F Analyst 6 65 2
3 M Analyst 9 21 0
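You could also use cut or findInterval. Note that cut builds right-closed intervals by default, so age = 8 or marks = 70 would count here, unlike with the strict < comparison above.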
df$counting <- colSums(rbind(cut(df$age, c(5,8), labels=F),cut(df$marks, c(60,70), labels=F)), na.rm=T)
df
# sex occupation age marks counting
#1 M Student NA 34 0
#2 F Analyst 6 65 2
#3 M Analyst 9 21 0
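If the same kind of range check is needed for several columns, the NA guard can be wrapped in a small function. The helper in_range below is hypothetical (not a base R function) and is only a sketch of that idea:

```r
df <- data.frame(sex = c("M", "F", "M"),
                 occupation = c("Student", "Analyst", "Analyst"),
                 age = c(NA, 6, 9),
                 marks = c(34, 65, 21))

# Hypothetical helper: 1 if x lies strictly between lo and hi, 0 otherwise (NA gives 0)
in_range <- function(x, lo, hi) as.integer(!is.na(x) & x > lo & x < hi)

df$counting <- in_range(df$age, 5, 8) + in_range(df$marks, 60, 70)
df
```

Putting !is.na(x) first makes the NA rows evaluate to FALSE, because FALSE & NA is FALSE in R.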
