Making new variable through mutate - r

I want to make a new variable "Churned" by taking into account five variables:
Include in churn
A-Churn
B-Churn
C-Churn
D-Churn
My condition is: if the variable "Include in churn" is 1 and at least one of the other four variables is 1, then my new variable "Churned" should be 1, else 0. I am a newbie at using the mutate function.
Please help me create this new variable with the 'mutate' function.

If I understand your formulation logically, you want
mutate(data, Churned = Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))
This will make Churned a logical. If you really need an integer, as.integer will produce 1 for TRUE and 0 for FALSE.
If all the variables mentioned are either 1 or 0, you can also use the possibly faster arithmetic form:
mutate(data, Churned = Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)
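As a quick check that the two expressions agree, here is a minimal sketch on made-up 0/1 data. The data frame and its values are invented for illustration, and base R's with() is used instead of mutate() so that it runs without dplyr; the same expressions can be placed inside mutate().

```r
# Made-up 0/1 data; the column names follow the answer above.
data <- data.frame(
  Include.in.Churn = c(1, 1, 0, 0, 1),
  A.Churn          = c(0, 1, 1, 0, 0),
  B.Churn          = c(0, 0, 0, 0, 0),
  C.Churn          = c(0, 0, 1, 0, 1),
  D.Churn          = c(0, 0, 0, 0, 1)
)

# Logical form: included AND at least one of the four flags is set
churned_logical <- with(
  data,
  Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1)
)

# Arithmetic form: only valid when every column holds 0 or 1
churned_arith <- with(
  data,
  Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1
)

# Store the result as an integer (1 for TRUE, 0 for FALSE)
data$Churned <- as.integer(churned_logical)
print(data)
```

Both forms return NA for a row in which the deciding values are missing, so real data with NAs may need an extra is.na() check.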


How can I solve this error when using case_when?

I'm using this code:
ovabonnement <- ovabonnement %>%
  mutate(c12_ovabonnement_type_con_voor = case_when(
    s2_ovabonnement_type_voor_anders == 1 ~ NA,
    s2_ovabonnement_type_voor_1 == 1 |
      s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
    s2_ovabonnement_type_voor_2 == 1 |
      s2_ovabonnement_type_voor_3 == 1 |
      s2_ovabonnement_type_voor_4 == 1 |
      s2_ovabonnement_type_voor_9 == 1 |
      s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
    s2_ovabonnement_type_voor_5 == 1 |
      s2_ovabonnement_type_voor_6 == 1 |
      s2_ovabonnement_type_voor_7 == 1 |
      s2_ovabonnement_type_voor_8 == 1 |
      s2_ovabonnement_type_voor_10 == 1 |
      s2_ovabonnement_type_voor_12 == 1 |
      s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
So I have these 15 variables that represent whether a person has that subscription added onto their public transport membership. Because it was a multiple choice questionnaire people could select multiple choices, which is why they are different variables.
I want to make these into one variable that takes NA if people answered "other", "Basis" if people answered 1 or 13, "Voordeel" if people answered 2,3,4,9 or 11 and "Vrij" if people answered 5,6,7,8,10,12 or 14.
If people answered 2, there will be a 1 in s2_ovabonnement_type_voor_2. People can have answered multiple of these, which makes it a bit tricky. However, I want it to go through these chronologically. For example, if a person answered 2 AND 10, it should choose the 10, because the code is later, but I'm not sure if that is how case_when works.
I get this error:
Error in `mutate()`:
! Problem while computing `c12_ovabonnement_type_con_voor = case_when(...)`.
Caused by error in `names(message) <- `*vtmp*``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.
case_when() and if_else() are type-sensitive, i.e. all the right-hand-side expressions should return the same type. In the OP's code, the first expression returns NA, which is logical by default, while all the others return character. We need NA_character_ to match the type of the others:
ovabonnement <- ovabonnement %>%
  mutate(c12_ovabonnement_type_con_voor = case_when(
    s2_ovabonnement_type_voor_anders == 1 ~ NA_character_,
    s2_ovabonnement_type_voor_1 == 1 |
      s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
    s2_ovabonnement_type_voor_2 == 1 |
      s2_ovabonnement_type_voor_3 == 1 |
      s2_ovabonnement_type_voor_4 == 1 |
      s2_ovabonnement_type_voor_9 == 1 |
      s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
    s2_ovabonnement_type_voor_5 == 1 |
      s2_ovabonnement_type_voor_6 == 1 |
      s2_ovabonnement_type_voor_7 == 1 |
      s2_ovabonnement_type_voor_8 == 1 |
      s2_ovabonnement_type_voor_10 == 1 |
      s2_ovabonnement_type_voor_12 == 1 |
      s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
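On the ordering part of the question: case_when() checks its conditions from top to bottom and returns the value for the first one that is TRUE. A person who answered both 2 and 10 therefore gets "Voordeel" with the code above, not "Vrij"; to let the later codes win, their conditions have to be listed first. A minimal sketch follows, where answered_2 and answered_10 are hypothetical stand-ins for two of the survey columns and the case_when() part only runs if dplyr is installed. As far as I know, dplyr 1.1.0 and later also accept a plain NA here, while NA_character_ works in both older and newer versions.

```r
# A bare NA is logical; NA_character_ is the character-typed missing value
na_types <- c(class(NA), class(NA_character_))
print(na_types)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical indicators: person 1 chose 2, person 2 chose 10, person 3 chose both
  answered_2  <- c(1, 0, 1)
  answered_10 <- c(0, 1, 1)

  # First matching condition wins: person 3 gets "Voordeel"
  first_match <- dplyr::case_when(
    answered_2 == 1  ~ "Voordeel",
    answered_10 == 1 ~ "Vrij"
  )

  # Later code listed first: person 3 now gets "Vrij"
  later_wins <- dplyr::case_when(
    answered_10 == 1 ~ "Vrij",
    answered_2 == 1  ~ "Voordeel"
  )

  print(first_match)
  print(later_wins)
}
```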

If condition is not showing the result

I am running the code below; it runs but does not show me any output:
for (name in tita$name){
  if (tita$sex == 'female' && tita$embarked == 'S' && tita$age > 33.00)
  {
    print (name)
  }
}
It's just showing me ****** in R studio. When I check the dataset, it does contain females older than 33 who embarked from S, but this statement does not show any result. When I change the value from 33 to 28, the same code does show a result. Why is that?
I am using the following dataset:
https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv
I think you're mixing loops and vectorization where you shouldn't. As I mentioned in the comments, your conditions are vectorized, but it looks like you're trying to evaluate each element in a loop.
You should do either:
# loop through elements
for (i in seq_along(tita$name)){
  # isTRUE() treats a missing value (NA) as "condition not met" instead of raising an error
  if (isTRUE(tita$sex[i] == 'female' && tita$embarked[i] == 'S' && tita$age[i] > 33.00)){
    print(tita$name[i])
  }
}
OR use vectorization (this will be faster and is recommended):
conditions <- tita$sex == 'female' & tita$embarked == 'S' & tita$age > 33.00
# which() drops the rows where the condition is NA (e.g. missing age)
names <- tita$name[which(conditions)]
Here conditions is a logical vector of TRUE and FALSE values, TRUE where all the conditions are met. We can use it to subset in R. For more information on what I mean by vectorization please see this link.
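To see why 33 and 28 behaved differently: older versions of R used only the first element of a condition longer than one, so the loop in the question tested the first row on every iteration (newer versions raise an error instead). If the first passenger is a woman who embarked at S with an age between 28 and 33, nothing prints for 33 and every name prints for 28. A minimal sketch on a made-up four-row data frame (the passengers are invented, not taken from the Titanic data):

```r
# Made-up stand-in for the Titanic data; the last passenger has a missing age
tita <- data.frame(
  name     = c("Passenger A", "Passenger B", "Passenger C", "Passenger D"),
  sex      = c("female", "female", "male", "female"),
  embarked = c("S", "S", "S", "S"),
  age      = c(29, 45, 50, NA),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE/NA per row
conditions_33 <- tita$sex == "female" & tita$embarked == "S" & tita$age > 33
conditions_28 <- tita$sex == "female" & tita$embarked == "S" & tita$age > 28

# What a "first element only" test would have seen for each threshold
first_33 <- conditions_33[1]
first_28 <- conditions_28[1]

# which() keeps only the rows where the condition is TRUE, dropping NA
matching <- tita$name[which(conditions_33)]
print(matching)
```

Without which(), the NA in conditions_33 would show up as an NA entry in the result.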

Error in if statement with NA in data in R [duplicate]

I have a problem with an if statement. I get an error message saying "missing value where TRUE/FALSE needed". I am trying to calculate a new variable using an if statement and a for loop, but the data has NA values and the loop stops as soon as it reaches an NA value.
These are the variables I am using to create the new variable:
x=c(3,3,3,2,NA,2,3,NA,3,NA)
y=c(3,6,5,4,NA,3,2,NA,3,NA)
h=c(1,2,1.6666667,2,NA,1.5,0.6666667,NA,1,NA)
This is the code I am using that has the problem with NA values:
z=rep(NA,length(y))
for(i in 1:length(x)){
  if((x[i]==0 & y[i]>=3) | h[i]>=3){
    z[i]=1
  } else if((x==0 & y[i]<3) | h[i]<3){
    z[i]=0
  }
}
Can you tell me how I could include the NA values in the if statement, or what I should do instead?
Thanks for your reply.
We can handle the NA values by adding is.na() checks:
for(i in 1:length(x)){
  if((x[i] %in% 0 & y[i]>=3 & !is.na(y[i])) | h[i]>=3 & !is.na(h[i])){
    z[i]=1
  } else if((x[i] %in% 0 & y[i]<3 & !is.na(y[i])) | h[i]<3 & !is.na(h[i])){
    z[i]=0
  }
}
You can check with !is.na(). Also, this operation is vectorized, so you don't need a for loop.
inds <- x == 0 & y >= 3 | h >= 3
as.integer(inds & !is.na(inds))
#[1] 0 0 0 0 0 0 0 0 0 0
None of the values match the condition here.
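The loop leaves z as NA wherever neither branch applies, whereas the one-liner turns those rows into 0. A vectorized sketch that keeps the three-way result (1, 0 or NA) could look like this; the names is_one and is_zero are made up for illustration, and the data is the one from the question:

```r
x <- c(3, 3, 3, 2, NA, 2, 3, NA, 3, NA)
y <- c(3, 6, 5, 4, NA, 3, 2, NA, 3, NA)
h <- c(1, 2, 1.6666667, 2, NA, 1.5, 0.6666667, NA, 1, NA)

# The two branch conditions, evaluated for all rows at once (may contain NA)
is_one  <- (x == 0 & y >= 3) | h >= 3
is_zero <- (x == 0 & y < 3) | h < 3

# `%in% TRUE` maps NA to FALSE, so a missing condition counts as "not met"
z <- ifelse(is_one %in% TRUE, 1,
            ifelse(is_zero %in% TRUE, 0, NA))
print(z)
```

Rows with missing inputs stay NA here, which matches what the corrected loop produces when z is initialised with NA.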

if else multiple conditions comparing rows

I am struggling with this loop. I want to get "6" in the second row of the column "newcolumn", but I get the following error:
Error in if (mydata$type_name[i] == "a" && mydata$type_name[i - :
missing value where TRUE/FALSE needed.
The data and the code that I created:
id  type_name  name  score  newcolumn
1   a          Car   2      2
1   a          van   2      6
1   b          Car   2      2
1   b          Car   2      2
mydata$newcolumn <-c(0)
for (i in 1:length(mydata$id)){
  if ((mydata$type_name [i] == "a") && (mydata$type_name[i-1] == "a") && ((mydata$name[i]) != (mydata$name[i-1]))){
    mydata$newcolumn[i]=mydata$score[i]*3 }
  else {
    mydata$newcolumn[i]=mydata$score[i]*1
  }
}
Thank you very much in advance
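Vectors in R are indexed from 1. Your loop starts at i = 1, so on the first iteration i-1 is 0 and mydata$type_name[0] is an empty vector. The comparison therefore cannot return TRUE or FALSE, and if() fails. Start the loop at 2 and handle the first row separately.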
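A minimal sketch of that fix, with the data frame rebuilt from the table in the question. It shows a loop that starts at the second row and, as an alternative, a vectorized version that compares each row with a shifted copy of the column; prev_type, prev_name, changed and vectorized are names made up for illustration.

```r
mydata <- data.frame(
  id        = c(1, 1, 1, 1),
  type_name = c("a", "a", "b", "b"),
  name      = c("Car", "van", "Car", "Car"),
  score     = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Loop version: every row starts as score * 1, then rows 2..n are checked
# against the row before them, so i - 1 is always a valid index
mydata$newcolumn <- mydata$score
for (i in seq_len(nrow(mydata))[-1]) {
  if (mydata$type_name[i] == "a" &&
      mydata$type_name[i - 1] == "a" &&
      mydata$name[i] != mydata$name[i - 1]) {
    mydata$newcolumn[i] <- mydata$score[i] * 3
  }
}

# Vectorized version: shift the columns down by one row (NA for the first row)
prev_type <- c(NA, head(mydata$type_name, -1))
prev_name <- c(NA, head(mydata$name, -1))
changed <- mydata$type_name == "a" & prev_type == "a" & mydata$name != prev_name
# `%in% TRUE` turns the NA of the first row into FALSE
vectorized <- mydata$score * ifelse(changed %in% TRUE, 3, 1)

print(mydata)
```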

Using condition in columns of data frame to generate a vector in R

I have the following array:
Year Month Day Hour
1 1 1 1 0
2 1 1 1 3
...
etc
I wrote a function which I then tried to vectorize by using apply in order to run calculations row-by-row basis, but it doesn't work due to the booleans:
day_in_season<-function(tarr){
  #first month in season
  if((tarr$month==12) || (tarr$month==3) ||(tarr$month==6) || (tarr$month==9)){
    d=tarr$day
  #second month in season
  }else if ((tarr$month==1) || (tarr$month==4)){
    d=31+tarr$day
  }else if((tarr$month==7) || (tarr$month==10)){
    d=30+tarr$day
  #third month in season
  }else if((tarr$month==2)){
    d=62+tarr$day
  }else{
    d=61+tarr$day
  }
  h=tarr$hour/24
  d=d+h
  return(d)
}
I tried
apply(tdjf,1,day_in_season)
but it raised this exception:
Error in tarr$month : $ operator is invalid for atomic vectors
(I already knew about this potential pitfall, but that's why I wanted to use apply in the first place!)
The only way I can currently get it to work is if I do this:
days<-c()
for (x in 1:nrow(tdjf)){
d<-day_in_season(tdjf[x,])
days=append(days,d)
}
If there were only a few values, I'd throw up my hands and just use the for loop, efficiency be damned, but I have over 15,000 rows and that's just one dataset. I know that there has to be a way to make it work.
To vectorize your code, use ifelse() and | instead of ||:
ifelse(
  (tarr$month==12) | (tarr$month==3) | (tarr$month==6) | (tarr$month==9),
  tarr$day,
  ifelse((tarr$month==1) | (tarr$month==4),
    31+tarr$day,
    ifelse((tarr$month==7) | (tarr$month==10),
      30+tarr$day,
      ifelse(tarr$month==2,
        62+tarr$day,
        61+tarr$day)
    )
  )
)+tarr$hour/24
You might be surprised at how quickly a well-constructed for loop can run. If designed well, it has about the same efficiency as an apply statement.
The proper for loop in your case preallocates the result instead of growing it with append():
tdjf$days <- vector ("numeric", nrow (tdjf))
for (x in seq_along (tdjf$days)){
  tdjf$days [x] <- day_in_season(tdjf[x,])
}
If you really want to go the apply route, I would recommend rewriting your function to take three arguments (month, day, and hour) and passing those three columns into mapply.
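The apply() call in the question fails because apply() converts the data frame to a matrix and hands each row to the function as a plain atomic vector, for which $ is not valid. A minimal sketch of the mapply() route follows; it assumes lowercase column names month, day and hour, as used inside the function in the question, and the three example rows are made up.

```r
# Same season logic as in the question, but taking scalar arguments
day_in_season <- function(month, day, hour) {
  if (month %in% c(12, 3, 6, 9)) {          # first month in season
    d <- day
  } else if (month %in% c(1, 4)) {          # second month, after a 31-day month
    d <- 31 + day
  } else if (month %in% c(7, 10)) {         # second month, after a 30-day month
    d <- 30 + day
  } else if (month == 2) {                  # third month in season
    d <- 62 + day
  } else {
    d <- 61 + day
  }
  d + hour / 24
}

# Made-up rows: 1 December 00:00, 1 January 03:00, 1 February 12:00
tdjf <- data.frame(month = c(12, 1, 2), day = c(1, 1, 1), hour = c(0, 3, 12))

# mapply() calls the function once per row, pairing up the three columns
tdjf$days <- mapply(day_in_season, tdjf$month, tdjf$day, tdjf$hour)
print(tdjf)
```

For large data the nested ifelse() version is likely to be faster, since mapply() still calls the function once per row.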
