Create new data set that meets all of 4 conditions - r

I would like to create a new dataset where the following four conditions are all met.
rowSums(is.na(UNCA[,11:23]))<12
rowSums(is.na(UNCA[,27:39]))<12
rowSums(is.na(UNCA[,40:52]))<12
rowSums(is.na(UNCA[,53:65]))<12
Thanks!

Then use the & operator:
UNCA.new <- UNCA[rowSums(is.na(UNCA[,11:23])) < 12 &
rowSums(is.na(UNCA[,27:39])) < 12 &
rowSums(is.na(UNCA[,40:52])) < 12 &
rowSums(is.na(UNCA[,53:65])) < 12, ]
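A single & is vectorized: it compares its operands element by element and returns a logical vector. A double && works on single (length-one) logical values and short-circuits, so it is typically used in an if statement, for instance.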
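As a small illustration of the row filter, here is a sketch on a made-up data frame. The name toy, its columns, and the threshold of 3 (instead of 12) are assumptions chosen to keep the example short; they are not the asker's UNCA data.

```r
# Hypothetical toy data: two blocks of three columns each
toy <- data.frame(
  a1 = c(1, NA, NA), a2 = c(2, NA, 5),  a3 = c(3, NA, NA),
  b1 = c(NA, 1, 2),  b2 = c(NA, 2, NA), b3 = c(NA, 3, 4)
)

# Element-wise test with &: one TRUE/FALSE per row
keep <- rowSums(is.na(toy[, 1:3])) < 3 &
        rowSums(is.na(toy[, 4:6])) < 3

# Keep only the rows that pass both tests
toy.new <- toy[keep, ]
```

Only the third row has fewer than three NAs in both blocks, so toy.new should contain that single row.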

Related

If condition is not showing the result

I am running the code below. It runs, but it does not show me any output:
for (name in tita$name){
if (tita$sex == 'female' && tita$embarked == 'S' && tita$age > 33.00)
{
print (name)
}
}
It's just showing me ****** in RStudio, though when I check the dataset, it contains females older than 33 who embarked from S, yet this statement shows me no result. But when I change the value from 33 to 28, the same code shows me results. Why is that?
I am using the following dataset:
https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv
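I think you're mixing loops and vectorization where you shouldn't. As I mentioned in the comments, your conditions are vectorized (each comparison returns one value per row), but it looks like you're trying to evaluate each element in a loop. With &&, only the first element of each vector was tested on every iteration (R 4.3.0 and later raise an error instead), so the loop printed either every name or none, depending on the first row alone. That is why changing 33 to 28 flipped the result.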
You should do either:
# loop through elements
for (i in seq_along(tita$name)){
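# isTRUE() treats a missing value (e.g. an NA age) as "no match" instead of raising an error
if (isTRUE(tita$sex[i] == 'female' & tita$embarked[i] == 'S' & tita$age[i] > 33.00)){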
print(tita$name[i])
}
}
OR use vectorization (this will be faster and is recommended):
conditions <- tita$sex == 'female' & tita$embarked == 'S' & tita$age > 33.00
# which() drops NA entries (e.g. rows with a missing age) from the index
names <- tita$name[which(conditions)]
Here conditions is a logical vector of TRUE and FALSE values -- TRUE where all the conditions are met. We can use that to subset in R. For more information on what I mean by vectorization please see this link.
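To show how a missing value affects a logical index, here is a minimal sketch. The four-row tita data frame below is a hypothetical stand-in for the Titanic data, and matched is a name introduced only for this example.

```r
# Hypothetical stand-in for the Titanic data, including one missing age
tita <- data.frame(
  name     = c("Ann", "Bea", "Cal", "Dee"),
  sex      = c("female", "female", "male", "female"),
  embarked = c("S", "S", "S", "C"),
  age      = c(40, NA, 50, 60),
  stringsAsFactors = FALSE
)

# One logical value per row; the missing age makes the second entry NA
conditions <- tita$sex == "female" & tita$embarked == "S" & tita$age > 33

# Indexing with an NA would return an NA name;
# which() keeps only the positions that are definitely TRUE
matched <- tita$name[which(conditions)]
```

Note that FALSE & NA is FALSE, so a row that already fails another test stays FALSE even when its age is missing.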

Making new variable through mutate

I want to make a new variable "churned" by taking into account five variables :
Include in churn
A-Churn
B-Churn
C-Churn
D-Churn
My condition is: if the variable "Include in churn" is 1 and any one of the other variables is 1, then my new variable "Churned" should be 1, else 0. I am a newbie in using the mutate function.
Please help me to create this new variable through the 'mutate' function.
If I understand your formulation logically, you want
mutate(data, Churned = Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))
This will make Churned a logical. If you really need an integer, as.integer will produce 1 for TRUE and 0 for FALSE.
If all the variables mentioned are either 1 or 0, you can also use the possibly faster
mutate(data, Churned = Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)
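The two formulations can be compared on a small example. This sketch uses base R's with() rather than mutate so that it runs without any package; the data frame contents and the names logical_way and arithmetic_way are assumptions made for illustration. The same expressions should work unchanged inside mutate.

```r
# Hypothetical data using the column names from the answer above
data <- data.frame(
  Include.in.Churn = c(1, 1, 0, 0),
  A.Churn = c(0, 0, 1, 0),
  B.Churn = c(1, 0, 0, 0),
  C.Churn = c(0, 0, 1, 0),
  D.Churn = c(0, 0, 0, 0)
)

# Logical version, converted to 1/0 with as.integer()
logical_way <- with(data, as.integer(
  Include.in.Churn == 1 &
    (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1)
))

# Arithmetic version, valid only when every column holds 0 or 1
arithmetic_way <- with(data, as.integer(
  Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1
))
```

On this data both versions flag only the first row, which is the only one with "Include in churn" set and at least one churn column equal to 1.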

Using condition in columns of data frame to generate a vector in R

I have the following array:
Year Month Day Hour
1 1 1 1 0
2 1 1 1 3
...
etc
I wrote a function which I then tried to vectorize by using apply in order to run calculations on a row-by-row basis, but it doesn't work due to the booleans:
day_in_season<-function(tarr){
#first month in season
if((tarr$month==12) || (tarr$month==3) ||(tarr$month==6) || (tarr$month==9)){
d=tarr$day
#second month in season
}else if ((tarr$month==1) || (tarr$month==4)){
d=31+tarr$day
}else if((tarr$month==7) || (tarr$month==10)){
d=30+tarr$day
#third month in season
}else if((tarr$month==2)){
d=62+tarr$day
}else{
d=61+tarr$day
}
h=tarr$hour/24
d=d+h
return(d)
}
I tried
apply(tdjf,1,day_in_season)
but it raised this exception:
Error in tarr$month : $ operator is invalid for atomic vectors
(I already knew about this potential pitfall, but that's why I wanted to use apply in the first place!)
The only way I can currently get it to work is if I do this:
days<-c()
for (x in 1:nrow(tdjf)){
d<-day_in_season(tdjf[x,])
days=append(days,d)
}
If there were only a few values, I'd throw up my hands and just use the for loop, efficiency be damned, but I have over 15,000 rows and that's just one dataset. I know that there has to be a way to make it work.
To vectorize your code, use ifelse() and | instead of ||:
ifelse(
(tarr$month==12) | (tarr$month==3) |(tarr$month==6) | (tarr$month==9),
tarr$day,
ifelse((tarr$month==1) | (tarr$month==4),
31+tarr$day,
ifelse((tarr$month==7) | (tarr$month==10),
30+tarr$day,
ifelse(tarr$month==2,
62+tarr$day,
61+tarr$day)
)
)
)+tarr$hour/24
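As an alternative to nesting ifelse(), the same offsets can be stored in a lookup vector indexed by month number. This is a different technique from the answer above, sketched under the assumption that month always holds whole numbers from 1 to 12; the offset vector and the sample tarr data frame are hypothetical.

```r
# Offsets for months 1..12, copied from the branches of day_in_season
offset <- c(31, 62, 0, 31, 61, 0, 30, 61, 0, 30, 61, 0)

# Hypothetical input with the lowercase column names the function expects
tarr <- data.frame(month = c(12, 1, 2, 7, 5),
                   day   = c(1, 1, 1, 1, 1),
                   hour  = c(0, 3, 6, 12, 18))

# Indexing the lookup vector by month is fully vectorized
d <- offset[tarr$month] + tarr$day + tarr$hour / 24
```

The lookup keeps all the offsets in one place, which may be easier to check against a calendar than four levels of nesting.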
You might be surprised at how quickly a well-constructed for loop can run. If designed well, it has about the same efficiency as an apply statement.
The proper for loop in your case is
tdjf$days <- vector ("numeric", nrow (tdjf))
for (x in seq_along (tdjf$days)){
tdjf$days [x] <- day_in_season(tdjf[x,])
}
If you really want to go the apply route, I would recommend rewriting your function to take three arguments -- month, day, and hour -- and passing those three columns into mapply.
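A minimal sketch of that mapply route might look like the following. The function name day_in_season3 and the sample data are assumptions; the branches mirror the original function, with %in% standing in for the chained || comparisons.

```r
# Hypothetical three-argument rewrite of day_in_season
day_in_season3 <- function(month, day, hour) {
  if (month %in% c(12, 3, 6, 9)) {          # first month in season
    d <- day
  } else if (month %in% c(1, 4)) {          # second month in season
    d <- 31 + day
  } else if (month %in% c(7, 10)) {
    d <- 30 + day
  } else if (month == 2) {                  # third month in season
    d <- 62 + day
  } else {
    d <- 61 + day
  }
  d + hour / 24
}

# Hypothetical data with lowercase column names
tdjf <- data.frame(month = c(12, 1, 2), day = c(5, 1, 10), hour = c(0, 3, 12))

# mapply calls the function once per row, passing scalars each time
tdjf$days <- mapply(day_in_season3, tdjf$month, tdjf$day, tdjf$hour)
```

Because mapply hands each call plain scalars, the if/else chain works as written and the $ error from apply does not arise.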

How to write this ifelse statement in R correctly

I want to return a value in a column, or NA, contingent on values in other columns.
I basically want to see if the value in the column meets the first test criteria:
df$v2.1 >= df$varx & df$v3.1 <6
if not does it meet the second:
df$v4.1 >= df$vary & df$v5.1 >5
and then if neither return NA
The code I have tried is below.
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 >5 ,df$v1.1, NA)
Your only mistake is using || rather than |. || is not vectorised, and only considers the first element. All your other operators (and ifelse()) are vectorised, so the following should work as expected:
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 | df$v4.1 >= df$vary & df$v5.1 > 5, df$v1.1, NA)
A good way to check when you're doing reasonably complex or multiple logical operations is to run each one of them and see if you're getting the expected output. If you run:
df$v2.1 >= df$varx & df$v3.1 <6
or
df$v4.1 >= df$vary & df$v5.1 > 5
you should get a vector of logical values. If you run:
df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 > 5
you should get a single logical value. In your case, that will give a single result from the ifelse(), which then gets recycled to fill df$v1.1.
From what I can tell df$v1.1 is already defined, so you only need to modify those rows that fail the test in your ifelse. The following might be easier:
df$v1.1[
which(
!(df$v2.1 >= df$varx & df$v3.1 <6) & !(df$v4.1 >= df$vary & df$v5.1 >5))
] <- NA
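To check that the two approaches agree, here is a sketch on a hypothetical data frame that reuses the question's column names. The values, and the helper names test1, test2, via_ifelse and via_which, are made up for illustration, and the data contains no missing values.

```r
# Hypothetical data frame with the column names used in the question
df <- data.frame(
  v1.1 = c(10, 20, 30, 40),
  v2.1 = c(5, 1, 1, 1), varx = c(3, 3, 3, 3), v3.1 = c(2, 2, 2, 9),
  v4.1 = c(0, 7, 0, 0), vary = c(4, 4, 4, 4), v5.1 = c(9, 9, 9, 9)
)

# Run each test on its own first: both are logical vectors
test1 <- df$v2.1 >= df$varx & df$v3.1 < 6
test2 <- df$v4.1 >= df$vary & df$v5.1 > 5

# Approach 1: vectorised ifelse() with |
via_ifelse <- ifelse(test1 | test2, df$v1.1, NA)

# Approach 2: blank out only the rows that fail both tests
via_which <- df$v1.1
via_which[which(!test1 & !test2)] <- NA
```

Row 1 passes the first test and row 2 passes the second, so both approaches keep 10 and 20 and set the last two rows to NA.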

For/While Loop on variables that satisfy a certain condition [R]

I'm trying to write an if else statement (ultimately) in R, but only for variables that satisfy certain criteria. I'm sure there is an easy way to do this - but can't seem to find anything specific when searching...
Below is an example of a while loop (not sure whether I can use this for this purpose):
while(gene[c(36)] >=30 & gene[c(37)] >=30 & gene[c(38)] >=30)
{
gene$Category <- ifelse((gene[c(49)] == './.' & gene[c(48)] == './.'), 'N/A', ifelse(((gene[c(50)] == './.') & (gene[c(36)] >=30 & gene[c(37)] >=30)),'denovo deletion',''))
}
I technically want to run the if else statement on a variable(s) only if certain other conditions are met. Am I overly complicating this?
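Assuming that your ifelse construct is OK, you can "subset" the frame based on the condition that is now expressed in your while loop. Use [[ ]] so that each column comes back as a plain vector, and index both sides of the assignment with the same condition so that the lengths match (this assumes gene$Category already exists):
condition <- gene[[36]] >= 30 & gene[[37]] >= 30 & gene[[38]] >= 30
category <- ifelse(gene[[49]] == './.' & gene[[48]] == './.', 'N/A', ifelse(gene[[50]] == './.' & gene[[36]] >= 30 & gene[[37]] >= 30, 'denovo deletion', ''))
gene$Category[which(condition)] <- category[which(condition)]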
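Conditional assignment of this kind can be tried out on a miniature table. In this sketch the column names (depth1, gt1 and so on) are hypothetical replacements for the numeric positions 36, 37, ... in the question, and the inner test is simplified. The point it illustrates is that both sides of the assignment are indexed by the same condition.

```r
# Hypothetical miniature of the gene table, with named columns
gene <- data.frame(
  depth1 = c(40, 40, 10), depth2 = c(35, 35, 35), depth3 = c(30, 30, 30),
  gt1 = c("./.", "0/1", "./."),
  gt2 = c("./.", "0/1", "./."),
  gt3 = c("0/1", "./.", "./."),
  stringsAsFactors = FALSE
)

# Create the target column up front so partial assignment is safe
gene$Category <- NA_character_

# Rows where all three depth columns are at least 30
condition <- gene$depth1 >= 30 & gene$depth2 >= 30 & gene$depth3 >= 30

# Category computed for every row ...
category <- ifelse(gene$gt1 == "./." & gene$gt2 == "./.", "N/A",
                   ifelse(gene$gt3 == "./.", "denovo deletion", ""))

# ... but written back only where the condition holds
gene$Category[condition] <- category[condition]
```

The third row fails the depth condition, so its Category stays NA even though the ifelse would have labelled it.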
