For/While Loop on variables that satisfy a certain condition [R]

I'm trying to write an if/else statement (ultimately) in R, but only for variables that satisfy certain criteria. I'm sure there is an easy way to do this, but I can't seem to find anything specific when searching...
Below is an example of a while loop (not sure whether I can use this for this purpose):
while(gene[c(36)] >=30 & gene[c(37)] >=30 & gene[c(38)] >=30)
{
gene$Category <- ifelse((gene[c(49)] == './.' & gene[c(48)] == './.'), 'N/A', ifelse(((gene[c(50)] == './.') & (gene[c(36)] >=30 & gene[c(37)] >=30)),'denovo deletion',''))
}
I technically want to run the if else statement on a variable(s) only if certain other conditions are met. Am I overly complicating this?

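Assuming that your ifelse construct is OK, you can subset the data frame using the condition that is currently in your while loop. The right-hand side has to be restricted to the same rows; otherwise ifelse returns one value for every row of the full frame and the lengths no longer match. This also assumes gene$Category already exists (if not, create it first, e.g. gene$Category <- NA):
rows <- which(gene[[36]] >= 30 & gene[[37]] >= 30 & gene[[38]] >= 30)
gene$Category[rows] <- ifelse(gene[[49]][rows] == './.' & gene[[48]][rows] == './.', 'N/A', ifelse(gene[[50]][rows] == './.' & gene[[36]][rows] >= 30 & gene[[37]][rows] >= 30, 'denovo deletion', ''))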
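A minimal sketch of the same idea on toy data, in case it helps to see it run. The column names (depth_a, depth_b, gt_a, gt_b) and the values are made up for illustration; the real gene data frame uses column positions 36 to 50.

```r
# Toy data; column names and values are hypothetical.
gene <- data.frame(
  depth_a = c(35, 10, 40, 31),
  depth_b = c(30, 50, 45, 33),
  gt_a    = c("./.", "./.", "0/1", "./."),
  gt_b    = c("./.", "0/1", "./.", "0/0"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere so rows that fail the filter stay unclassified.
gene$Category <- NA_character_

# Positions of rows that pass the depth filter; which() drops NA comparisons.
rows <- which(gene$depth_a >= 30 & gene$depth_b >= 30)

# Classify only those rows; both sides of the assignment use the same rows.
gene$Category[rows] <- ifelse(
  gene$gt_a[rows] == "./." & gene$gt_b[rows] == "./.", "N/A",
  ifelse(gene$gt_a[rows] == "./.", "denovo deletion", "")
)

print(gene)
```

Row 2 fails the depth filter, so its Category stays NA instead of being overwritten.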

Related

How to write multiple if/else if conditions in R on Row record selection

I have a simple question about adding a flag to indicate whether the day is outside the scheduled range. As shown in the following image, each visit should occur within a 6-day window of its scheduled day; e.g., for Week 2, it should be 9 <= STDTY <= 21, otherwise the record is flagged with Flag="Y".
if (data$VISIT=="Screening" & data$STDTY>=-1) {
data$Flag="Y"
} else if (data$VISIT=="Day 1" & data$STDTY!=1) {
sv_domain$Flag="Y"
} else if (data$VISIT=="Week 2" & data$STDTY<(2*7+1-6)) {
data$Flag="Y"
} else if (data$VISIT=="Week 2" & data$STDTY>(2*7+1+6)) {
data$Flag="Y"
.......
I know it doesn't work, please help me out, thanks!
if/else is not vectorized. We can use ifelse or, more easily, case_when from dplyr:
library(dplyr)
case_when(data$VISIT=="Screening" & data$STDTY>=-1|
data$VISIT=="Week 2" & data$STDTY<(2*7+1-6)|
data$VISIT=="Week 2" & data$STDTY>(2*7+1+6) ~ "Y")
Or with ifelse
ifelse(data$VISIT=="Screening" & data$STDTY>=-1|
data$VISIT=="Week 2" & data$STDTY<(2*7+1-6)|
data$VISIT=="Week 2" & data$STDTY>(2*7+1+6), "Y", NA)
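As a rough sketch, the same flag can be written in base R and assigned back to the data frame. The toy values below are made up; the "Day 1" rule is taken from the question, and writing the Week 2 window as an absolute difference from the target day (2*7+1 = 15) is just one way to express 9 <= STDTY <= 21.

```r
# Toy data; VISIT and STDTY follow the question, the values are hypothetical.
data <- data.frame(
  VISIT = c("Screening", "Screening", "Day 1", "Day 1", "Week 2", "Week 2", "Week 2"),
  STDTY = c(-5, 0, 1, 2, 8, 15, 22),
  stringsAsFactors = FALSE
)

# Scheduled day for Week 2, with an allowed window of +/- 6 days.
week2_target <- 2 * 7 + 1

# One logical value per row: TRUE where the visit is out of range.
out_of_range <-
  (data$VISIT == "Screening" & data$STDTY >= -1) |
  (data$VISIT == "Day 1"     & data$STDTY != 1)  |
  (data$VISIT == "Week 2"    & abs(data$STDTY - week2_target) > 6)

# Store the flag in the data frame; rows in range get NA.
data$Flag <- ifelse(out_of_range, "Y", NA)

print(data)
```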

If condition is not showing the result

I am running the code below. It runs, but it does not show any output:
for (name in tita$name){
if (tita$sex == 'female' && tita$embarked == 'S' && tita$age > 33.00)
{
print (name)
}
}
It just shows ****** in RStudio. When I check the dataset, it does contain females older than 33 who embarked from S, but this statement does not print them. When I change the value from 33 to 28, the same code does print results. Why is that?
I am using the following dataset:
https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv
I think you're mixing loops and vectorization where you shouldn't. As I mentioned in the comments, your conditions are vectorized (they compare whole columns), but it looks like you're trying to evaluate one element at a time in a loop.
You should do either:
# loop through elements
for (i in seq_along(tita$name)){
  # isTRUE() treats a missing (NA) age as "no match" instead of raising an error
  if (isTRUE(tita$sex[i] == 'female' && tita$embarked[i] == 'S' && tita$age[i] > 33.00)){
    print(tita$name[i])
  }
}
OR use vectorization (this will be faster and is recommended):
conditions <- tita$sex == 'female' & tita$embarked == 'S' & tita$age > 33.00
names <- tita$name[which(conditions)]
Here conditions is a logical vector, TRUE where all the conditions are met and FALSE otherwise (or NA where a value such as age is missing). We can use this vector to subset in R; wrapping it in which() drops the NA entries. For more information on what I mean by vectorization please see this link.
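A small sketch of why changing 33 to 28 could change the result, using a made-up stand-in for the Titanic data (the values below are assumptions, not the real file). In older versions of R, && applied to whole columns looked only at the first row, so the original loop printed either every name or none; newer versions raise an error instead.

```r
# Toy stand-in for the Titanic data; values are hypothetical.
tita <- data.frame(
  name     = c("A", "B", "C", "D"),
  sex      = c("female", "female", "male", "female"),
  embarked = c("S", "S", "S", "C"),
  age      = c(29, 40, 50, NA),
  stringsAsFactors = FALSE
)

# What the first row alone says for each threshold.
first_row_33 <- tita$sex[1] == "female" && tita$embarked[1] == "S" && tita$age[1] > 33
first_row_28 <- tita$sex[1] == "female" && tita$embarked[1] == "S" && tita$age[1] > 28

# Vectorized version: one TRUE/FALSE/NA per row.
conditions <- tita$sex == "female" & tita$embarked == "S" & tita$age > 33

# which() keeps only the TRUE positions, so missing ages do not yield NA names.
matched <- tita$name[which(conditions)]
print(matched)
```

With a first row aged 29, the threshold 33 gives FALSE and 28 gives TRUE, which would match the behaviour described in the question.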

Using condition in columns of data frame to generate a vector in R

I have the following array:
Year Month Day Hour
1 1 1 1 0
2 1 1 1 3
...
etc
I wrote a function which I then tried to vectorize by using apply in order to run the calculation on a row-by-row basis, but it doesn't work due to the booleans:
day_in_season<-function(tarr){
#first month in season
if((tarr$month==12) || (tarr$month==3) ||(tarr$month==6) || (tarr$month==9)){
d=tarr$day
#second month in season
}else if ((tarr$month==1) || (tarr$month==4)){
d=31+tarr$day
}else if((tarr$month==7) || (tarr$month==10)){
d=30+tarr$day
#third month in season
}else if((tarr$month==2)){
d=62+tarr$day
}else{
d=61+tarr$day
}
h=tarr$hour/24
d=d+h
return(d)
}
I tried
apply(tdjf,1,day_in_season)
but it raised this exception:
Error in tarr$month : $ operator is invalid for atomic vectors
(I already knew about this potential pitfall, but that's why I wanted to use apply in the first place!)
The only way I can currently get it to work is if I do this:
days<-c()
for (x in 1:nrow(tdjf)){
d<-day_in_season(tdjf[x,])
days=append(days,d)
}
If there were only a few values, I'd throw up my hands and just use the for loop, efficiency be damned, but I have over 15,000 rows and that's just one dataset. I know that there has to be a way to make it work.
To vectorize your code, use ifelse() and | instead of ||:
ifelse(
  (tarr$month==12) | (tarr$month==3) | (tarr$month==6) | (tarr$month==9),
  tarr$day,
  ifelse((tarr$month==1) | (tarr$month==4),
    31+tarr$day,
    ifelse((tarr$month==7) | (tarr$month==10),
      30+tarr$day,
      ifelse(tarr$month==2,
        62+tarr$day,
        61+tarr$day)
    )
  )
)+tarr$hour/24
You might be surprised at how quickly a well-constructed for loop can run. If the result vector is allocated up front rather than grown with append, it is about as efficient as an apply call.
The proper for loop in your case is
tdjf$days <- vector("numeric", nrow(tdjf))
for (x in seq_along(tdjf$days)){
  tdjf$days[x] <- day_in_season(tdjf[x,])
}
If you really want to go the apply route, I would recommend rewriting your function to take three arguments -- month, day, and hour -- and passing those three columns to mapply.
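A minimal sketch of that mapply approach, assuming the columns are named Month, Day and Hour as in the printed table. The function name day_in_season3 and the toy rows are hypothetical; the offsets are copied from the original day_in_season.

```r
# Scalar helper taking the three values directly (hypothetical rewrite).
day_in_season3 <- function(month, day, hour) {
  # Days elapsed in the season before the current month starts.
  offset <- if (month %in% c(12, 3, 6, 9)) {
    0
  } else if (month %in% c(1, 4)) {
    31
  } else if (month %in% c(7, 10)) {
    30
  } else if (month == 2) {
    62
  } else {
    61
  }
  offset + day + hour / 24
}

# Toy rows; values are made up.
tdjf <- data.frame(Year = 1, Month = c(12, 1, 2), Day = c(1, 1, 1), Hour = c(0, 3, 12))

# mapply calls the helper once per row, pairing up the three columns.
tdjf$days <- mapply(day_in_season3, tdjf$Month, tdjf$Day, tdjf$Hour)

print(tdjf)
```

Because each call receives plain numbers rather than a one-row data frame, the $ error from apply does not come up.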

How to write this ifelse statement in R correctly

I want to return a value in a column, or NA, contingent on values in other columns.
I basically want to see if the value in the column meets the first test criteria:
df$v2.1 >= df$varx & df$v3.1 <6
if not does it meet the second:
df$v4.1 >= df$vary & df$v5.1 >5
and then if neither return NA
The code I have tried is below.
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 >5 ,df$v1.1, NA)
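Your only mistake is using || rather than |. || is not vectorised: it works on single values, so in older versions of R it only considered the first element of each side, and since R 4.3.0 it raises an error when either side is longer than one. All your other operators (and ifelse()) are vectorised, so the following should work as expected: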
df$v1.1 = ifelse(df$v2.1 >= df$varx & df$v3.1 <6 | df$v4.1 >= df$vary & df$v5.1 > 5, df$v1.1, NA)
A good way to check when you're doing reasonably complex or multiple logical operations is to run each one of them and see if you're getting the expected output. If you run:
df$v2.1 >= df$varx & df$v3.1 <6
or
df$v4.1 >= df$vary & df$v5.1 > 5
you should get a vector of logical values. If you run:
df$v2.1 >= df$varx & df$v3.1 <6 || df$v4.1 >= df$vary & df$v5.1 > 5
you should get a single logical value. In your case, that will give a single result from the ifelse(), which then gets recycled to fill df$v1.1.
From what I can tell df$v1.1 is already defined, so you only need to modify those rows that fail the test in your ifelse. The following might be easier:
df$v1.1[
which(
!(df$v2.1 >= df$varx & df$v3.1 <6) & !(df$v4.1 >= df$vary & df$v5.1 >5))
] <- NA
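A short sketch of the vectorised version on toy data, with each test stored separately so it can be inspected, as suggested above. The column names follow the question; the values are made up.

```r
# Toy data; values are hypothetical.
df <- data.frame(
  v1.1 = c(10, 20, 30, 40),
  v2.1 = c(5, 1, 1, 9),
  varx = c(3, 3, 3, 3),
  v3.1 = c(2, 2, 2, 8),
  v4.1 = c(0, 7, 0, 0),
  vary = c(4, 4, 4, 4),
  v5.1 = c(9, 9, 9, 9)
)

# Each test gives one logical value per row.
test1 <- df$v2.1 >= df$varx & df$v3.1 < 6
test2 <- df$v4.1 >= df$vary & df$v5.1 > 5

# Single | combines the two tests row by row.
keep <- test1 | test2

# Keep the existing value where either test passes, NA otherwise.
df$v1.1 <- ifelse(keep, df$v1.1, NA)

print(df$v1.1)
```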

Create new data set that meets all of 4 conditions

I would like to create a new dataset where the following four conditions are all met.
rowSums(is.na(UNCA[,11:23]))<12
rowSums(is.na(UNCA[,27:39]))<12
rowSums(is.na(UNCA[,40:52]))<12
rowSums(is.na(UNCA[,53:65]))<12
Thanks!
Then use the & operator:
UNCA.new <- UNCA[rowSums(is.na(UNCA[,11:23])) < 12 &
rowSums(is.na(UNCA[,27:39])) < 12 &
rowSums(is.na(UNCA[,40:52])) < 12 &
rowSums(is.na(UNCA[,53:65])) < 12, ]
A single & is vectorized (it compares element by element), while a double && works on single, length-one values (typically used in an if statement, for instance).
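The same pattern on a toy data frame, as a sketch only. The two blocks of three columns and the threshold of 3 are made up to keep the example small; the real data uses blocks of 13 columns and a threshold of 12.

```r
# Toy data with two hypothetical blocks of three columns each.
UNCA <- data.frame(
  a1 = c(1, NA, NA), a2 = c(2, NA, 5),  a3 = c(3, NA, NA),
  b1 = c(NA, 1, NA), b2 = c(NA, 2, NA), b3 = c(4, 3, NA)
)

# TRUE for rows where a block is not entirely missing.
ok_a <- rowSums(is.na(UNCA[, 1:3])) < 3
ok_b <- rowSums(is.na(UNCA[, 4:6])) < 3

# Keep only rows that satisfy both conditions.
UNCA.new <- UNCA[ok_a & ok_b, ]

print(UNCA.new)
```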
