Ignore NA in ifelse statement in R [closed]

Closed 8 years ago.
I have the following ifelse statement.
ww.LIG = ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
The final line takes the average of two variables if the above conditions are met. For data that meet the first condition (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0), there is an NA in Accel2$we.VWD, so the calculation returns NA.
What is a simple way to remove NAs from this statement?
Many thanks.

You could solve this in two ways I think:
1) Another ifelse before this to check for NAs - something like:
ww.LIG = ifelse( is.na(Accel2$wk.VWD) | is.na(Accel2$we.VWD), NA,
ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA))
2) Remove the NA rows to start with - something like:
df = data.frame(wkVWD = Accel2$wk.VWD, weVWD = Accel2$we.VWD, wkLIG = Accel2$wk.LIG, weLIG = Accel2$we.LIG)
df = df[complete.cases(df), ] # complete.cases() returns a logical vector, so use it to filter rows
df$wwLIG = ifelse( (df$wkVWD >= 3 & df$weVWD >= 0 )
| ( df$wkVWD >=2 & df$weVWD >=1 )
| ( df$wkVWD >=1 & df$weVWD >=2) ,
(df$wkLIG + df$weLIG)/2, NA)
Does that work for you?
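To see how the two approaches behave, here is a minimal sketch on made-up data. The values in this toy Accel2 are invented for illustration; only the column names come from the question. It suggests that the explicit is.na() guard gives the same result as the plain test (an NA test in ifelse already yields NA), and that complete.cases() is used as a row filter.

```r
# Toy stand-in for Accel2; all values are hypothetical
Accel2 <- data.frame(wk.VWD = c(3, 2, 1, 0, 3),
                     we.VWD = c(0, 1, NA, 2, NA),
                     wk.LIG = c(10, 20, 30, 40, 50),
                     we.LIG = c(20, 40, 50, 60, NA))

# The original three-part condition, evaluated once
valid <- with(Accel2,
              (wk.VWD >= 3 & we.VWD >= 0) |
              (wk.VWD >= 2 & we.VWD >= 1) |
              (wk.VWD >= 1 & we.VWD >= 2))

# Approach 1: explicit NA guard around the original test
guarded <- with(Accel2,
                ifelse(is.na(wk.VWD) | is.na(we.VWD), NA,
                       ifelse(valid, (wk.LIG + we.LIG) / 2, NA)))

# Approach 2: drop rows with any missing value, then apply the test
keep <- complete.cases(Accel2)   # logical vector, one entry per row
df <- Accel2[keep, ]
df$wwLIG <- with(df,
                 ifelse((wkv <- wk.VWD) >= 3 & we.VWD >= 0 |
                        wkv >= 2 & we.VWD >= 1 |
                        wkv >= 1 & we.VWD >= 2,
                        (wk.LIG + we.LIG) / 2, NA))

print(guarded)
print(df)
```

Note that approach 2 changes the number of rows, so its result cannot be assigned straight back into the original data frame.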

Your problem is ill-defined: what should the result of the comparison NA >= value be, true or false? Define this first.
I will assume that an NA in the conditions means the condition is not satisfied (testing the sum a + b is just a shortcut; it could be two separate is.na checks as well):
a = Accel2$wk.VWD
b = Accel2$we.VWD
ww.LIG = ifelse(!is.na(a + b) &
((a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2)),
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
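For reference, this is how R treats NA in comparisons and logical operators by default, followed by the guard from the code above applied to short made-up vectors (the values of a and b here are hypothetical):

```r
cmp    <- NA >= 3                  # NA: the comparison itself is unknown
and_f  <- NA & FALSE               # FALSE: cannot be TRUE whatever NA stands for
and_t  <- NA & TRUE                # NA
or_t   <- NA | TRUE                # TRUE
picked <- ifelse(NA, "yes", "no")  # NA: ifelse passes an NA test through

# Hypothetical inputs with missing values in different positions
a <- c(3, NA, 1)
b <- c(0, 2, NA)

# a + b is NA whenever either input is NA, so one is.na() call covers both
ok <- !is.na(a + b) &
  ((a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2))
print(ok)
```

Because FALSE & NA is FALSE, the guard turns every row with a missing input into a plain FALSE instead of NA.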

You may be going about it the hard way. I can't tell for sure without a sample of your data, but I think you can just replace your "averaging" line with
rowMeans(cbind(Accel2$wk.LIG, Accel2$we.LIG), na.rm=TRUE) # row-wise mean; mean(c(x, y)) would return one overall value
It's not clear whether you wanted to keep inputs containing NA values or not.
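A small sketch of the difference between a row-wise and an overall average when NAs are skipped, using invented vectors wk and we in place of the two LIG columns:

```r
# Hypothetical stand-ins for Accel2$wk.LIG and Accel2$we.LIG
wk <- c(10, 20, NA, NA)
we <- c(20, NA, 50, NA)

# One average per row, ignoring missing values within each row
row_avg <- rowMeans(cbind(wk, we), na.rm = TRUE)

# A single number: the average over all non-missing values pooled together
overall <- mean(c(wk, we), na.rm = TRUE)

print(row_avg)
print(overall)
```

A row where both inputs are missing comes out as NaN from rowMeans with na.rm = TRUE, which may be the NaN mentioned in the question.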

Related

Replace logical values conditionally in R

I am sure this question has been asked before and has an easy solution, but I can't seem to find it.
I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.
I have created my eligibility variable in dataframe screen:
screen$eligible <- ifelse (
(screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
& (screen$residence_1 == 47),
TRUE,
FALSE)
And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.
screen$eligible <- ifelse( screen$eligible ==TRUE, ifelse(
(screen$gender_1 == 1 & screen$age > 18)
|(screen$gender_8 == 1 & screen$age > 20),
FALSE, TRUE), FALSE)
I ultimately want TRUE or FALSE values.
Two questions
Is there a clearer or more concise way to update the code to update my eligibility requirements?
Any ideas as to why I might be introducing NAs?
Continuing from what @zephryl wrote, an even more readable version is:
screen$eligible <- with(screen,
(age > 17 & age < 23)
& (alcohol > 3 | marijuana > 3)
& (country == 0 | ageus < 12)
& county_1 %in% c(17, 27, 31)
& (residence_1 == 47))
To detect which columns contain NAs:
sapply(screen, anyNA)
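As a rough illustration on an invented three-row screen (the values are hypothetical), the check reports one TRUE/FALSE per column. The same toy data also shows that a chain of == comparisons keeps an NA, while %in% returns FALSE for it:

```r
# Hypothetical survey rows; only the column names follow the question
screen <- data.frame(age = c(19, 21, NA),
                     county_1 = c(17, NA, 31),
                     residence_1 = c(47, 47, 47))

# Named logical vector: which columns contain at least one NA
na_cols <- sapply(screen, anyNA)

# == propagates NA, %in% treats NA as "no match"
by_equals <- with(screen, county_1 == 17 | county_1 == 27 | county_1 == 31)
by_in     <- with(screen, county_1 %in% c(17, 27, 31))

print(na_cols)
print(by_equals)
print(by_in)
```

So switching to %in% is not only shorter; it can also change the result for rows where county_1 is missing.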
1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here -- that's equivalent to just writing x = condition. Also, your three county_1 == x statements can be replaced with one county_1 %in% c(x, y, z). So your first code block could be written as,
screen$eligible <- ((screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& screen$county_1 %in% c(17, 27, 31)
& (screen$residence_1 == 47))
Likewise, your second codeblock could be simplified as:
screen$eligible <- (screen$eligible
& !((screen$gender_1 == 1 & screen$age > 18)
| (screen$gender_8 == 1 & screen$age > 20)))
2. Any ideas as to why I might be introducing NAs?
It's hard to say without seeing your data, but the NAs probably indicate that one or more of your constituent variables (gender_1, gender_8, age) is NA for some cases.
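Here is a minimal sketch of how such an NA arises and two ways to end up with strict TRUE/FALSE values. The data is invented, and the logic follows the nested ifelse in the question, where matching the gender/age condition makes a person ineligible. Which of the two variants is appropriate is a study decision, not something the code can settle.

```r
# Hypothetical respondents; row 3 has a missing gender_1
screen <- data.frame(eligible = c(TRUE, TRUE, TRUE, FALSE),
                     gender_1 = c(1, 0, NA, 1),
                     gender_8 = c(0, 1, 0, 0),
                     age      = c(19, 20, 19, 22))

# TRUE where the new exclusion rule applies; NA where it cannot be decided
excluded <- with(screen,
                 (gender_1 == 1 & age > 18) | (gender_8 == 1 & age > 20))

# Direct update: the NA carries through for row 3
updated <- screen$eligible & !excluded

# Variant A: an undecidable case counts as not eligible
strict_a <- updated %in% TRUE

# Variant B: only a definite exclusion removes eligibility
strict_b <- screen$eligible & !(excluded %in% TRUE)

print(updated)
print(strict_a)
print(strict_b)
```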

Using ifelse function in R to sort results that fulfil a "less than and greater than"

I am trying to get this function to select outputs that fall in between two values.
Data <- Data %>%
mutate(D= ifelse(A >= "80" & B == "InPlay" & (C <= "20")&(C >= "6"), "YES", paste(D)))
So I would like column D to read "YES" when column A is at least 80, column B reads "InPlay", and column C falls between 6 and 20.
Are you trying to compare the numbers as strings or as numbers?
If they are currently strings, can you convert them to numbers using a = as.numeric(a)?
That'll allow you to use the usual comparison operators.
The example below worked for me in a fresh script; I'm assuming you'd want this running inside a loop.
A = 90
B = "InPlay"
C = 15
D = "No"
if( A >= 80 & B == "InPlay" & C <= 20 & C >= 6) {
D = "YES"
}
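A loop is not strictly needed, since ifelse works on whole columns. The following is a sketch in base R on an invented Data frame (values are hypothetical), which also shows why comparing numbers stored as strings goes wrong:

```r
# Hypothetical data with numbers stored as text, as in the question
Data <- data.frame(A = c("90", "100", "70"),
                   B = c("InPlay", "InPlay", "InPlay"),
                   C = c("15", "7", "10"),
                   D = c("No", "No", "No"),
                   stringsAsFactors = FALSE)

# Strings are compared character by character, so "100" sorts before "80"
string_cmp <- "100" >= "80"

# Convert to numeric before comparing
Data$A <- as.numeric(Data$A)
Data$C <- as.numeric(Data$C)

# Vectorized update: keep the existing D where the condition fails
Data$D <- ifelse(Data$A >= 80 & Data$B == "InPlay" &
                   Data$C >= 6 & Data$C <= 20,
                 "YES", Data$D)
print(Data)
```

The same ifelse call can be placed inside dplyr's mutate if the pipeline style from the question is preferred.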

Count number of rows meeting multiple conditions in dataframe

I have a question. I'm working on a database with patients and multiple conditions I scored as yes/no or numbers. I first counted the number of patients (rows) that meet at least one of 5 criteria, see this code (working):
nrow( df_1[df_1$tenderness_CS != 'no' | df_1$intoxication != 'no' |
df_1$focal_neuro_deficits != 'no' | df_1$EMV <= 13 | df_1$distr_injury != 'no',] )
But now I want to count how many patients meet 2, 3 and 4 of the above criteria. It doesn't matter which of the 5 criteria are met, just whether 2 or 3 are met. I really don't know how to do that.
Any help? Thanks!
You can do
n_conditions <- (df_1$tenderness_CS != 'no') +
(df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') +
(df_1$EMV <= 13) +
(df_1$distr_injury != 'no')
which will give you a vector of the number of conditions each patient met.
You can then do
table(n_conditions)
to show the times each number of conditions was met, and
df_1[n_conditions == 3,]
to subset the data frame to get only those patients who met 3 conditions, etc.
Instead of using +, we can make use of rowSums. The advantage is that it can take care of NA elements through the na.rm argument; with +, an NA in any column of a row makes the whole sum for that row NA.
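A quick sketch of the counting approach on four invented patients (all values are hypothetical; the column names follow the question):

```r
# Hypothetical patients
df_1 <- data.frame(tenderness_CS        = c("yes", "no", "yes", "no"),
                   intoxication         = c("yes", "no", "no",  "no"),
                   focal_neuro_deficits = c("no",  "no", "yes", "no"),
                   EMV                  = c(12, 15, 15, 15),
                   distr_injury         = c("no",  "no", "no",  "yes"),
                   stringsAsFactors = FALSE)

# Each comparison is TRUE/FALSE, which adds up as 1/0
n_conditions <- (df_1$tenderness_CS != "no") +
  (df_1$intoxication != "no") +
  (df_1$focal_neuro_deficits != "no") +
  (df_1$EMV <= 13) +
  (df_1$distr_injury != "no")

counts <- table(n_conditions)            # how often each count occurs
n_two_to_four <- sum(n_conditions %in% 2:4)  # patients meeting 2, 3 or 4 criteria

print(n_conditions)
print(counts)
print(n_two_to_four)
```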
nm1 <- c("tenderness_CS", "intoxication",
"focal_neuro_deficits", "distr_injury")
n_conditions <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
Now, we get the frequency of counts with table
table(n_conditions)
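To illustrate the NA point, here is a small comparison on two invented patients, one of whom has a missing tenderness_CS (values are hypothetical):

```r
# Hypothetical patients; the second has a missing value
df_1 <- data.frame(tenderness_CS        = c("yes", NA),
                   intoxication         = c("yes", "yes"),
                   focal_neuro_deficits = c("no",  "no"),
                   EMV                  = c(12, 15),
                   distr_injury         = c("no",  "no"),
                   stringsAsFactors = FALSE)

# With +, a single NA makes the whole row count NA
plus_count <- (df_1$tenderness_CS != "no") + (df_1$intoxication != "no") +
  (df_1$focal_neuro_deficits != "no") + (df_1$EMV <= 13) +
  (df_1$distr_injury != "no")

# With rowSums and na.rm = TRUE, the missing condition is simply not counted
nm1 <- c("tenderness_CS", "intoxication",
         "focal_neuro_deficits", "distr_injury")
rs_count <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)

print(plus_count)
print(rs_count)
```

Whether a missing answer should count as "criterion not met" is a judgment about the data; na.rm = TRUE makes that choice implicitly.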
The logicals TRUE and FALSE can be treated like numerics 1 and 0.
So for example TRUE+TRUE is equal to 2.
So you can write:
nrow( df_1[((df_1$tenderness_CS != 'no') + (df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') + (df_1$EMV <= 13) + (df_1$distr_injury != 'no')) %in% c(2,3,4),])
because this will first sum the results of each condition (1 when the condition is TRUE and 0 when it is FALSE) and then test whether the sum is in the vector c(2,3,4). Each comparison needs its own parentheses, and so does the whole sum, because + and %in% bind more tightly than !=.
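Operator precedence matters when adding comparisons. A short sketch with invented vectors x and y shows what happens with and without parentheses:

```r
# Hypothetical yes/no answers
x <- c("yes", "no", "yes")
y <- c("yes", "yes", "no")

# %in% binds tighter than +, so this is 1 + (1 %in% 2)
loose  <- 1 + 1 %in% 2
strict <- (1 + 1) %in% 2

# + binds tighter than !=, so R tries to evaluate "no" + y first and fails
unparenthesized <- tryCatch(x != "no" + y, error = function(e) "error")

# Parenthesized comparisons add up as intended
n_met <- (x != "no") + (y != "no")

print(loose)
print(strict)
print(unparenthesized)
print(n_met)
```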

Logical Operators not subsetting as expected

I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of "and" and "or", and that the short forms & and | are the vectorized ones.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?
You can use parens to be more specific about what your & is referring to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence (& binds more tightly than |, and within the same level of precedence they are evaluated from left to right).
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like below (assuming you want items that A == 1 & that meet one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you use parentheses to group the conditions joined by |, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1
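The effect of the parentheses can also be checked on a small fixed data frame. The values below are invented so that one row has A = 0 and C = 1; the sketch uses only the vectorized & and |, since && and || are meant for single values.

```r
# Hypothetical rows chosen to expose the precedence issue
Data <- data.frame(A = c(1, 0, 1, 0),
                   B = c(0, 0, 1, 1),
                   C = c(0, 1, 0, 0),
                   D = c(0, 0, 0, 0))

# Without parentheses this reads as (A == 1 & B == 1) | C == 1 | D == 1
loose <- with(Data, A == 1 & B == 1 | C == 1 | D == 1)

# With parentheses, A == 1 is required for every selected row
strict <- with(Data, A == 1 & (B == 1 | C == 1 | D == 1))

# Inside subset(), columns can be referred to by name, without Data$
Subset1 <- subset(Data, A == 1 & (B == 1 | C == 1 | D == 1), select = A)

print(loose)
print(strict)
print(Subset1)
```

The second row (A = 0, C = 1) is selected only by the version without parentheses, which matches the symptom described in the question.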

Scoring a column with conditions in R using sd

I'm an R/coding newbie. I want to assign a score to a column based on some conditions. I have some random data below, that helps explain my own data.
name average score
a -3.56714858 0
a -0.41934072 0
a -1.02200958 0
b 0.67713883 0
b 0.29228235 0
b 0.11338159 0
c -1.48595572 0
c -0.35328884 0
c -1.26491347 0
d -0.27093065 0
d -0.14913264 0
What I want to do;
If (name=a & average > 2sd of benchmark) then assign score= 2
if (name=a & average < 2sd of benchmark) then assign score=0.5
etc.
Edit: benchmark = average(of top 3 "a"), so I'm scoring the rest of the "a" based on how they compare to the top three, so how many standard deviations they lie from the top 3.
Each letter has its own benchmark or number that I am comparing it to. So I was manually going through, letter by letter, like:
df$score[df$name == "a"
& df$average >= benchmark
& df$average < (benchmark + sd(benchmark))] <- 1
df$score[df$name == "a"
& df$average >= (benchmark + sd(benchmark))
& df$average < (benchmark + 2*sd(benchmark))] <- 2.0
df$score[df$name == "a"
& df$average > (benchmark + 2*sd(benchmark))] <- 2.5
df$score[df$name == "a"
& df$average < benchmark
& df$average >= (benchmark - sd(benchmark))] <- 1
df$score[df$name == "a"
& df$average < (benchmark - sd(benchmark))
& df$average >= (benchmark - 2*sd(benchmark))] <- 0.5
df$score[df$name == "a"
& df$average < (benchmark - 2*sd(benchmark))] <- 0
I have thousands of rows and more groups than the letters a-d. I'm hoping I can find a faster way to do this. My long method is also creating errors. Please help
I have the same scoring principle for each group, but the benchmark is different for each group.
I would structure your problem this way: alongside your data frame, keep a second data frame, which we will call bench, with the variables "name", "benchmark", and "sd.benchmark".
Then join the two and compute the score, for example with the dplyr package:
library(dplyr)
df.new <- left_join(df, bench, by = "name") %>%
mutate(score = ifelse(average >= (benchmark - sd.benchmark) & average < (benchmark + sd.benchmark), 1,
ifelse(average >= (benchmark + sd.benchmark) & average < (benchmark + 2*sd.benchmark), 2,
ifelse(average >= (benchmark + 2*sd.benchmark), 2.5,
ifelse(average < (benchmark - sd.benchmark) & average >= (benchmark - 2* sd.benchmark), .5, 0)))))
The reason I have fewer conditions is that in your example the two ranges on either side of the benchmark, down to benchmark - sd(benchmark) and up to benchmark + sd(benchmark), both get a score of 1, so they can be joined into one range.
left_join combines bench with df, keeping all rows of df. It is like merge(x, y, all.x = TRUE). The joined data is then passed to mutate, which creates a new variable based on the ifelse statements.
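The same idea can be sketched in base R, with merge standing in for left_join and findInterval replacing the nested ifelse calls. The bench values and the averages below are invented for illustration; how benchmark and sd.benchmark are actually computed per group is left to the asker.

```r
# Hypothetical per-group benchmarks (values invented for illustration)
bench <- data.frame(name = c("a", "b"),
                    benchmark = c(0, 10),
                    sd.benchmark = c(1, 2))

# Hypothetical observations
df <- data.frame(name = c("a", "a", "a", "a", "a", "b", "b"),
                 average = c(-3, -1.5, 0.5, 1.5, 2.5, 10, 15))

# Attach each row's group benchmark (keeps all rows of df)
scored <- merge(df, bench, by = "name", all.x = TRUE)

# Distance from the benchmark in units of the group's standard deviation
z <- (scored$average - scored$benchmark) / scored$sd.benchmark

# Bands: z < -2, [-2, -1), [-1, 1), [1, 2), z >= 2
band <- findInterval(z, c(-2, -1, 1, 2))
scored$score <- c(0, 0.5, 1, 2, 2.5)[band + 1]

print(scored)
```

Working with z keeps the band limits in one place, so adding more groups only means adding rows to bench.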
