Replace logical values conditionally in R

I am sure this question has been asked before and has an easy solution, but I can't seem to find it.
I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.
I have created my eligibility variable in dataframe screen:
screen$eligible <- ifelse (
(screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
& (screen$residence_1 == 47),
TRUE,
FALSE)
And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.
screen$eligible <- ifelse( screen$eligible ==TRUE, ifelse(
(screen$gender_1 == 1 & screen$age > 18)
|(screen$gender_8 == 1 & screen$age > 20),
FALSE, TRUE), FALSE)
I ultimately want TRUE or FALSE values.
Two questions
Is there a clearer or more concise way to update the code to update my eligibility requirements?
Any ideas as to why I might be introducing NAs?

Continuing from what @zephryl wrote, an even more readable version is:
screen$eligible <- with(screen,
(age > 17 & age < 23)
& (alcohol > 3 | marijuana > 3)
& (country == 0 | ageus < 12)
& county_1 %in% c(17, 27, 31)
& (residence_1 == 47))
To find out which columns contain NAs:
sapply(screen, anyNA)

1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here -- that's equivalent to just writing x = condition. Also, your three county_1 == x statements can be replaced with one county_1 %in% c(x, y, z). So your first code block could be written as,
# Operators go at the end of each line so that R keeps reading the expression
screen$eligible <- (screen$age > 17 & screen$age < 23) &
  (screen$alcohol > 3 | screen$marijuana > 3) &
  (screen$country == 0 | screen$ageus < 12) &
  screen$county_1 %in% c(17, 27, 31) &
  (screen$residence_1 == 47)
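Likewise, your second code block, which sets eligible to FALSE whenever either exclusion condition holds, could be simplified as:
# Note the negation: people matching either condition become ineligible
screen$eligible <- screen$eligible &
  !((screen$gender_1 == 1 & screen$age > 18) |
    (screen$gender_8 == 1 & screen$age > 20))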
2. Any ideas as to why I might be introducing NAs?
It's hard to say without seeing your data, but the NAs probably indicate that one or more of your constituent variables (gender_1, gender_8, age) is NA for some cases.
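To illustrate how the NAs can creep in, here is a minimal sketch with made-up values (the vectors below are hypothetical stand-ins for the columns of screen, not your data). In R, NA & TRUE and NA | FALSE are both NA, while NA & FALSE is FALSE. Wrapping a condition in %in% TRUE is one way to treat "unknown" as "condition not met"; whether that is the right call for missing survey answers is a study decision, so treat this as an assumption.

```r
# Hypothetical toy values, one element per respondent
eligible <- c(TRUE, TRUE, TRUE, FALSE)
gender_1 <- c(1, NA, 0, NA)
gender_8 <- c(0, 0, NA, 1)
age      <- c(19, 20, 22, 19)

# Exclusion rule from the question; NA appears wherever a gender flag is missing
exclude <- (gender_1 == 1 & age > 18) | (gender_8 == 1 & age > 20)

# Naive update: NA propagates for respondents who were eligible so far
naive <- eligible & !exclude

# Assumption: a missing answer means "not excluded"; %in% never returns NA
cleaned <- eligible & !(exclude %in% TRUE)

print(exclude)  # TRUE NA NA NA
print(naive)    # FALSE NA NA FALSE
print(cleaned)  # FALSE TRUE TRUE FALSE
```

The respondents who come out as NA in the naive version are exactly the "folks who should be eligible" but have a missing gender_1 or gender_8.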

Related

I want to write to two different columns off of one line in an if else loop

I am trying to use a for if else loop to iterate through my data. For one of the loops I want to change two different columns, one with a formula and one with a written explanation of why the number is what it is. A snippet of my code follows.
library(MASS)
library(plyr)
library(dplyr)
library(tidyverse)
if(((SGR >= 5 ) & (SGR30 <= 0 | is.na(SGR30)) & (SGR20 <= 0 | is.na(SGR20)) & (SGR10 <= 0 | is.na(SGR10))))
{
(DataWSGR[k,24] <- ((2/10*FactoredAADT*1) + FactoredAADT)) & (DataWSGR[k,25] <- "1%")
}
When I run my code I get an error message that says
Error in (DataWSGR[k, 24] <- ((2/10 * FactoredAADT * 1) + FactoredAADT)) & :
operations are possible only for numeric, logical or complex types.
What am I doing wrong to get both columns to change?

Welcome to Stack Overflow! Next time, please provide a reproducible example. You get that error because & is a logical operator meant for logical tests and comparisons, not for chaining assignments. Here it tries to combine the values of the two assignments, and the second one is the character string "1%".
For what you want, you have to put the assignments in two separate lines like so:
library(MASS)
library(plyr)
library(dplyr)
library(tidyverse)
if(((SGR >= 5 ) & (SGR30 <= 0 | is.na(SGR30)) & (SGR20 <= 0 | is.na(SGR20)) & (SGR10 <= 0 | is.na(SGR10))))
{
DataWSGR[k,24] <- (2/10*FactoredAADT*1) + FactoredAADT
DataWSGR[k,25] <- "1%"
}
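If the whole thing runs inside a loop over rows, a vectorized version may be simpler. The following is only a sketch under assumptions: the data frame d, its column names, and the helper non_pos are hypothetical, since the real DataWSGR layout (columns 24 and 25) was not shown.

```r
# Hypothetical stand-in for DataWSGR with named columns
d <- data.frame(
  SGR          = c(6, 2, 7),
  SGR30        = c(NA, -1, 3),
  SGR20        = c(0, NA, -1),
  SGR10        = c(-2, 0, NA),
  FactoredAADT = c(1000, 2000, 3000)
)
d$Forecast <- NA_real_       # plays the role of column 24
d$Note     <- NA_character_  # plays the role of column 25

# Hypothetical helper: TRUE when a value is missing or not positive
non_pos <- function(x) is.na(x) | x <= 0

# One logical index for all rows that satisfy the condition
hit <- d$SGR >= 5 & non_pos(d$SGR30) & non_pos(d$SGR20) & non_pos(d$SGR10)

# Two separate assignments, each using the same index
d$Forecast[hit] <- 2/10 * d$FactoredAADT[hit] + d$FactoredAADT[hit]
d$Note[hit]     <- "1%"

print(d)
```

Only the first row meets the condition here, so it gets a Forecast of 1200 and the note "1%"; the other rows stay NA.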

Count number of rows meeting multiple conditions in dataframe

I have a question. I'm working on a database with patients and multiple conditions that I scored as yes/no or as numbers. I first counted the number of patients (rows) who meet at least one of 5 criteria, see this code (working):
nrow( df_1[df_1$tenderness_CS != 'no' | df_1$intoxication != 'no' |
df_1$focal_neuro_deficits != 'no' | df_1$EMV <= 13 | df_1$distr_injury != 'no',] )
But now I want to count how many patients meet 2, 3, or 4 of the above criteria. It doesn't matter which of the 5 criteria are met, only how many. I really don't know how to do that.
Any help? Thanks!

You can do
n_conditions <- (df_1$tenderness_CS != 'no') +
(df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') +
(df_1$EMV <= 13) +
(df_1$distr_injury != 'no')
which will give you a vector of the number of conditions each patient met.
You can then do
table(n_conditions)
to show the times each number of conditions was met, and
df_1[n_conditions == 3,]
to subset the data frame to get only those patients who met 3 conditions, etc.

Instead of using +, we can use rowSums. The advantage is that it can handle NA elements through its na.rm argument, whereas with + a single NA in any column makes the whole row's result NA.
nm1 <- c("tenderness_CS", "intoxication",
"focal_neuro_deficits", "distr_injury")
n_conditions <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
Now, we get the frequency of counts with table
table(n_conditions)
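Here is a small sketch comparing the two approaches on made-up data (the values in df_1 below are hypothetical; only the column names come from the question). Note that na.rm = TRUE effectively counts a missing answer as "criterion not met", which is an assumption you may or may not want.

```r
# Hypothetical toy data; the fourth patient has a missing tenderness score
df_1 <- data.frame(
  tenderness_CS        = c("yes", "no", "yes", NA),
  intoxication         = c("no",  "no", "yes", "yes"),
  focal_neuro_deficits = c("no",  "no", "yes", "no"),
  distr_injury         = c("yes", "no", "no",  "no"),
  EMV                  = c(15, 15, 12, 14),
  stringsAsFactors     = FALSE
)

# Adding the logical vectors: one NA makes the whole count NA
n_plus <- (df_1$tenderness_CS != "no") + (df_1$intoxication != "no") +
  (df_1$focal_neuro_deficits != "no") + (df_1$EMV <= 13) +
  (df_1$distr_injury != "no")

# rowSums with na.rm = TRUE: missing answers are skipped
nm1 <- c("tenderness_CS", "intoxication", "focal_neuro_deficits", "distr_injury")
n_rowsums <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)

print(n_plus)                    # 2 0 4 NA
print(unname(n_rowsums))         # 2 0 4 1
print(sum(n_rowsums %in% 2:4))   # patients meeting 2, 3 or 4 criteria: 2
```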

The logicals TRUE and FALSE can be treated like numerics 1 and 0.
So for example TRUE+TRUE is equal to 2.
So you can write:
nrow(df_1[((df_1$tenderness_CS != 'no') + (df_1$intoxication != 'no') +
  (df_1$focal_neuro_deficits != 'no') + (df_1$EMV <= 13) +
  (df_1$distr_injury != 'no')) %in% c(2, 3, 4), ])
This first sums the results of each condition (1 when the condition is TRUE and 0 when it is FALSE) and then tests whether the sum is in the vector c(2, 3, 4). The parentheses around each comparison and around the whole sum are required, because + and %in% bind more tightly than !=.

Logical Operators not subsetting as expected

I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of and and or, which vectorizes it.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?

You can use parentheses to be explicit about what your & applies to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence, where & binds more tightly than |; operators at the same level of precedence are evaluated from left to right.
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like below (assuming you want items that A == 1 & that meet one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
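To see which rows slip through without the parentheses, here is a short sketch on a hypothetical four-row data frame (the values are made up for illustration). It sticks to & and |, because those are the element-wise operators; && and || are meant for single TRUE/FALSE values, such as in an if() condition, and are not the vectorized forms.

```r
# Hypothetical toy data
Data <- data.frame(A = c(1, 0, 0, 1),
                   B = c(0, 0, 1, 1),
                   C = c(0, 1, 0, 0),
                   D = c(0, 0, 0, 0))

# Without parentheses this is read as (A == 1 & B == 1) | C == 1 | D == 1
loose <- with(Data, A == 1 & B == 1 | C == 1 | D == 1)

# With parentheses: A must be 1 AND at least one of B, C, D must be 1
strict <- with(Data, A == 1 & (B == 1 | C == 1 | D == 1))

print(loose)    # FALSE TRUE FALSE TRUE
print(strict)   # FALSE FALSE FALSE TRUE

# Rows that the unparenthesized version lets through by mistake
print(Data[loose & !strict, ])
```

The second row has A = 0 but C = 1, which is why it is returned by the version without parentheses.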

Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you wrap the OR conditions in parentheses so that they evaluate to a single logical vector, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1

Ignore NA in Ifelse statement- R [closed]

I have the following ifelse statement.
ww.LIG = ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
The final line takes the average of two variables if the above conditions are met. For data that meets the first criteria in the first line (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 ) there is an NA for the variable named Accel2$we.VWD, which obviously returns a NAN when trying to do the calculation.
What is a simple way to remove NAs from this statement?
Many thanks.

You could solve this in two ways I think:
1) Another ifelse before this to check for NAs - something like:
ww.LIG = ifelse( is.na(Accel2$wk.VWD) | is.na(Accel2$we.VWD), NA,
ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA))
2) Remove the NA rows to start with - something like:
df = data.frame(wkVWD = Accel2$wk.VWD, weVWD = Accel2$we.VWD, wkLIG = Accel2$wk.LIG, weLIG = Accel2$we.LIG)
df = df[complete.cases(df), ]  # keep only the rows without any NA
df$wwLIG = ifelse( (df$wkVWD >= 3 & df$weVWD >= 0 )
| ( df$wkVWD >=2 & df$weVWD >=1 )
| ( df$wkVWD >=1 & df$weVWD >=2) ,
(df$wkLIG + df$weLIG)/2, NA)
Does that work for you?

Your problem is ill-defined: what should be the result of the following comparison NA >= value, true or false? Define this first.
I will assume that an NA in the conditions means the condition is not satisfied (the sum a + b is just a shortcut; it could be two separate is.na checks as well):
a = Accel2$wk.VWD
b = Accel2$we.VWD
ww.LIG = ifelse(!is.na(a + b) &
((a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2)),
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
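As a rough illustration of what the !is.na(a + b) guard changes, here is a sketch with made-up numbers (a, b, wk and we are hypothetical stand-ins for the four Accel2 columns). Without the guard the condition itself can be NA; with it, the condition is always TRUE or FALSE.

```r
# Hypothetical toy vectors
a  <- c(3, NA, 1, 2)      # stands in for Accel2$wk.VWD
b  <- c(0, 2, NA, 1)      # stands in for Accel2$we.VWD
wk <- c(10, 20, 30, 40)   # stands in for Accel2$wk.LIG
we <- c(20, 30, 40, 50)   # stands in for Accel2$we.LIG

# Raw condition: NA wherever a missing value decides the outcome
cond_raw <- (a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2)

# Guarded condition: a + b is NA if either input is NA, so this is never NA
cond_guarded <- !is.na(a + b) & cond_raw

ww.LIG <- ifelse(cond_guarded, (wk + we) / 2, NA)

print(cond_raw)       # TRUE NA NA TRUE
print(cond_guarded)   # TRUE FALSE FALSE TRUE
print(ww.LIG)         # 15 NA NA 45
```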

You may be going about it the hard way. I can't tell for sure without a sample of your data, but I think you can just replace your "averaging" line with
rowMeans(cbind(Accel2$wk.LIG, Accel2$we.LIG), na.rm=TRUE)  # row-wise mean; mean(c(...)) would return one overall number
It's not clear whether you wanted to keep inputs containing NA values or not.

Changing data based on conditions in R

I have posted a similar question to this before and got a quick answer, but am just an R beginner and haven't been able to adapt it to what I need.
Basically I want to take the below code (says if Date_Index is between two numbers and df is < X, then turn df to Y) and make it so it only applies to entries that meet a certain criteria, i.e:
HAVE: df[df$Date_Index >= 50 & df$Date_Index <= 52 & df < .0000001]=1
ADD: if df$Date_Index <= 49 AND df = 0.00 ignore the above statement, else execute:
In other words I need the equivalent to an if, then, else clause. If Date_Index <= 49 and df = 0, leave alone, else if Date_Index >=50 and Date Index <= 52 and df < .001 then replace data (in Date Index rows 50-52) with 1.
This (simple) data set should illustrate it enough:
xx <- matrix(0,52,5)
xx[,1]=1
xx[,3]=1
xx[,5]=1
xx[50:52,]=0
xx[,1]=1:52
xx[50,3]=1
So what I'd like is column 2 and column 4 to stay all 0's but for the bottom of column 3 and 5 to continue to be all 1's.

I suppose you're looking for this:
xx[xx[,1] >= 50 & xx[,1] <= 52, c(FALSE, !colSums(!xx[xx[,1] <= 49, -1]))] <- 1
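Since the one-liner is dense, here is the same idea unpacked into steps, using the xx matrix from the question. The intermediate names (in_window, history, always_nonzero, target_cols) are just labels chosen for this sketch.

```r
# Data set from the question
xx <- matrix(0, 52, 5)
xx[, 1] <- 1
xx[, 3] <- 1
xx[, 5] <- 1
xx[50:52, ] <- 0
xx[, 1] <- 1:52
xx[50, 3] <- 1

# Rows whose Date_Index (column 1) lies in 50..52
in_window <- xx[, 1] >= 50 & xx[, 1] <= 52

# Values of the data columns (everything except column 1) for Date_Index <= 49
history <- xx[xx[, 1] <= 49, -1, drop = FALSE]

# A data column qualifies only if it never held a 0 before the window
always_nonzero <- colSums(history == 0) == 0

# Prepend FALSE so that the Date_Index column itself is never touched
target_cols <- c(FALSE, always_nonzero)

xx[in_window, target_cols] <- 1

print(xx[48:52, ])
```

Columns 2 and 4 were 0 throughout rows 1 to 49, so they are left alone; columns 3 and 5 get their last three rows set to 1.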
