I am sure this question has been asked before and has an easy solution, but I can't seem to find it.
I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.
I have created my eligibility variable in dataframe screen:
screen$eligible <- ifelse (
(screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
& (screen$residence_1 == 47),
TRUE,
FALSE)
And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.
screen$eligible <- ifelse( screen$eligible ==TRUE, ifelse(
(screen$gender_1 == 1 & screen$age > 18)
|(screen$gender_8 == 1 & screen$age > 20),
FALSE, TRUE), FALSE)
I ultimately want TRUE or FALSE values.
Two questions
Is there a clearer or more concise way to update the code to update my eligibility requirements?
Any ideas as to why I might be introducing NAs?
Continuing from what @zephryl wrote, an even more readable version is:
screen$eligible <- with(screen,
(age > 17 & age < 23)
& (alcohol > 3 | marijuana > 3)
& (country == 0 | ageus < 12)
& county_1 %in% c(17, 27, 31)
& (residence_1 == 47))
To find which columns contain NAs:
sapply(screen, anyNA)
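As a rough illustration of that check, here is a sketch on a made-up data frame (screen_demo and its values are invented stand-ins, not the asker's data). It flags the columns that contain NAs, counts them, and pulls out the affected rows:

```r
# Toy stand-in for the asker's `screen` data; all values are invented.
screen_demo <- data.frame(
  age      = c(18, 21, NA, 22),
  gender_1 = c(1, NA, 0, 1),
  gender_8 = c(0, 0, 1, 0)
)

# TRUE for every column that contains at least one NA
has_na <- sapply(screen_demo, anyNA)

# Number of NAs in each column
na_counts <- colSums(is.na(screen_demo))

# Rows with at least one NA, for closer inspection
incomplete_rows <- screen_demo[!complete.cases(screen_demo), ]
```

Looking at incomplete_rows usually makes it clear which variable is responsible for the NAs in the final result.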
1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here -- that's equivalent to just writing x = condition. Also, your three county_1 == x statements can be replaced with one county_1 %in% c(x, y, z). So your first code block could be written as,
# Each `&` sits at the end of a line so R knows the expression continues
screen$eligible <- (screen$age > 17 & screen$age < 23) &
  (screen$alcohol > 3 | screen$marijuana > 3) &
  (screen$country == 0 | screen$ageus < 12) &
  screen$county_1 %in% c(17, 27, 31) &
  (screen$residence_1 == 47)
Likewise, your second codeblock could be simplified as:
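# The nested ifelse() returns FALSE when the gender/age condition holds,
# so the condition is negated here to keep the same result.
screen$eligible <- screen$eligible &
  !((screen$gender_1 == 1 & screen$age > 18) |
    (screen$gender_8 == 1 & screen$age > 20))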
2. Any ideas as to why I might be introducing NAs?
It's hard to say without seeing your data, but the NAs probably indicate that one or more of your constituent variables (gender_1, gender_8, age) is NA for some cases.
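To make the NA behaviour concrete, here is a small sketch with invented values (eligible, gender_1 and age below are hypothetical vectors, not the asker's columns). It shows how NA propagates through & and |, and one possible way to force a strict TRUE/FALSE result, assuming that an unknown value should count as not eligible:

```r
# R uses three-valued logic: NA means "unknown"
na_and_true  <- NA & TRUE    # NA: the result depends on the unknown value
na_and_false <- NA & FALSE   # FALSE: FALSE whatever the unknown value is
na_or_true   <- NA | TRUE    # TRUE: TRUE whatever the unknown value is

# Hypothetical values for four respondents
eligible <- c(TRUE, TRUE, FALSE, TRUE)
gender_1 <- c(1, NA, NA, 0)
age      <- c(19, 19, 19, 22)

# Mirrors the asker's nested ifelse(): FALSE when the condition holds
excluded <- gender_1 == 1 & age > 18   # TRUE NA NA FALSE
updated  <- eligible & !excluded       # FALSE NA FALSE TRUE

# Assumption: treat "unknown" as not eligible to get only TRUE/FALSE
updated_strict <- updated %in% TRUE    # FALSE FALSE FALSE TRUE
```

Whether an unknown answer should count as eligible or not is a study decision; %in% TRUE is only one way to encode "not eligible".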
Related
I am trying to use a for if else loop to iterate through my data. For one of the loops I want to change two different columns, one with a formula and one with a written explanation of why the number is what it is. A snippet of my code follows.
library(MASS)
library(plyr)
library(dplyr)
library(tidyverse)
if(((SGR >= 5 ) & (SGR30 <= 0 | is.na(SGR30)) & (SGR20 <= 0 | is.na(SGR20)) & (SGR10 <= 0 | is.na(SGR10))))
{
(DataWSGR[k,24] <- ((2/10*FactoredAADT*1) + FactoredAADT)) & (DataWSGR[k,25] <- "1%")
}
When I run my code I get an error message that says
Error in (DataWSGR[k, 24] <- ((2/10 * FactoredAADT * 1) + FactoredAADT)) & :
operations are possible only for numeric, logical or complex types.
What am I doing wrong to get both columns to change?
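Welcome to Stack Overflow! Next time, please provide a reproducible example. You get that error because & is a logical operator: it combines the results of logical tests and comparisons and cannot be used to chain assignments. Here it ends up being applied to a number and the string "1%", and a string is not a numeric, logical or complex type.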
For what you want, you have to put the assignments in two separate lines like so:
library(MASS)
library(plyr)
library(dplyr)
library(tidyverse)
if(((SGR >= 5 ) & (SGR30 <= 0 | is.na(SGR30)) & (SGR20 <= 0 | is.na(SGR20)) & (SGR10 <= 0 | is.na(SGR10))))
{
(DataWSGR[k,24] <- (2/10*FactoredAADT*1) + FactoredAADT)
DataWSGR[k,25] <- "1%"
}
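The difference can be reproduced in a few lines. This is only a sketch: DataWSGR, k and FactoredAADT below are small invented stand-ins for the asker's objects.

```r
# Hypothetical stand-ins for the asker's objects
DataWSGR <- data.frame(matrix(NA, nrow = 2, ncol = 25))
FactoredAADT <- 1000

# Chained with `&`: the expression fails because `&` is applied to the
# values of the two assignments, a number and the string "1%".
k <- 1
err_msg <- tryCatch(
  (DataWSGR[k, 24] <- (2/10 * FactoredAADT * 1) + FactoredAADT) &
    (DataWSGR[k, 25] <- "1%"),
  error = function(e) conditionMessage(e)
)

# Written as two separate statements: both columns are updated, no error
k <- 2
DataWSGR[k, 24] <- (2/10 * FactoredAADT * 1) + FactoredAADT
DataWSGR[k, 25] <- "1%"
```

After this runs, err_msg holds the error text from the chained version, while row 2 has both columns filled in.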
I have a question. I'm working on a database with patients and multiple conditions that I scored as yes/no or as numbers. I first counted the number of patients (rows) who meet at least one of 5 criteria, see this code (working):
nrow( df_1[df_1$tenderness_CS != 'no' | df_1$intoxication != 'no' |
df_1$focal_neuro_deficits != 'no' | df_1$EMV <= 13 | df_1$distr_injury != 'no',] )
But now I want to count how many patients meet 2, 3 and 4 of the above criteria. It doesn't matter which of the 5 criteria are met, just whether 2 or 3 are met. I really don't know how to do that.
Any help? Thanks!
You can do
n_conditions <- (df_1$tenderness_CS != 'no') +
(df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') +
(df_1$EMV <= 13) +
(df_1$distr_injury != 'no')
which will give you a vector of the number of conditions each patient met.
You can then do
table(n_conditions)
to show the times each number of conditions was met, and
df_1[n_conditions == 3,]
to subset the data frame to only those patients who met 3 conditions, and so on.
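Here is a minimal worked sketch of the same idea. The four patients are invented; only the column names come from the question.

```r
# Invented patients; column names follow the question
df_1 <- data.frame(
  tenderness_CS        = c("yes", "no", "yes", "no"),
  intoxication         = c("no",  "no", "yes", "no"),
  focal_neuro_deficits = c("no",  "no", "yes", "no"),
  EMV                  = c(15,    12,   13,    15),
  distr_injury         = c("yes", "no", "no",  "no")
)

# Each comparison yields TRUE/FALSE, which `+` adds up as 1/0
n_conditions <- (df_1$tenderness_CS != "no") +
  (df_1$intoxication != "no") +
  (df_1$focal_neuro_deficits != "no") +
  (df_1$EMV <= 13) +
  (df_1$distr_injury != "no")

n_conditions                               # 2 1 4 0
n_two_to_four <- sum(n_conditions %in% 2:4)  # patients meeting 2, 3 or 4
n_at_least_two <- sum(n_conditions >= 2)     # patients meeting 2 or more
```

Using sum() on a logical vector counts the TRUE values, so there is no need to subset the data frame just to count rows.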
Instead of using +, we can use rowSums. The advantage is that it also takes care of NA elements through the na.rm argument: if a column has an NA in some row, adding with + would give NA for that row.
nm1 <- c("tenderness_CS", "intoxication",
"focal_neuro_deficits", "distr_injury")
n_conditions <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
Now, we get the frequency of counts with table
table(n_conditions)
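The difference between the two approaches shows up as soon as one value is missing. A small sketch on invented data (df_na is a hypothetical two-patient frame):

```r
# Two invented patients; the second has a missing tenderness_CS
df_na <- data.frame(
  tenderness_CS = c("yes", NA),
  intoxication  = c("yes", "yes"),
  EMV           = c(12, 12)
)

# With `+`, a single NA makes the whole count for that row NA
plus_count <- (df_na$tenderness_CS != "no") +
  (df_na$intoxication != "no") +
  (df_na$EMV <= 13)                        # 3 NA

# rowSums() with na.rm = TRUE counts only the conditions that are known
rs_count <- rowSums(
  cbind(df_na[c("tenderness_CS", "intoxication")] != "no",
        df_na$EMV <= 13),
  na.rm = TRUE
)                                          # 3 2
```

Note that na.rm = TRUE silently treats a missing answer as "criterion not met", which may or may not be what the analysis needs.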
The logicals TRUE and FALSE can be treated like numerics 1 and 0.
So for example TRUE+TRUE is equal to 2.
So you can write:
nrow( df_1[((df_1$tenderness_CS != 'no') + (df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') + (df_1$EMV <= 13) + (df_1$distr_injury != 'no')) %in% c(2,3,4),])
because this will first sum the results of each condition (1 when the condition is TRUE and 0 when it is FALSE) and then test whether the sum is in the vector c(2,3,4).
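The parentheses matter here because of operator precedence: in R, %in% binds tighter than +, and + binds tighter than != and <=. A short sketch with invented vectors a and b:

```r
a <- c("yes", "no", "yes")
b <- c("yes", "no", "no")

# Each comparison is wrapped in parentheses before summing
n_met <- (a != "no") + (b != "no")        # 2 0 1
in_range <- n_met %in% c(2, 3, 4)         # TRUE FALSE FALSE

# Precedence pitfall: %in% is evaluated before +
without_parens <- 1 + 1 %in% 2            # 1 + (1 %in% 2) = 1 + FALSE = 1
with_parens    <- (1 + 1) %in% 2          # TRUE
```

So both each individual condition and the whole sum need their own parentheses before the %in% test.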
I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of "and" and "or", while & and | are the vectorized forms.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?
You can use parentheses to be more specific about what your & applies to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence: & binds more tightly than |, and operators at the same level of precedence are evaluated from left to right.
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like below (assuming you want items that A == 1 & that meet one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you use parentheses to group the "or" conditions into a single logical expression, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1
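For comparison, here is a small deterministic sketch of the same fix (the four rows are invented). Because subset() evaluates its condition inside the data frame, the Data$ prefix can also be dropped:

```r
# Four invented rows
Data <- data.frame(A = c(1, 0, 1, 0),
                   B = c(0, 0, 1, 1),
                   C = c(0, 1, 0, 0),
                   D = c(0, 0, 0, 1))

# Without parentheses this is parsed as (A == 1 & B == 1) | C == 1 | D == 1,
# so rows with A == 0 slip through whenever C or D is 1
loose <- subset(Data, A == 1 & B == 1 | C == 1 | D == 1)

# With parentheses: A must be 1, plus at least one of B, C, D
strict <- subset(Data, A == 1 & (B == 1 | C == 1 | D == 1))

# An equivalent form that scales to many columns
strict_rs <- subset(Data, A == 1 & rowSums(Data[c("B", "C", "D")] == 1) > 0)
```

Here loose keeps rows 2, 3 and 4 (two of which have A equal to 0), while strict and strict_rs keep only row 3.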
I have the following ifelse statement.
ww.LIG = ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
The final line takes the average of two variables if the above conditions are met. For data that meets the first criteria in the first line (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 ) there is an NA for the variable named Accel2$we.VWD, which obviously returns a NAN when trying to do the calculation.
What is a simple way to remove NAs from this argument?
Many thanks.
You could solve this in two ways I think:
1) Another ifelse before this to check for NAs - something like:
ww.LIG = ifelse( is.na(Accel2$wk.VWD) | is.na(Accel2$we.VWD), NA,
ifelse( (Accel2$wk.VWD >= 3 & Accel2$we.VWD >= 0 )
| ( Accel2$wk.VWD >=2 & Accel2$we.VWD >=1 )
| ( Accel2$wk.VWD >=1 & Accel2$we.VWD >=2) ,
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA))
2) Remove the NA rows to start with - something like:
df = data.frame(wkVWD = Accel2$wk.VWD, weVWD = Accel2$we.VWD, wkLIG = Accel2$wk.LIG, weLIG = Accel2$we.LIG)
df = df[complete.cases(df), ]  # complete.cases() returns a logical vector, so use it to index the rows
df$wwLIG = ifelse( (df$wkVWD >= 3 & df$weVWD >= 0 )
| ( df$wkVWD >=2 & df$weVWD >=1 )
| ( df$wkVWD >=1 & df$weVWD >=2) ,
(df$wkLIG + df$weLIG)/2, NA)
Does that work for you?
Your problem is ill-defined: what should be the result of the following comparison NA >= value, true or false? Define this first.
I will assume that an NA in the conditions means the condition is not satisfied (testing the sum a + b is just a shortcut; it could be two separate is.na checks as well):
a = Accel2$wk.VWD
b = Accel2$we.VWD
ww.LIG = ifelse(!is.na(a + b) &
((a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2)),
(Accel2$wk.LIG + Accel2$we.LIG)/2, NA)
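To see what the guard changes, here is a sketch with invented values (a, b, wk and we stand in for the four Accel2 columns). Comparing NA with a number gives NA, and the !is.na(a + b) guard turns those unknown conditions into FALSE:

```r
a  <- c(3, NA, 1, 2)      # hypothetical wk.VWD
b  <- c(0, 2, NA, 1)      # hypothetical we.VWD
wk <- c(10, 20, 30, 40)   # hypothetical wk.LIG
we <- c(20, 40, 50, 60)   # hypothetical we.LIG

# Unguarded condition: NA inputs can leave the condition unknown
cond <- (a >= 3 & b >= 0) | (a >= 2 & b >= 1) | (a >= 1 & b >= 2)
# TRUE NA NA TRUE

# Guarded condition: a + b is NA whenever either input is NA
guarded <- !is.na(a + b) & cond
# TRUE FALSE FALSE TRUE

ww_lig <- ifelse(guarded, (wk + we) / 2, NA)
# 15 NA NA 50
```

With NA as the "no" value the final vector looks the same either way, but the guarded condition is a clean TRUE/FALSE vector that can safely be reused for subsetting or counting.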
You may be going about it the hard way. I can't tell for sure without a sample of your data, but I think you can just replace your "averaging" line with
# row-wise average; mean(c(...)) would collapse both columns to a single number
rowMeans(cbind(Accel2$wk.LIG, Accel2$we.LIG), na.rm=TRUE)
It's not clear whether you wanted to keep inputs containing NA values or not.
I have posted a similar question to this before and got a quick answer, but am just an R beginner and haven't been able to adapt it to what I need.
Basically I want to take the below code (says if Date_Index is between two numbers and df is < X, then turn df to Y) and make it so it only applies to entries that meet a certain criteria, i.e:
HAVE: df[df$Date_Index >= 50 & df$Date_Index <= 52 & df < .0000001]=1
ADD: if df$Date_Index <= 49 AND df = 0.00 ignore the above statement, else execute:
In other words I need the equivalent to an if, then, else clause. If Date_Index <= 49 and df = 0, leave alone, else if Date_Index >=50 and Date Index <= 52 and df < .001 then replace data (in Date Index rows 50-52) with 1.
This (simple) data set should illustrate it enough:
xx <- matrix(0,52,5)
xx[,1]=1
xx[,3]=1
xx[,5]=1
xx[50:52,]=0
xx[,1]=1:52
xx[50,3]=1
So what I'd like is column 2 and column 4 to stay all 0's but for the bottom of column 3 and 5 to continue to be all 1's.
I suppose you're looking for this:
xx[xx[,1] >= 50 & xx[,1] <= 52, c(FALSE, !colSums(!xx[xx[,1] <= 49, -1]))] <- 1
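That one-liner is dense, so here is the same logic unpacked step by step on the example matrix from the question. The intermediate names (rows_target, early, zeros_per_col, cols_target) are introduced only for illustration.

```r
# Example data from the question
xx <- matrix(0, 52, 5)
xx[, 1] <- 1
xx[, 3] <- 1
xx[, 5] <- 1
xx[50:52, ] <- 0
xx[, 1] <- 1:52
xx[50, 3] <- 1

# Rows to fill: Date_Index (column 1) between 50 and 52
rows_target <- xx[, 1] >= 50 & xx[, 1] <= 52

# Look at the earlier rows (index <= 49), without the index column
early <- xx[xx[, 1] <= 49, -1]

# A column qualifies only if it contains no zeros in the earlier rows
zeros_per_col <- colSums(early == 0)
cols_target <- c(FALSE, zeros_per_col == 0)  # leading FALSE protects column 1

xx[rows_target, cols_target] <- 1
```

After this, columns 2 and 4 are still all zeros, columns 3 and 5 are all ones, and the index column is untouched.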