I would appreciate any help with this error I am getting in my code for a research project I am working on in R:
I am trying to create a column (named non_political) in a data frame (named privacy, imported from an sav file) representing survey data where:
1 signifies that the respondent was non-political (answered in a non-political way to some questions) and
0 signifies the opposite.
So far, I have written:
privacy$non_political<-NA
privacy$non_political<-ifelse(((privacy$q19 == 3) | (privacy$q19 == 4) | (privacy$q20 == 2) | (privacy$q21==2)), 1, 0)
but
head(privacy$non_political)
returns NA's along with 1's, which means that the 0 option is never executed in the ifelse command.
What could I be doing wrong here?
Thank you!
Using == returns NA wherever your data contain NA, and FALSE | NA is still NA, so ifelse() returns NA for those rows instead of 0. You can use %in% instead, which returns FALSE when comparing with NA.
privacy$non_political<- with(privacy,
ifelse(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2, 1, 0))
You can do this without ifelse as well.
privacy$non_political<- with(privacy,
as.integer(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2))
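The difference between the two operators can be seen on a small example. This is only a sketch: the toy `privacy` data frame below is made up for illustration and stands in for the real survey data.

```r
# Hypothetical toy data; the real `privacy` data frame comes from an .sav file.
privacy <- data.frame(q19 = c(3, 1, NA, 2),
                      q20 = c(1, 1, 1, NA),
                      q21 = c(1, 2, 1, 1))

# `==` propagates NA: FALSE | NA is NA, so ifelse() returns NA for rows 3 and 4.
with_eq <- with(privacy,
                ifelse(q19 == 3 | q19 == 4 | q20 == 2 | q21 == 2, 1, 0))

# `%in%` never returns NA, so every row gets either 1 or 0.
with_in <- with(privacy,
                as.integer(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2))

print(with_eq)
print(with_in)
```

Note that the %in% version treats a missing answer as "did not give the non-political answer", which may or may not be what you want for your analysis.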
I have 6 columns in my data frame, named exam 1, exam 2, exam 3, result exam 1, result exam 2, and result exam 3. The first three columns contain numbers and NAs; the last three contain Pass, Fail, and NAs. I want to replace every Pass with 1, and every Fail and every NA with 0.
I have used multiple approaches in R but I can't make it work.
df[df == 'NA'] <- 0 , df[df == NA] <- 0
df[df$"result exam 1" == "Pass",]$"result exam 1" = 1
df[df$"result exam 1" == "Fail",]$"result exam 1" = 0
None of these codes are working.
Would someone be able to please help with this problem?
Thank you
You really need to get a better grasp of basic R syntax:
You are putting the subset operator [ in the wrong place: subset the column vector, not the whole data frame.
You are then using the $ operator on the result of that row subset. Because the comparison yields NA for the missing entries, the assignment most likely throws an error (that you should have posted): data frames do not accept NA in a subscripted assignment.
You are testing for missingness with ==. x == NA makes no sense: how can you compare with a non-available value? The result is always NA, so you must use the is.na() function. Likewise, 'NA' in quotes is the character string "NA", not a missing value.
Here is what you should have done (with just a bit of help from basic R tutorials):
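# Replace the NAs first, so the comparisons below never see a missing value
df$`result exam 1`[is.na(df$`result exam 1`)] <- 0
df$`result exam 1`[df$`result exam 1` == "Pass"] <- 1
df$`result exam 1`[df$`result exam 1` == "Fail"] <- 0
df$`exam 1`[is.na(df$`exam 1`)] <- 0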
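To see why is.na() is needed, here is a minimal sketch on a made-up vector `x` (not taken from the question's data):

```r
x <- c(1, NA, 3)     # hypothetical vector with one missing value

cmp  <- x == NA      # every element is NA: a comparison with NA is never TRUE or FALSE
miss <- is.na(x)     # FALSE TRUE FALSE

x[miss] <- 0         # replace the missing value with 0
print(x)
```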
To do this for multiple columns in one go you can use the following:
cols <- grep('result', names(df))
df[cols][is.na(df[cols])] <- 0
df[cols][df[cols] == 'Fail'] <- 0
df[cols][df[cols] == 'Pass'] <- 1
df
Assuming the name of the data frame is dt.
Make a vector for the names of result columns
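library(dplyr)  # provides %>% and mutate_at()
result <- c("result exam 1", "result exam 2", "result exam 3")
dt <- dt %>% mutate_at(result, ~ifelse(.x %in% "Pass", 1, 0) )
This will replace all "Pass" with 1 and everything else, "Fail" and NA, with 0. Note that .x == "Pass" would leave the NAs as NA, because comparing with NA yields NA; %in% returns FALSE instead.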
For NAs in the other columns:
dt[c("exam 1", "exam 2", "exam 3")][is.na(dt[c("exam 1", "exam 2", "exam 3")])] <- 0
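Putting the pieces together in base R, a minimal sketch might look like the following. The data frame is a made-up two-column stand-in for the six columns in the question, and `score_cols` and `result_cols` are names introduced here for illustration.

```r
# Hypothetical stand-in for the question's data; check.names = FALSE keeps the spaces.
df <- data.frame(`exam 1`        = c(70, NA, 55),
                 `result exam 1` = c("Pass", NA, "Fail"),
                 check.names = FALSE, stringsAsFactors = FALSE)

score_cols  <- "exam 1"
result_cols <- "result exam 1"

# Missing scores become 0.
df[score_cols][is.na(df[score_cols])] <- 0

# "Pass" becomes 1; "Fail" and NA both become 0, because %in% is FALSE for NA.
df[result_cols] <- lapply(df[result_cols],
                          function(x) as.integer(x %in% "Pass"))

print(df)
```

Converting with as.integer() also leaves the result columns numeric rather than character, which is usually more convenient for later sums or means.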
For example, I have the dataset below where 1 = yes and 0 = no, and I need to figure out how many calls were made by landline that lasted under 10 minutes.
Image of example dataset
You can also specifically define the values you're looking for in each column when you're finding the sum. (This will help if you need to count rows with values other than 1 in a column.)
sum(df$landline == 1 & df$`under 10 minutes` == 1)
We can use sum
sum(df1[, "under 10 minutes"])
If two columns are needed
colSums(df1[, c("landline", "under 10 minutes")])
If we are checking both columns, use rowSums
sum(rowSums(df1[, c("landline", "under 10 minutes")], na.rm = TRUE) == 2)
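The grep function finds the rows where landline = 1 (assuming landline is the first column and under 10 minutes is the fourth). We then select only those rows and sum the under 10 minutes column. Anchoring the pattern as "^1$" makes it match exactly 1 rather than any value that merely contains a 1, such as 10.
sum( df[ grep("^1$", df[,1]) ,4] )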
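One caveat with grep() is that it does pattern matching, not equality testing. The sketch below uses a made-up `landline` vector (not the data from the image) to show the difference; for a strict 0/1 column the two approaches agree.

```r
landline <- c(1, 0, 10, 11, 1)    # hypothetical values, for illustration only

by_grep  <- grep(1, landline)     # matches every value that contains a "1"
by_equal <- which(landline == 1)  # matches only values equal to 1

print(by_grep)
print(by_equal)
```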
R will conveniently treat 1 and 0 as if they mean TRUE and FALSE, so we can apply logical Boolean operations like AND (&) and OR (|) on them.
df <- data.frame(x = c(1, 0, 1, 0),
y = c(0, 0, 1, 1))
> sum(df$x & df$y)
[1] 1
> sum(df$x | df$y)
[1] 3
For future questions, you should look up how to use functions like dput or other ways to give an example data set instead of using an image.
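As a rough illustration of the dput suggestion, assuming a small made-up data frame with hypothetical column names: dput() prints R code that recreates the object, which can be pasted straight into a question.

```r
# Hypothetical example data; replace with (a small part of) your own data frame.
df <- data.frame(landline = c(1, 0, 1),
                 under_10 = c(1, 1, 0))

# Prints a structure(...) call that anyone can copy and run to rebuild df.
dput(df)
```

For a large data set, dput(head(df, 10)) keeps the posted example short.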
I have two basic questions about new variable creation in R. I will show some code and hopefully someone can help answer these!
df0$new <- ifelse(df0$old=="yes",1,0)
In this code I am creating a new variable called "new" that is equal to 1 if the variable "old" is equal to yes or is otherwise equal to 0. But in the variable "old" I have missing data (represented as -99, -98, NAN). So how can I account for there being missing values?
The second question is about using an "OR" statement.
df0$z <- ifelse(df0$x1=="yes",1,0 | )
I want to create a new variable z that is equal to 1 if the participant responds "yes" to any of 5 questions (q1-q5). So I want to code it so it looks like: z = 1 if q1 ==1 OR q2 == 1 OR q3 == 1 OR q4 == 1 OR q5 == 1. If none of q1-q5 equal 1 then I want to set z equal to 0. However this also brings up the issue with the missing values as described above. Thanks so much!
You could do something like the following.
First, get rid of the -99, -98 and NaN. I am assuming that when, in the question, you write NAN you are meaning NaN.
Encode NA values as NA.
is.na(df0$old) <- (df0$old %in% c(-99, -98)) | is.nan(df0$old)
Now, note that FALSE/TRUE are encoded as 0/1 and coerce the logical results to class integer.
df0$new <- as.integer(df0$old == "yes")
df0$z <- with(df0, as.integer(q1 == "yes" | q2 == "yes" | q3 == "yes" | q4 == "yes" | q5 == "yes"))
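With | a row that has no "yes" but at least one NA gets NA rather than 0. If missing answers should instead count as "not yes", which is an assumption about the analysis and not something stated in the question, a rowSums() variant is one option. The toy `df0` and the helper vector `qs` below are hypothetical.

```r
# Hypothetical toy data with three of the five questions.
df0 <- data.frame(q1 = c("yes", "no", NA),
                  q2 = c("no",  "no", "no"),
                  q3 = c(NA,    "no", NA),
                  stringsAsFactors = FALSE)

qs <- c("q1", "q2", "q3")

# Count the "yes" answers per row, ignoring NAs; z is 1 if there is at least one.
df0$z <- as.integer(rowSums(df0[qs] == "yes", na.rm = TRUE) > 0)

print(df0)
```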
Another solution for the first part.
library(dplyr) #because of left_join function
df1 <- data.frame(old = c("yes", "no"), new = c(1, 0))
df0 <- left_join(df0, df1, by = "old")
I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of and and or, whereas & and | are the vectorized forms.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?
You can use parentheses to be more specific about what your & is referring to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence: & binds more tightly than |, and operators at the same level of precedence are evaluated from left to right.
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like the line below (assuming you want rows where A == 1 that also meet at least one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
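The same effect shows up on whole vectors. In this sketch the vectors A to D are made up for illustration; the second element has A = 0 and C = 1, which is exactly the kind of row that slips through without parentheses.

```r
# Hypothetical values, one element per "row".
A <- c(1, 0, 0, 1)
B <- c(0, 0, 1, 0)
C <- c(0, 1, 0, 0)
D <- c(0, 0, 0, 1)

# Without parentheses this is parsed as ((A == 1 & B == 1) | C == 1) | D == 1.
no_parens   <- A == 1 & B == 1 | C == 1 | D == 1

# With parentheses, A == 1 is required for every selected element.
with_parens <- A == 1 & (B == 1 | C == 1 | D == 1)

print(no_parens)
print(with_parens)
```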
Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you wrap the OR conditions in parentheses, so that they are evaluated together before the &, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1
If I am sub-setting using logical statements, is there a way of combining without using logical operators? i.e. is there a more effective way of doing the following:
train$TOD[train$Hour == 23 | train$Hour == 0 | train$Hour == 1 | train$Hour == 2]
A reproducible example would be great, but I think this code is what you are looking for:
train$TOD[train$Hour %in% c(0, 1, 2, 23)]
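A quick check that %in% selects the same elements as the chained == comparisons, using a made-up `train` data frame (the Hour and TOD values here are hypothetical):

```r
# Hypothetical data, for illustration only.
train <- data.frame(Hour = c(23, 5, 0, 12, 2),
                    TOD  = c("night", "morning", "night", "day", "night"),
                    stringsAsFactors = FALSE)

long  <- train$TOD[train$Hour == 23 | train$Hour == 0 |
                   train$Hour == 1  | train$Hour == 2]
short <- train$TOD[train$Hour %in% c(23, 0, 1, 2)]

print(identical(long, short))
```

The two forms only differ when Hour contains NA: the == version then returns NA elements, while %in% simply leaves those rows out.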