I have two basic questions about new variable creation in R. I will show some code and hopefully someone can help answer these!
df0$new <- ifelse(df0$old=="yes",1,0)
In this code I am creating a new variable called "new" that is equal to 1 if the variable "old" is equal to yes or is otherwise equal to 0. But in the variable "old" I have missing data (represented as -99, -98, NAN). So how can I account for there being missing values?
The second question is about using an "OR" statement.
df0$z <- ifelse(df0$x1=="yes",1,0 | )
I want to create a new variable z that is equal to 1 if the participant responds "yes" to any of 5 questions (q1-q5). So I want to code it so it looks like: z = 1 if q1 == 1 OR q2 == 1 OR q3 == 1 OR q4 == 1 OR q5 == 1. If none of q1-q5 equals 1, then I want to set z equal to 0. However, this also brings up the issue with the missing values described above. Thanks so much!
You could do something like the following.
First, get rid of the -99, -98 and NaN. I am assuming that where the question says NAN, you mean NaN.
Encode the missing values as NA.
is.na(df0$old) <- (df0$old %in% c(-99, -98)) | is.nan(df0$old)
Now, note that FALSE/TRUE are encoded as 0/1 and coerce the logical results to class integer.
df0$new <- as.integer(df0$old == "yes")
df0$z <- with(df0, as.integer(q1 == "yes" | q2 == "yes" | q3 == "yes" | q4 == "yes" | q5 == "yes"))
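A minimal sketch of the two steps on made-up data. The data frame and its values are hypothetical, and q1 to q5 are assumed to hold "yes"/"no" strings with the same missing-value codes as old.

```r
# Hypothetical data: "old" and q1..q5 hold "yes"/"no" plus the codes -99/-98.
df0 <- data.frame(
  old = c("yes", "no", "-99", "yes", "-98"),
  q1  = c("no",  "no", "yes", "-99", "no"),
  q2  = c("no",  "no", "no",  "no",  "-98"),
  q3  = c("yes", "no", "no",  "no",  "no"),
  q4  = c("no",  "no", "no",  "no",  "no"),
  q5  = c("no",  "no", "no",  "no",  "no"),
  stringsAsFactors = FALSE
)

# Recode the missing-value codes to NA in every column of interest.
vars <- c("old", paste0("q", 1:5))
for (v in vars) {
  is.na(df0[[v]]) <- df0[[v]] %in% c("-99", "-98", "NaN")
}

# NA stays NA here, because NA == "yes" is NA.
df0$new <- as.integer(df0$old == "yes")

# TRUE | NA is TRUE, FALSE | NA is NA: a row is 1 as soon as one answer is
# "yes", and 0 only if all five answers are known and none is "yes".
df0$z <- with(df0, as.integer(q1 == "yes" | q2 == "yes" | q3 == "yes" |
                              q4 == "yes" | q5 == "yes"))
```

Rows with a missing answer and no "yes" end up as NA in z. If those rows should be 0 instead, replacing == with %in% is one option.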
Another solution for the first part.
library(dplyr) #because of left_join function
df1 <- data.frame(old = c("yes", "no"), new = c(1, 0))
df0 <- left_join(df0, df1, by = "old") # join on "old" only, even if df0 already has a "new" column
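The same lookup-table idea can be sketched in base R with match(), in case dplyr is not available. The data below are hypothetical; values that do not appear in the lookup (including the missing-value codes) come out as NA, which is also what a left join does for unmatched rows.

```r
# Hypothetical data; the lookup table mirrors df1 from the answer above.
df0 <- data.frame(old = c("yes", "no", "-99", NA), stringsAsFactors = FALSE)
lookup <- data.frame(old = c("yes", "no"), new = c(1, 0))

# match() returns NA for values absent from the lookup, so "-99" and NA
# both end up as NA in the new column.
df0$new <- lookup$new[match(df0$old, lookup$old)]
```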
Related
I have 6 columns in my data frame. The column names are exam 1, exam 2, exam 3, result exam 1, result exam 2 and result exam 3. The first three columns contain numbers and NAs, and the last three contain Pass, Fail and NAs. I want to replace every Pass with 1, and every Fail and every NA with 0.
I have used multiple approaches in R but I can't make it work.
df[df == 'NA'] <- 0 , df[df == NA] <- 0
df[df$"result exam 1" == "Pass",]$"result exam 1" = 1
df[df$"result exam 1" == "Fail",]$"result exam 1" = 0
None of these codes are working.
Would someone be able to please help with this problem?
Thank you
You really need to get a better grasp of basic R syntax:
You are putting the subset operator [ in the wrong place (you need to subset your vector, not the data frame)
You are then using the $ operator on the result of that row subset and assigning into it. Because the comparison returns NA for the missing rows, the assignment throws an error (that you should have posted).
You are testing a value for missingness with ==: x == NA makes no sense (how can you compare against a non-available value?) and always returns NA. You must use the is.na() function.
Here is what you should have done (with just a bit of help from basic R tutorials):
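df$`result exam 1`[df$`result exam 1` == "Pass"] <- 1 # Pass becomes 1
df$`result exam 1`[df$`result exam 1` == "Fail"] <- 0 # Fail becomes 0
df$`result exam 1`[is.na(df$`result exam 1`)] <- 0 # missing results become 0
df$`exam 1`[is.na(df$`exam 1`)] <- 0 # missing scores become 0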
To do this for multiple columns in one go you can use the following:
cols <- grep('result', names(df))
df[cols][is.na(df[cols])] <- 0
df[cols][df[cols] == 'Fail'] <- 0
df[cols][df[cols] == 'Pass'] <- 1
df
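As a variant, here is a sketch on made-up data that produces integer columns rather than character ones. The data frame is hypothetical (only three of the six columns are included), and check.names = FALSE is assumed so that the column names keep their spaces.

```r
# Hypothetical data with the column names described in the question.
df <- data.frame(
  `exam 1` = c(55, NA, 70),
  `result exam 1` = c("Pass", NA, "Fail"),
  `result exam 2` = c(NA, "Fail", "Pass"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Result columns: 1 for "Pass", 0 for "Fail" and for NA.
# %in% never returns NA, so missing values fall through to 0.
cols <- grep("result", names(df), value = TRUE)
df[cols] <- lapply(df[cols], function(x) as.integer(x %in% "Pass"))

# Score columns: replace NA with 0.
scores <- setdiff(names(df), cols)
df[scores][is.na(df[scores])] <- 0
```

Note that anything other than "Pass" becomes 0 here, so a misspelled value such as "pass" would silently be counted as a fail.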
Assuming the name of the data frame is dt.
Make a vector for the names of result columns
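library(dplyr) # for %>%, mutate() and across()
result <- c("result exam 1", "result exam 2", "result exam 3")
dt <- dt %>% mutate(across(all_of(result), ~ ifelse(.x %in% "Pass", 1, 0)))
This will replace all "Pass" with 1 and the rest (Fail and NA) with 0. %in% is used instead of == because NA == "Pass" is NA, which ifelse() would pass through as NA.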
For NAs in the other columns:
dt[c("exam 1", "exam 2", "exam 3")][is.na(dt[c("exam 1", "exam 2", "exam 3")])] <- 0
I would appreciate any help with this error I am getting in my code for a research project I am working on in R:
I am trying to create a column (named non_political) in a data frame (named privacy, imported from an sav file) representing survey data where:
1 signifies that the respondent was non-political (answered in a non-political way to some questions) and
0 signifies the opposite.
So far, I have written:
privacy$non_political<-NA
privacy$non_political<-ifelse(((privacy$q19 == 3) | (privacy$q19 == 4) | (privacy$q20 == 2) | (privacy$q21==2)), 1, 0)
but
head(privacy$non_political)
returns NA's along with 1's, which means that the 0 option is never executed in the ifelse command.
What could I be doing wrong here?
Thank you!
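Using == will generate NAs as output if you have NA in your data: TRUE | NA is TRUE, but FALSE | NA is NA, so a row only becomes 0 when every comparison is non-missing and FALSE. You can use %in%, which returns FALSE when comparing with NA.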
privacy$non_political<- with(privacy,
ifelse(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2, 1, 0))
You can do this without ifelse as well.
privacy$non_political<- with(privacy,
as.integer(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2))
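A small sketch of the difference on made-up vectors. The values of q19 and q20 here are hypothetical and are not taken from the survey.

```r
# Hypothetical answers with missing values.
q19 <- c(3, 1, NA, NA)
q20 <- c(1, 1, 2, NA)

# With ==, a missing answer gives NA; TRUE | NA is TRUE but FALSE | NA is NA.
with_eq <- ifelse(q19 == 3 | q20 == 2, 1, 0)

# %in% returns FALSE for NA, so every row gets a definite 0 or 1.
with_in <- as.integer(q19 %in% 3 | q20 %in% 2)
```

Whether a respondent with missing answers should be coded 0 or left as NA is a substantive decision. %in% makes the first choice for you.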
I have one dataset, called unchf_data, it contains participants where some have answered a questionnaire and some have not. I want to make a descriptive table (table 1) where I have the columns (Total, No answer, Answer).
Making new column subgroup, for a table 1. # if PREG column answered "Yes" or 1, then in final cohort.
unchf_data2 <- unchf_data2[!(unchf_data2$PREG == "0"),] # removing all where pregnancy is 0. If no pregnancy then no chance of having an APO.
unchf_data2<- unchf_data2[!c(is.na(unchf_data2$HYPTPREG) == "TRUE" &
is.na(unchf_data2$DIABPREG) == "TRUE" &
is.na(unchf_data2$BABYLT5LBS) == "TRUE" &
is.na(unchf_data2$BORN3WKSERLY) == "TRUE" &
is.na(unchf_data2$PREECLMP) == "TRUE" &
is.na(unchf_data2$BABYGT9LBS) == "TRUE"),] # removing all that have missing in all APO variables.
unchf_data2$subgroup <- 1 # creating new variable - subgroup which has 1 in all
unchf_data2$subgroup <- factor(unchf_data2$subgroup, levels = c(1))
subgroup2 <- unchf_data2$subgroup # making the new vector from subgroup variable
library(qpcR)
unchf_data <- qpcR:::cbind.na(subgroup2, unchf_data) # Binds our new vector with length of 10375 to our data set with length of 44174, filling in NA values for everything longer than the vector.
unchf_data$subgroup2[is.na(unchf_data$subgroup2) == "TRUE"] <- 0 # recoding all NA values to 0
unchf_data$subgroup2 <- factor(unchf_data$subgroup2, levels = c(0,1), labels = c("No answer", "Answer")) # converting variable to factor.
summary(unchf_data$subgroup2) # No answer 33799, Answer 10375
My problem now is that apparently the new subgroup2 variable does not line up with the original dataset.
Is there another way I can do this?
The reason I want a new variable in the dataset is because I'm using the UnivariateTable function from the Publish Package
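One possible way to avoid the misalignment, as a sketch only: since the shorter vector appears to be padded with NA at the end, the 1s get attached to the first rows rather than to the rows they came from. Computing the flag directly on the full data set keeps every value on its own row. The toy data below are hypothetical and include only two of the APO variables. Keeping rows with a missing PREG is an assumption about the intended rule.

```r
# Hypothetical stand-in for unchf_data with two of the APO variables.
unchf_data <- data.frame(
  PREG     = c("1", "0", "1", NA, "1"),
  HYPTPREG = c(1, NA, NA, NA, 0),
  DIABPREG = c(NA, 1, NA, NA, 0),
  stringsAsFactors = FALSE
)
apo_vars <- c("HYPTPREG", "DIABPREG")

# TRUE where at least one APO variable is non-missing.
any_apo <- rowSums(!is.na(unchf_data[apo_vars])) > 0

# Drop only explicit PREG == "0"; %in% keeps rows where PREG is missing.
answered <- !(unchf_data$PREG %in% "0") & any_apo

# The flag has one value per row of the full data set, so it stays aligned.
unchf_data$subgroup2 <- factor(as.integer(answered), levels = c(0, 1),
                               labels = c("No answer", "Answer"))
```

With the real data, apo_vars would list all six APO columns, and no subsetting or binding step is needed.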
I have a simple problem:
I have a column with thousands of values and I'm trying to convert it into a dichotomous variable (Yes|No). Replacing strings with 'No' was easy enough as the value I was converting was a single asterisk
Data$Complete <- gsub("\\*", "No", Data$Complete)
But when I attempt to replace everything apart from 'No', the following code replaces every value in the column with 'Yes'. I don't understand why it would, as I'm specifying to replace everything apart from "No":
Data$Complete <- Data[!Data$Complete %in% c("No"), "Complete"] <- "Yes"
Any pointers would be appreciated.
You can use a combination of the ifelse and grepl functions to build the Yes/No variable, as below:
library(stringi)
# data simulation
set.seed(123)
n <- 1000
data <- data.frame(
complete = stri_rand_strings(n = n, length = 20, pattern = "[A-Za-z0-9\\*]")
)
# string matching
data$yes_no <- ifelse(grepl("\\*", data$complete), "No", "Yes")
head(data)
Output:
complete yes_no
1 HmOsw1WtXRxRfZ5tE1Jx Yes
2 tgdzehXaH8xtgn0TkCJD Yes
3 7PPM87DSFr1Qn6YC7ktM Yes
4 e4NGoRoonQkch*SCMbL6 No
5 EfPm5QztsA7eKeJAm4SV Yes
6 aJTxTtubO8vH2wi7XxZO Yes
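As for why the line in the question turns everything into 'Yes': a chained assignment a <- b <- value assigns value to both targets. The sketch below, on a hypothetical four-row column, reproduces the problem and shows a single subset assignment that leaves the "No" rows alone.

```r
# Hypothetical column: "*" marks incomplete records.
Data <- data.frame(Complete = c("*", "done", "*", "ok"),
                   stringsAsFactors = FALSE)
Data$Complete <- gsub("\\*", "No", Data$Complete)

# a <- b <- "Yes" assigns "Yes" to both targets, so the chained line in the
# question ends by overwriting the whole column with the single value "Yes".
broken <- Data
broken$Complete <- broken[!broken$Complete %in% c("No"), "Complete"] <- "Yes"

# A single assignment to the selected rows leaves the "No" rows alone.
Data[!Data$Complete %in% c("No"), "Complete"] <- "Yes"
```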
I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of "and" and "or", and that the short forms & and | are the vectorized ones.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?
You can use parentheses to be more specific about what your & is referring to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence: & binds more tightly than |, and operators at the same level of precedence are evaluated from left to right.
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like below (assuming you want items that A == 1 & that meet one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
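The same effect can be seen on whole columns. This is a sketch on a hypothetical four-row data frame, chosen so that each row exercises a different combination of conditions.

```r
# Hypothetical data covering the interesting combinations.
Data <- data.frame(A = c(1, 0, 1, 0),
                   B = c(1, 0, 0, 0),
                   C = c(0, 1, 0, 0),
                   D = c(0, 0, 0, 1))

# & binds tighter than |, so this reads (A & B) | C | D and lets A == 0 through.
loose <- with(Data, A == 1 & B == 1 | C == 1 | D == 1)

# Parentheses force the OR group to be evaluated first.
strict <- with(Data, A == 1 & (B == 1 | C == 1 | D == 1))

Data[loose, ]   # rows 1, 2 and 4, two of them with A == 0
Data[strict, ]  # row 1 only
```

Only the vectorized & and | are used here. The long form || works on a single value rather than a whole column (recent R versions raise an error when it is given longer vectors), so it is not suitable inside subset().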
Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you wrap the OR conditions in parentheses, so that they are evaluated together as a single logical condition, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1