As the title suggests, my issue is the following. I have one variable identifying the beginning of an event and another variable indicating the end time of the same event. I want a variable indicating whether an event took place or not.
dat <-
data.frame(
"t" = c(1:10),
"id1" = c(1, NA, NA, NA, 2, 3, NA, NA, 4, NA),
"id2" = c(NA, 1, NA, NA, NA, 2, NA, 3, 4, NA),
"desiredoutcome" = c(1, 1, 0, 0, 1, 1, 1, 1,1, 0)
)
Here, the variable desiredoutcome takes the value 1 whenever a row lies between the same value of id1 and id2 (inclusive). Consider row 6, for example: it is the end of event 2 (id2 = 2) and the start of event 3 (id1 = 3), so the dummy should be 1.
Any idea how I can achieve this?
How about this?
#Position of non-NA index in `id1`
inds1 <- which(!is.na(dat$id1))
#Corresponding position of non-NA index in `id2`
inds2 <- match(dat$id1[inds1], dat$id2)
#Initialise the result column to 0
dat$result <- 0
#create a sequence between inds1 and inds2 and assign value as 1.
dat$result[unique(unlist(Map(seq, inds1, inds2)))] <- 1
dat
# t id1 id2 desiredoutcome result
#1 1 1 NA 1 1
#2 2 NA 1 1 1
#3 3 NA NA 0 0
#4 4 NA NA 0 0
#5 5 2 NA 1 1
#6 6 3 2 1 1
#7 7 NA NA 1 1
#8 8 NA 3 1 1
#9 9 4 4 1 1
#10 10 NA NA 0 0
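To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the example data and prints each step; the names ranges and covered are only for illustration and are not part of the answer above. It assumes every non-NA id1 has a matching id2; if one were missing, match would return NA for it and seq would fail.

```r
dat <- data.frame(
  t   = 1:10,
  id1 = c(1, NA, NA, NA, 2, 3, NA, NA, 4, NA),
  id2 = c(NA, 1, NA, NA, NA, 2, NA, 3, 4, NA)
)

# Rows where an event starts
inds1 <- which(!is.na(dat$id1))
# Rows where the same event ends
inds2 <- match(dat$id1[inds1], dat$id2)

# One row range per event (illustrative name)
ranges <- Map(seq, inds1, inds2)
# Union of all rows covered by at least one event (illustrative name)
covered <- sort(unique(unlist(ranges)))

print(inds1)    # 1 5 6 9
print(inds2)    # 2 6 8 9
print(covered)  # 1 2 5 6 7 8 9
```

Overlapping ranges, such as rows 5 to 6 and rows 6 to 8, are harmless because unique drops the repeated row index.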
Here is one way to do it.
First, convert NA to 0:
dat[is.na(dat)] <- 0
Then use ifelse:
ifelse((dat$id1 == 0 & dat$id2 == 0), 0, 1)
To add it to the data frame as a named column:
dat <- cbind(dat, result = ifelse(dat$id1 == 0 & dat$id2 == 0, 0, 1))
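It is worth checking this rule against the desiredoutcome column from the question. The sketch below applies the same test, written with is.na so the data are left unchanged; flag and mismatch are illustrative names. On the example data it disagrees with the desired outcome in one row, which lies inside an event but has NA in both id columns.

```r
dat <- data.frame(
  t   = 1:10,
  id1 = c(1, NA, NA, NA, 2, 3, NA, NA, 4, NA),
  id2 = c(NA, 1, NA, NA, NA, 2, NA, 3, 4, NA),
  desiredoutcome = c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0)
)

# Same rule as the ifelse above: 0 only when both ids are missing
flag <- ifelse(is.na(dat$id1) & is.na(dat$id2), 0, 1)

# Rows where the flag disagrees with the desired outcome
mismatch <- which(flag != dat$desiredoutcome)
print(mismatch)  # 7
```

So this approach only matches the desired outcome when no event spans a row without a start or end marker.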
This is somewhat similar to "Like rleid but ignoring NAs", but I want NAs kept as NA in the counter rather than skipped. I need a counter that starts at 1, increases each time the value changes, keeps the previous count when the value is the same as the one above, and restarts at 1 after any NA.
I have this:
# have
months <- c(1, 8, 1, 1, 1, NA, NA, 2, 6, NA)
# want
months_counter <- c(1, 2, 3, 3, 3, NA, NA, 1, 2, NA)
I have tried different ways using rleid but all of them seem to not have the functionality of ignoring NAs as above. Something to be applied in a data.table would be even more appreciated!
We can add a column counting NAs to use as a grouping column for rleid but only set the values on rows where months is not NA:
library(data.table)
dt = data.table(months = c(1, 8, 1, 1, 1, NA, NA, 2, 6, NA))
dt[, grouper := cumsum(is.na(months))][
!is.na(months),
result := rleid(months),
by = grouper
]
dt
# months grouper result
# 1: 1 0 1
# 2: 8 0 2
# 3: 1 0 3
# 4: 1 0 3
# 5: 1 0 3
# 6: NA 1 NA
# 7: NA 2 NA
# 8: 2 2 1
# 9: 6 2 2
# 10: NA 3 NA
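The same grouping idea can be sketched in base R for cases where data.table is not available. This is only an illustration: run_id is a hypothetical helper, and rle stands in for rleid here.

```r
months <- c(1, 8, 1, 1, 1, NA, NA, 2, 6, NA)

# Each NA starts a new group, exactly like the grouper column above
grouper <- cumsum(is.na(months))

# Hypothetical helper: run index within one group, leaving NA where the input is NA
run_id <- function(v) {
  out <- rep(NA_integer_, length(v))
  ok <- !is.na(v)
  runs <- rle(v[ok])
  out[ok] <- rep(seq_along(runs$lengths), runs$lengths)
  out
}

# Apply the helper within each group
months_counter <- ave(months, grouper, FUN = run_id)
print(months_counter)  # 1 2 3 3 3 NA NA 1 2 NA
```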
I have a data set of boolean variables and I am trying to generate a new variable based on 3 of the existing booleans using ifelse().
The rules I'd like to implement are:
If any of the three columns have value 1, 1
If all of the three columns have value 0, 0
If all of the three columns have value NA, NA
If the three columns have some combination of 0 and NA, 0
Here is the code to generate a sample with 3 variables that I want to use to create a fourth:
df <- structure(list(var1 = c(NA, NA, NA, 0,1),
var2 = c(1, NA, 0,0, 1),
var3 = c(NA, NA, NA,0,1)), class = "data.frame", row.names = c(NA, -5L))
I have tried the following to generate the new variable according to my desired rules:
df$newvar1 <-ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse((is.na(df$var1) & is.na(df$var2) & is.na(df$var3)), NA,0))
df$newvar2 <- ifelse((is.na(df$var1)|df$var1==0) &
(is.na(df$var2)|df$var2==0) &
(is.na(df$var3)|df$var3==0),0,
ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse(is.na(df$var1) & is.na(df$var2) & is.na(df$var3), NA,NA)))
df$newvar3 <-ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse((is.na(df$var1) & is.na(df$var2) & is.na(df$var3)), NA,
ifelse((is.na(df$var1)|df$var1==0) &
(is.na(df$var2)|df$var2==0) &
(is.na(df$var3)|df$var3==0),0,0)))
I don't understand why newvar1 and newvar3 have NA values corresponding to combinations of NAs and 0s when both examples use "&" between the na specifications (row 3 in the results).
I am assuming that NAs don't show up in newvar2 because the first ifelse() function takes precedence.
Any insight to the ifelse() function or advice on how to get the results I'm looking for would be really helpful.
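The NA values in newvar1 and newvar3 come from the first test rather than from the "&" conditions. In R, NA == 1 is NA, and NA | FALSE is also NA, so for row 3 the outer ifelse() gets NA as its test and returns NA without ever evaluating the inner ifelse(). The sketch below shows this and one possible fix using %in%, which never returns NA; the names any_one_naive, any_one and all_na are illustrative.

```r
df <- data.frame(
  var1 = c(NA, NA, NA, 0, 1),
  var2 = c(1, NA, 0, 0, 1),
  var3 = c(NA, NA, NA, 0, 1)
)

# The original outer test: NA | FALSE | NA is NA, so row 3 yields NA
any_one_naive <- df$var1 == 1 | df$var2 == 1 | df$var3 == 1
print(any_one_naive)  # TRUE NA NA FALSE TRUE

# %in% returns FALSE instead of NA for missing values
any_one <- df$var1 %in% 1 | df$var2 %in% 1 | df$var3 %in% 1
all_na  <- is.na(df$var1) & is.na(df$var2) & is.na(df$var3)

newvar <- ifelse(any_one, 1, ifelse(all_na, NA, 0))
print(newvar)  # 1 NA 0 0 1
```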
Here is another possible option using rowSums:
df$newvar <- +(rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0) > 0)
# var1 var2 var3 newvar
#1 NA 1 NA 1
#2 NA NA NA NA
#3 NA 0 NA 0
#4 0 0 0 0
#5 1 1 1 1
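The one-liner is compact, so here is a step-by-step sketch of what it computes. It relies on the fact that NA ^ 0 is 1 while NA ^ 1 is NA; the intermediate names total, all_missing and mask are illustrative.

```r
df <- data.frame(
  var1 = c(NA, NA, NA, 0, 1),
  var2 = c(1, NA, 0, 0, 1),
  var3 = c(NA, NA, NA, 0, 1)
)

# Row totals with NA treated as 0
total <- rowSums(df, na.rm = TRUE)
# TRUE only for rows where every value is missing
all_missing <- rowSums(!is.na(df)) == 0
# NA ^ FALSE is 1, NA ^ TRUE is NA, so this is NA exactly for all-NA rows
mask <- NA ^ all_missing
# Positive total gives 1, zero total gives 0, all-NA rows stay NA
newvar <- +(total * mask > 0)

print(total)   # 1 0 0 0 3
print(mask)    # 1 NA 1 1 1
print(newvar)  # 1 NA 0 0 1
```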
This gives your expected results:
df$newvar <- 0
df$newvar[Reduce(`|`, lapply(df[1:3], `%in%`, 1))] <- 1
df$newvar[Reduce(`&`, lapply(df[1:3], is.na))] <- NA
df
# var1 var2 var3 newvar
# 1 NA 1 NA 1
# 2 NA NA NA NA
# 3 NA 0 NA 0
# 4 0 0 0 0
# 5 1 1 1 1
This defaults to 0 and only changes values for the known conditions: any row containing a 1 is set to 1 (even alongside NA or 0), and rows that are all NA are set to NA. Everything else, including mixes of NA and 0, keeps the default 0.
Suppose I have this example dataset df with two numeric variables.
dx_order1 <- c(1, 1, NA, 1, 1)
dx_order2 <- c(2, 2, 2, 2, NA)
df <- data.frame(dx_order1, dx_order2)
I want to recode the variables. For the dx_order1 variable, I want to recode 1 as 1 and everything else as 0. Similarly, for the dx_order2 variable, I want to recode 2 as 1 and everything else as 0. Say that the new variables are called diag_order1 and diag_order2.
I know how to do this one by one in a manual fashion. The codes below will do the job:
df$diag_order1 <- ifelse(is.na(df$dx_order1), 0, 1)
df$diag_order2 <- ifelse(is.na(df$dx_order2), 0, 1)
I was wondering how I can achieve the same outcome with a for loop. If I have a lot of similar variables, this type of manual coding is not practical. Any advice on how to write a loop to speed up the process would be appreciated.
You don't need a loop in this instance. You can set every non-NA value to 1 and then convert NA to 0 using is.na. For example:
Data
df <- data.frame(dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
df[!is.na(df)] <- 1
df[is.na(df)] <- 0
Or if you have more columns with NA but only want to apply to certain columns then you could do it by specifying those columns:
df2 <- data.frame(letter_col = c(NA, letters[1:4]),
dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
# any columns starting with dx
cols <- names(df2)[grepl("^dx", names(df2))]
df2[, cols][!is.na(df2[, cols])] <- 1
df2[, cols][is.na(df2[, cols])] <- 0
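If the original columns should be kept and new diag_ columns added, as the question describes, a loop over the column names is one possibility. This is only a sketch: it assumes the number at the end of each column name is the value to recode as 1, and the names target and new_name are illustrative.

```r
df <- data.frame(dx_order1 = c(1, 1, NA, 1, 1),
                 dx_order2 = c(2, 2, 2, 2, NA))

# Loop over every column whose name starts with dx_order
for (col in grep("^dx_order", names(df), value = TRUE)) {
  # Assumption: the numeric suffix of the name is the value that counts as 1
  target <- as.numeric(sub("^dx_order", "", col))
  new_name <- sub("^dx_", "diag_", col)
  # %in% treats NA as "no match", so NA becomes 0
  df[[new_name]] <- as.integer(df[[col]] %in% target)
}

print(df$diag_order1)  # 1 1 0 1 1
print(df$diag_order2)  # 1 1 1 1 0
```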
You can use across with mutate in dplyr like this:
library(dplyr)
# str_extract below comes from stringr
library(stringr)
df2 <- data.frame(letter_col = c(NA, letters[1:4]),
dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
> df2
letter_col dx_order1 dx_order2
1 <NA> 1 2
2 a 1 2
3 b NA 2
4 c 1 2
5 d 1 NA
df2 %>% mutate(across(starts_with("dx"), ~case_when(. == as.numeric(str_extract(cur_column(), "\\d$")) ~ 1,
is.na(.) ~ 0,
TRUE ~ 0), .names = "diag_{.col}"))
letter_col dx_order1 dx_order2 diag_dx_order1 diag_dx_order2
1 <NA> 1 2 1 1
2 a 1 2 1 1
3 b NA 2 0 1
4 c 1 2 1 1
5 d 1 NA 1 0
This assumes that each dx column can contain its own suffix number, NA, or other values, as described in your question. Everything other than the suffix number is recoded to 0.
You can coerce the logical vector from is.na to integer. is.na works with the dataframe.
df <- data.frame(dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
df[] <- +!is.na(df)
df
# dx_order1 dx_order2
#1 1 1
#2 1 1
#3 0 1
#4 1 1
#5 1 0
ORIGINAL QUESTION
I want to add a series of dummy variables in a data frame for each value of x in that data frame but containing an NA if another variable is NA. For example, suppose I have the below data frame:
x <- seq(1:5)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)
I am looking to produce:
var1 such that z$var1 is 1 if x == 1; otherwise NA if y is NA; otherwise 0.
var2 such that z$var2 is 1 if x == 2; otherwise NA if y is NA; otherwise 0.
var3 and so on.
I can't seem to figure out how to vectorize this. I am looking for a solution that can be used for a large count of values of x.
UPDATE
There was some confusion that I wanted to iterate through each index of x. I am not looking for this, but rather for a solution that creates a variable for each unique value of x. When taking the below data as an input:
x <- c(1,1,2,3,9)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)
I am looking for z$var1, z$var2, z$var3, z$var9 where z$var1 <- c(1, 1, NA, 0, NA) and z$var2 <- c(NA, 0, 1, 0, NA). The original solution produces z$var1 <- z$var2 <- c(1,1,NA,0,NA).
You can use ifelse, which is vectorized, to construct the variables:
cbind(z, setNames(data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))),
paste("var", unique(x), sep = "")))
x y var1 var2 var3 var9
1 1 NA 1 NA NA NA
2 1 1 1 0 0 0
3 2 NA NA 1 NA NA
4 3 0 0 0 1 0
5 9 NA NA NA NA 1
Update:
cbind(z, data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))))
x y X1 X2 X3 X4
1 1 NA 1 NA NA NA
2 1 1 1 0 0 0
3 2 NA NA 1 NA NA
4 3 0 0 0 1 0
5 9 NA NA NA NA 1
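For a large number of unique values, the comparison can also be done in one step with outer instead of calling ifelse once per value. This is a sketch of an alternative, not the approach used above; hit and dummies are illustrative names.

```r
x <- c(1, 1, 2, 3, 9)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

vals <- unique(x)
# Logical matrix: one row per observation, one column per unique value of x
hit <- outer(x, vals, "==")
# Start from 0/1, then set non-matching cells to NA in rows where y is NA
dummies <- +hit
dummies[!hit & is.na(y)] <- NA
colnames(dummies) <- paste0("var", vals)

z <- cbind(z, dummies)
print(z$var1)  # 1 1 NA 0 NA
print(z$var2)  # NA 0 1 0 NA
```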
Example:
x <- c( 1, NA, 0, 1)
y <- c(NA, NA, 0, 1)
table(x,y, useNA="always") # --->
# y
# x 0 1 <NA>
# 0 1 0 0
# 1 0 1 1
# <NA> 0 0 1
My question is:
a <- c(NA, NA, NA, NA)
b <- c(1, 1, 1, 1)
table(a, b, useNA="always") ## --> It is 1X2 matrix.
# b
# a 1 <NA>
# <NA> 4 0
I want to get a 3x3 table with the same column names, row names and dimensions as the example above. Then I will apply chisq.test to the table.
Thank you very much for your answers!
You can achieve this by converting both a and b into factors with the same levels. This works because factor vectors keep track of all possible values (aka levels) that their elements might take, even when they in fact contain just a subset of those.
a <- c(NA, NA, NA, NA)
b <- c(1, 1, 1, 1)
levs <- c(0, 1)
table(a = factor(a, levels = levs),
b = factor(b, levels = levs),
useNA = "always")
# b
# a 0 1 <NA>
# 0 0 0 0
# 1 0 0 0
# <NA> 0 4 0
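To confirm that both inputs now give tables of the same shape, the factor conversion can be wrapped in a small helper and the results compared. This is a sketch; tab_for is a hypothetical helper name, not a base R function.

```r
levs <- c(0, 1)

# Hypothetical helper: cross-tabulate two vectors over a fixed set of levels
tab_for <- function(u, v) {
  table(x = factor(u, levels = levs),
        y = factor(v, levels = levs),
        useNA = "always")
}

t1 <- tab_for(c(1, NA, 0, 1), c(NA, NA, 0, 1))
t2 <- tab_for(c(NA, NA, NA, NA), c(1, 1, 1, 1))

print(dim(t1))                                # 3 3
print(identical(dim(t1), dim(t2)))            # TRUE
print(identical(dimnames(t1), dimnames(t2)))  # TRUE
```

Note that rows or columns made up entirely of zeros give zero expected counts, so chisq.test may not return a usable statistic for such a table.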