How to construct this binary variable in R?

The aim is to check whether the value at index i is 1 and, if so, set that entry and the six entries before it to 1.
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
## Required output
y <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
## Attempt
for(j in seq_along(x)){
  if(x[j] == 1){
    for(i in (j-6):j)
      x[i] = 1
  }
}
Could you help me solve this, or suggest a better approach?
Thanks.

A fully vectorized solution using stats::filter (written with the namespace because dplyr, if loaded, masks filter):
as.integer(                    # turn logical values into 0/1
  as.logical(                  # 0 becomes FALSE, everything else TRUE
    rev(                       # restore the original order
      stats::filter(           # linear filtering
        c(rep(0, 6),           # pad with zeros at the beginning to avoid NAs
          rev(x)),             # reverse the order of the input vector
        rep(1, 7), sides = 1   # y_i = x_i * 1 + x_(i-1) * 1 + ... + x_(i-6) * 1
      )[-(1:6)]                # remove the NA values
    )
  )
)
#[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
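The filter() call effectively computes, for each position, the sum of that entry and the six entries after it. The same window sum can also be sketched with cumulative sums, which avoids the reversal and the padding. This is a minimal base R sketch, assuming a window of the current entry plus the next k = 6 entries; k, cs, hi and win_sum are illustrative names:

```r
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
k <- 6                                  # assumed window: entry i plus the k entries after it
n <- length(x)
cs <- c(0, cumsum(x))                   # cs[j + 1] == sum(x[1:j]), with cs[1] == 0
hi <- pmin(seq_len(n) + k, n)           # right end of each window, clipped at the last index
win_sum <- cs[hi + 1] - cs[seq_len(n)]  # sum(x[i:hi[i]]) for every i at once
y <- as.integer(win_sum > 0)            # 1 if the window contains at least one 1
y
# [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
```

Each window sum costs one subtraction, so the whole vector is handled in linear time regardless of k.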

You could try the following options (but remember to re-initialize x before trying each one, as I'm overwriting it):
indx <- mapply(function(x, y) x:y, which(x == 1) - 6 , which(x == 1))
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or even simpler
indx <- sapply(which(x == 1) - 6, function(x) x:(x + 6))
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or
indx <- apply(cbind(which(x == 1) - 6 , which(x == 1)), 1, function(x) x[1]:x[2])
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or
indx <- seq_len(6)
indx <- sapply(which(x == 1), function(x) x - indx)
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
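If you need this repeatedly, the index-building idea of these options can be wrapped in a small function that leaves x untouched. A sketch, assuming the window size should be a parameter; mark_preceding() and k are hypothetical names, and outer() is swapped in for sapply() to build all indices at once:

```r
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)

# Hypothetical helper: returns a new 0/1 vector in which every 1 of x and the
# k entries before it are set to 1; x itself is not modified.
mark_preceding <- function(x, k = 6) {
  idx <- outer(which(x == 1), 0:k, "-")  # row r holds i, i-1, ..., i-k for the r-th 1
  y <- integer(length(x))
  y[idx[idx > 0]] <- 1L                   # drop positions before the first element
  y
}

mark_preceding(x)
# [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
```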

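Using a 'for' loop (note that the window must start six entries back, i.e. at i - 6, to reproduce the required output):
ddf <- data.frame(x, y = 0)
for (i in seq_len(nrow(ddf))) {
  if (ddf[i, 'x'] == 1) {
    j <- max(i - 6, 1)   # the six previous entries, but never before index 1
    ddf[j:i, 'y'] <- 1
  }
}
ddf
   x y
1  0 1
2  0 1
3  0 1
4  1 1
5  0 0
6  0 0
7  0 0
8  0 0
9  0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 1
15 0 1
16 0 1
17 1 1
18 0 1
19 1 1
y <- ddf$y
y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1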

y <- x
y[unlist(sapply(which(x == 1),
                function(val) {
                  val:(max(val - 6, 1))
                }))] <- 1
y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Explanation:
I first look for the indices where x equals 1 with which(x == 1). Then, for each of these indices, sapply(...) returns the indices running from that one back to the sixth one before it (never going below 1), and unlist() turns the result into a single vector of the indices for which y must be 1.
I then assign 1 to the corresponding y values.
Another way of writing it, in two steps:
y <- x
ind <- unlist(sapply(which(x == 1), function(val) val:max(val - 6, 1)))
y[ind] <- 1
y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1

Related

How can I create a new column with values 1/0, where the value in the new column is 1 only if values in two other columns are both 1?

I have two columns within a DF, "wet" and "cold", each with values of 1 or 0, e.g.:
Wet Cold
1 1
0 1
0 1
1 0
1 1
0 0
I would like to create a new column, wet&cold, that is 1 only if wet = 1 and cold = 1. If either or both of them are 0, wet&cold should be 0.
I tried to work around this with grepl, but without success.
Base R solution
df$`wet&cold` <- df$Wet*df$Cold
df
Wet Cold wet&cold
1 1 1 1
2 0 1 0
3 0 1 0
4 1 0 0
5 1 1 1
6 0 0 0
dplyr solution
library(dplyr)
df %>%
  mutate(`wet&cold` = Wet * Cold)
Wet Cold wet&cold
1 1 1 1
2 0 1 0
3 0 1 0
4 1 0 0
5 1 1 1
6 0 0 0
Another option is to check whether all values in a row equal 1 and convert the resulting TRUE/FALSE to 1/0 with as.integer. Note that this compares every column of df, so it assumes df contains only the Wet and Cold columns:
df$wet_cold = as.integer(rowSums(df == 1) == ncol(df))
df
#> Wet Cold wet_cold
#> 1 1 1 1
#> 2 0 1 0
#> 3 0 1 0
#> 4 1 0 0
#> 5 1 1 1
#> 6 0 0 0
Created on 2023-01-18 with reprex v2.0.2
The other solution, with its clever multiplication, works great. Here is a perhaps more general solution using ifelse(), which works well for this two-case situation.
df <- data.frame(
wet = c(1, 0, 0, 1, 1, 0),
cold = c(1, 1, 1, 0, 1, 0)
)
df$wet_cold <- ifelse(df$wet == 1 & df$cold == 1, 1, 0)
df
# df
# wet cold wet_cold
# 1 1 1 1
# 2 0 1 0
# 3 0 1 0
# 4 1 0 0
# 5 1 1 1
# 6 0 0 0
You can use & to check whether both are 1 and unary + to convert TRUE or FALSE to 1 or 0.
DF["wet&cold"] <- +(DF$wet & DF$cold)
#DF
# wet cold wet&cold
#1 1 1 1
#2 0 1 0
#3 0 1 0
#4 1 0 0
#5 1 1 1
#6 0 0 0
Two more general approaches, which work for more than two columns and for conditions other than 1:
DF["wet&cold"] <- +(apply(DF==1, 1, all))
DF["wet&cold"] <- +(rowSums(DF != 1) == 0)
Data
DF <- data.frame(wet = c(1, 0, 0, 1, 1, 0), cold = c(1, 1, 1, 0, 1, 0))
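To restrict the check to specific columns (so that other columns in the data frame do not interfere, as they would with DF == 1 applied to the whole frame), the rowSums() idea can be wrapped in a function. A minimal sketch; all_equal_to() is a hypothetical helper, not part of any package:

```r
DF <- data.frame(wet = c(1, 0, 0, 1, 1, 0), cold = c(1, 1, 1, 0, 1, 0))

# Hypothetical helper: 1 if every column in `cols` equals `value` in that row,
# otherwise 0. Rows with an NA in any of the selected columns give NA.
all_equal_to <- function(df, cols, value = 1) {
  as.integer(rowSums(df[cols] == value) == length(cols))
}

DF$wet_cold <- all_equal_to(DF, c("wet", "cold"))
DF$wet_cold
# [1] 1 0 0 0 1 0
```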

R: Count frequencies of levels across whole time series

I've created this dummy dataframe that represents my real data. For simplicity, I've dropped the Time column:
library(tibble)
df <- tibble(ID = c(1, 2, 1, 3, 4, 1, 2, 3),
             level = c(0, 0, 1, 2, 1, 2, 3, 0),
             n_0 = 4,
             n_1 = 0,
             n_2 = 0,
             n_3 = 0,
             previous_level = c(0, 0, 0, 0, 0, 1, 0, 2))
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 4 0 0 0 0
4 3 2 4 0 0 0 0
5 4 1 4 0 0 0 0
6 1 2 4 0 0 0 1
7 2 3 4 0 0 0 0
8 3 0 4 0 0 0 2
Some words to explain this structure: the actual data comprise only the ID and level columns. A given ID can have only one level at a time, but this level may change over time. All IDs start at level 0. Now I want columns that track how many of my IDs (here 4 in total) are at levels 0, 1, 2 and 3; that is why I have already created the count columns. I also think a column with the previous level might be helpful.
The following table shows the result I'm expecting:
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 3 1 0 0 0
4 3 2 2 1 1 0 0
5 4 1 1 2 1 0 0
6 1 2 1 1 2 0 1
7 2 3 0 1 2 1 0
8 3 0 1 1 1 1 2
Is there a sneaky way to do so in R?
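Since the real data contain only ID and level, the previous_level column mentioned above can first be derived from them. A sketch, assuming the rows are already sorted by time (the Time column was dropped here) and, as stated in the question, that every ID starts at level 0:

```r
df <- data.frame(
  ID = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0)
)

# Within each ID, shift its own level history down by one position and
# put 0 (the common starting level) in front.
df$previous_level <- ave(df$level, df$ID,
                         FUN = function(v) c(0, head(v, -1)))
df$previous_level
# [1] 0 0 0 0 0 1 0 2
```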
You may try
fn <- function(df){
  lv  <- sort(unique(df$level))
  res <- as.data.frame(matrix(0, ncol = length(lv), nrow = nrow(df)))
  # current level of every ID (assumes IDs are numbered 1..n); all start at level 0
  key <- factor(rep(0, length(unique(df$ID))), levels = lv)
  for (i in seq_len(nrow(df))){
    key[df$ID[i]] <- df$level[i]   # record this ID's (possibly unchanged) level
    res[i, ] <- table(key)         # number of IDs currently at each level
  }
  names(res) <- paste0("n_", levels(key))
  cbind(df, res)
}
fn(df)
ID level previous_level n_0 n_1 n_2 n_3
1 1 0 0 4 0 0 0
2 2 0 0 4 0 0 0
3 1 1 0 3 1 0 0
4 3 2 0 2 1 1 0
5 4 1 0 1 2 1 0
6 1 2 1 1 1 2 0
7 2 3 0 0 1 2 1
8 3 0 2 1 1 1 1
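library(dplyr)
library(magrittr)
# df must be a data.frame (not a tibble) so that df[i, 'col'] returns a single number
n_states = 4L
state = vector(mode = 'numeric', length = n_states)
state[1L] = n_distinct(df$ID)   # all IDs start at level 0
for (i in seq_len(nrow(df))) {
  state[df[i, 'previous_level'] + 1] %<>% subtract(1)   # the ID leaves its previous level
  state[df[i, 'level'] + 1] %<>% add(1)                  # and enters its current level
  df[i, paste0('n', seq_len(n_states) - 1L)] = state
}
df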
# ID level previous_level n0 n1 n2 n3
# 1 1 0 0 4 0 0 0
# 2 2 0 0 4 0 0 0
# 3 1 1 0 3 1 0 0
# 4 3 2 0 2 1 1 0
# 5 4 1 0 1 2 1 0
# 6 1 2 1 1 1 2 0
# 7 2 3 0 0 1 2 1
# 8 3 0 2 1 1 1 1
Data:
df <- data.frame(
ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
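The bookkeeping in the loop above (subtract 1 from the previous level, add 1 to the new one) can also be done without a loop: for each level, take the cumulative sum of the +1/-1 changes. A sketch, assuming the possible levels are 0 to 3, that rows are in time order, and that all IDs start at level 0; delta and start are illustrative names:

```r
df <- data.frame(
  ID = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0),
  previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)

n_ids <- length(unique(df$ID))   # all IDs start at level 0
for (l in 0:3) {                 # assumption: the possible levels are 0 to 3
  # +1 when a row moves an ID into level l, -1 when it moves one out of it
  delta <- (df$level == l) - (df$previous_level == l)
  start <- if (l == 0) n_ids else 0
  df[[paste0("n_", l)]] <- start + cumsum(delta)
}
df
```

The resulting n_0 to n_3 columns match the expected table in the question; a row where an ID keeps its level contributes +1 and -1 to the same count, which cancel.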

R - Function that dichotomizes certain columns of a data frame based on different thresholds

I am trying to create a function that dichotomizes selected columns of a data frame, using a different threshold for each column.
For example, in the following data frame with conditions A, B, C and D:
A <- c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2)
B <- c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
C <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
D <- c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
Data <- data.frame(A, B, C, D)
I would like the function to dichotomize the conditions that I select [e.g. A, B, D] and dichotomize them based on thresholds that I assign [e.g. 2 for A, 1 for B, 3 for D].
I would like the dichotomized columns to be added to the data frame under new names [e.g. A_dicho, B_dicho, D_dicho].
The final data frame should look like this (you will notice B is already dichotomized, which is fine, it should just be treated equally and added):
A B C D A_dicho B_dicho D_dicho
1 0 0 0 0 0 0 0
2 2 1 0 0 1 1 0
3 1 1 0 3 0 1 1
4 0 1 1 1 0 1 0
5 2 0 1 2 1 0 0
6 1 0 1 1 0 0 0
7 0 0 1 4 0 0 1
8 0 1 1 0 0 1 0
9 1 1 1 3 0 1 1
10 2 0 1 0 1 0 0
Could someone help me? Many thanks in advance.
Make a small named vector of thresholds, then Map() it over the corresponding columns (the \(d, th) lambda shorthand requires R 4.1.0 or later):
thresh <- c("A"=2, "B"=1, "D"=3)
Data[paste(names(thresh), "dicho", sep="_")] <- Map(
\(d,th) as.integer(d >= th), Data[names(thresh)], thresh
)
Data
## A B C D A_dicho B_dicho D_dicho
##1 0 0 0 0 0 0 0
##2 2 1 0 0 1 1 0
##3 1 1 0 3 0 1 1
##4 0 1 1 1 0 1 0
##5 2 0 1 2 1 0 0
##6 1 0 1 1 0 0 0
##7 0 0 1 4 0 0 1
##8 0 1 1 0 0 1 0
##9 1 1 1 3 0 1 1
##10 2 0 1 0 1 0 0
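Since the question asks for a function, the Map() approach can be wrapped into one. A sketch; dichotomize() and its suffix argument are hypothetical names, and function(d, th) is used instead of \(d, th) so it also runs on R versions before 4.1.0:

```r
A <- c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2)
B <- c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
C <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
D <- c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
Data <- data.frame(A, B, C, D)

# Hypothetical wrapper around the Map() approach: the names of `thresh` select
# the columns; each new column is 1 where the value is >= its threshold, else 0.
dichotomize <- function(data, thresh, suffix = "_dicho") {
  cols <- names(thresh)
  data[paste0(cols, suffix)] <- Map(function(d, th) as.integer(d >= th),
                                    data[cols], thresh)
  data
}

res <- dichotomize(Data, c(A = 2, B = 1, D = 3))
res
```

The result has the same A_dicho, B_dicho and D_dicho columns as the output above.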

Apply "or" function across any number of data.frame columns and preserve missingness

I create datasets in R regularly and often find I need to take two or more binary variables and "or" them into one new variable that indicates whether any were 1, none were 1, or all were missing.
Simply using | does not handle NAs the way I would like.
So, given a data.frame df with three columns:
x = c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,NA,NA,NA,NA,NA,NA,NA,NA,NA)
y = c( 0, 0, 0, 1, 1, 1,NA,NA,NA, 0, 0, 0, 1, 1, 1,NA,NA,NA, 0, 0, 0, 1, 1, 1,NA,NA,NA)
z = c( 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA)
df = data.frame(x,y,z)
The output I am looking for is:
myFunction(df)
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
But simply using | does not handle the 0s the way I want, because NA | 0 is NA, so NAs take priority over 0s:
as.numeric(df$x | df$y | df$z)
[1] 0 1 NA 1 1 1 NA 1 NA 1 1 1 1 1 1 1 1 1 NA 1 NA 1 1 1 NA 1 NA
This is the best solution I came up with:
myFunction <- function(...) {
  as.numeric(apply(data.frame(...), 1, function(x) {
    if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
  }) > 0)
}
df$xyz = myFunction(df)
df$xyz
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
Is there a package with this functionality, or a better way to write this, so I don't have to copy and paste this mess across all my scripts? Am I overthinking this?
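We can use rowSums and convert to binary. Since NA^0 is 1 and NA^1 is NA, multiplying by NA^(!rowSums(!is.na(df))) turns the result into NA only for rows in which every value is missing:
df$new_col <- +(rowSums(df, na.rm = TRUE) > 0) * NA^(!rowSums(!is.na(df)))
Output: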
df$new_col
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
It can also be written compactly with sum_() from the hablar package, which ignores NAs but returns NA when all values are missing:
library(hablar)
+(apply(df, 1, sum_) > 0)
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
If you want the output as a new column in the data frame, dplyr::if_any() is most helpful here. We can use if_any() to create a logical vector that is TRUE, row by row, if any of the values in the row is TRUE. Then replace the NAs with 0 using coalesce().
library(dplyr)
df %>% mutate(new_col=coalesce(if_any(everything()), 0))
x y z new_col
1 0 0 0 0
2 0 0 1 1
3 0 0 NA 0
4 0 1 0 1
5 0 1 1 1
6 0 1 NA 1
7 0 NA 0 0
8 0 NA 1 1
9 0 NA NA 0
10 1 0 0 1
11 1 0 1 1
12 1 0 NA 1
13 1 1 0 1
14 1 1 1 1
15 1 1 NA 1
16 1 NA 0 1
17 1 NA 1 1
18 1 NA NA 1
19 NA 0 0 0
20 NA 0 1 1
21 NA 0 NA 0
22 NA 1 0 1
23 NA 1 1 1
24 NA 1 NA 1
25 NA NA 0 0
26 NA NA 1 1
27 NA NA NA 0
Because coalesce() is applied only to the new column inside mutate(), the NAs in the original columns are preserved.
We can also replace the NAs with 0 in each column first and then use reduce(`|`) to create the new column, coercing it to numeric with +.
library(dplyr)
library(purrr)
df %>% mutate(new_col = +(map_dfc(df, coalesce, 0) %>% reduce(`|`)))
Or apply reduce(`|`) first, and replace the NAs with 0 using coalesce() at the end:
library(dplyr)
library(purrr)
df %>% mutate(new_col = coalesce(reduce(., `|`), 0))
If you want just the vector, use:
coalesce(Reduce(`|`, df), 0)
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0
Observation
For row-wise logical operations, if_any/if_all, reduce(`|`) and reduce(`&`), and rowSums(condition) are more robust than rowwise() %>% max, because max(..., na.rm = TRUE) cannot handle rows in which all values are NA: it returns -Inf with a warning.
In case you want NA as the output when all values in a row are NA
Pipe the intermediate result into replace() with if_all(everything(), is.na), as in the following code:
output <- df %>%
  mutate(new_col = coalesce(if_any(everything()), 0) %>%
           replace(., if_all(everything(), is.na), NA))
output$new_col
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
Another way that I thought of:
library(dplyr)
df %>%
rowwise() %>%
mutate(out = max(c_across(),na.rm = TRUE)) %>%
pull(out) %>%
replace(is.infinite(.), NA)
[1] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 NA
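The same max idea can be vectorized in base R with pmax(), which with na.rm = TRUE skips NAs and returns NA only where every input is NA, so no Inf cleanup is needed. A sketch, assuming the columns are coded 0/1 (pmax() returns the largest value, so other codings would pass through unchanged); row_or() is a hypothetical name:

```r
x = c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,NA,NA,NA,NA,NA,NA,NA,NA,NA)
y = c( 0, 0, 0, 1, 1, 1,NA,NA,NA, 0, 0, 0, 1, 1, 1,NA,NA,NA, 0, 0, 0, 1, 1, 1,NA,NA,NA)
z = c( 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA, 0, 1,NA)
df <- data.frame(x, y, z)

# Hypothetical helper: row-wise OR of 0/1 columns that is NA only for all-NA rows.
# Each column becomes one argument of pmax(); unname() keeps column names from
# clashing with pmax()'s own arguments.
row_or <- function(df) {
  do.call(pmax, c(unname(as.list(df)), na.rm = TRUE))
}

row_or(df)
```

This returns the same vector as myFunction(df) in the question, with NA only in the last row.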

Replacing values in a list based on the number of consecutive values they are a part of

I have the following list:
my_list <- list(c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1), c(0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0), c(0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1))
> my_list
[[1]]
[1] 1 1 1 0 1 1 1 1 1 1 0 1 1
[[2]]
[1] 0 0 0 0 1 1 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 1
Wherever there are fewer than four consecutive 1s, I would like to replace those 1s with 0s. The resulting list should look like this:
> my_new_list
[[1]]
[1] 0 0 0 0 1 1 1 1 1 1 0 0 0
[[2]]
[1] 0 0 0 0 0 0 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 0
I believe I have to use the rle and inverse.rle functions, but I can't figure out how to do it. Thanks for your help.
For each list component, use rle() and replace each element of values with 0 if its corresponding lengths entry is less than or equal to max_n, which defaults to 3. Then apply inverse.rle() to get back the resulting vector.
replace_zeros <- function(x, max_n = 3) {
r <- rle(x)
r$values[r$lengths <= max_n] <- 0
inverse.rle(r)
}
lapply(my_list, replace_zeros)
giving:
[[1]]
[1] 0 0 0 0 1 1 1 1 1 1 0 0 0
[[2]]
[1] 0 0 0 0 0 0 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 0
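To see what the rle() step does, here is a trace of replace_zeros() applied by hand to the first list element; v is simply my_list[[1]] copied out, and the comments show what R returns at each step:

```r
v <- c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1)   # same values as my_list[[1]]
r <- rle(v)
r$lengths                        # 3 1 6 1 2  (length of each run)
r$values                         # 1 0 1 0 1  (value of each run)
r$values[r$lengths <= 3] <- 0    # every run of length 3 or less becomes 0
r$values                         # 0 0 1 0 0
inverse.rle(r)                   # expand the runs back into a vector
# [1] 0 0 0 0 1 1 1 1 1 1 0 0 0
```

Short runs of 0s are also "replaced", but since they are set to 0 again this has no effect.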
