Conditionally replace all values based on value of 1 column in R - r

Background
The data set is given below for reproducibility
data <- structure(list(rest1 = c(1, 1, 0, 1, 1, 1, 0, 1, 0, 1),
rest2 = c(1, 0, 1, 0, 0, 1, 1, 0, 0, 0),
rest3 = c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0),
rest4 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0),
rest5 = c(1, 1, 0, 0, 0, 1, 0, 1, 0, 1),
rest6 = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
The output is given below:
A tibble: 10 x 6
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 1 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 1 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
My question
Changes need to be made based on the values of column rest6. Whenever rest6 is equal to 1, the other variables rest1-rest5 need to be set to 0. Here, rows 3 and 7 need to be fixed.
The desired output is below:
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
Previous Attempts
I have attempted this with my basic knowledge of R. My logic: if rest6 is equal to 1 and the observation is equal to 1, set it to 0; otherwise return the original value. However, this has not worked, and I am not as proficient in R as I would like.
data <- ifelse(data$rest6 == 1 & data[,c(2:5) == 1],
0,
data[,c(2:6)])
In another attempt, I tried to use a function() to identify where to place the values.
Thank you for your help.

A simple base R solution may be to isolate all those in which rest6 == 1 and change all values in the relevant columns to 0:
data[data$rest6 %in% 1, 1:5] <- 0
Output:
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
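To make the row-indexing step above more concrete, here is a minimal sketch on a small, made-up data frame (the name d and its values are illustrative assumptions, not the asker's data). It also shows one plausible reason for using %in% rather than ==: with ==, an NA in rest6 yields an NA in the row index, which data frame assignment rejects, while %in% simply returns FALSE for it.

```r
# Hypothetical toy data; rest6 contains an NA on purpose.
d <- data.frame(rest1 = c(1, 1, 0),
                rest2 = c(1, 1, 0),
                rest6 = c(0, 1, NA))

# `==` propagates NA, `%in%` does not.
eq_index <- d$rest6 == 1     # FALSE TRUE NA
in_index <- d$rest6 %in% 1   # FALSE TRUE FALSE

# Row-indexed assignment: zero out rest1 and rest2 where rest6 is 1.
d[in_index, c("rest1", "rest2")] <- 0
d
```

If rest6 never contains NA, both forms of the index select the same rows.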

In the tidyverse, a simple solution is to loop across columns rest1 to rest5 and use case_when to set the values to 0 wherever rest6 is 1:
library(dplyr)
data <- data %>%
mutate(across(rest1:rest5,
~ case_when(rest6 == 1 ~ 0, TRUE ~ .x)))
-output
data
# A tibble: 10 × 6
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
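For comparison, the same column-wise idea can be sketched in base R with lapply and ifelse, which is close to what the ifelse attempt in the question was aiming for. This is only an illustration on made-up data (d and cols are assumed names), not a replacement for the dplyr code above.

```r
# Hypothetical toy data with the same column naming scheme.
d <- data.frame(rest1 = c(1, 0, 1),
                rest2 = c(1, 1, 0),
                rest6 = c(0, 1, 0))

cols <- c("rest1", "rest2")  # columns to overwrite

# Apply ifelse to one column at a time: 0 where rest6 is 1, else keep the value.
d[cols] <- lapply(d[cols], function(x) ifelse(d$rest6 == 1, 0, x))
d
```

The key difference from the original attempt is that ifelse is given one vector at a time, so the test and the values have the same length.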

data.table solution
library(data.table)
setDT(data)
data[rest6 == 1, 1:5 := 0]

Related

R: Count frequencies of levels across whole time series

I've created this dummy dataframe that represents my real data. For simplicity, I've dropped the Time column:
df <- tibble(ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
n_0 = 4,
n_1 = 0,
n_2 = 0,
n_3 = 0,
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2))
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 4 0 0 0 0
4 3 2 4 0 0 0 0
5 4 1 4 0 0 0 0
6 1 2 4 0 0 0 1
7 2 3 4 0 0 0 0
8 3 0 4 0 0 0 2
Some words to explain this structure. The actual data comprises only the ID and level columns. A specific ID can only have one level at a time; however, this might change over time. All IDs start with level 0. Now I want columns that track how many of my IDs (here 4 in total) have levels 0, 1, 2 and 3. Therefore I've already created the count columns. Also, I think a column with the previous level might be helpful.
The following table shows the result I'm expecting:
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 3 1 0 0 0
4 3 2 2 1 1 0 0
5 4 1 1 2 1 0 0
6 1 2 1 1 2 0 1
7 2 3 0 1 2 1 0
8 3 0 1 1 1 1 2
Is there a sneaky way to do so in R?
You may try
fn <- function(df){
res <- as.data.frame(matrix(0, ncol = length(unique(df$level)), nrow = nrow(df)))
# one entry per ID (assumes IDs are 1..n), every ID starting at level 0
key <- factor(rep(0, length(unique(df$ID))), levels = unique(df$level))
for (i in 1:nrow(df)){
if (df$level[i] != key[df$ID[i]]){
key[df$ID[i]] <- df$level[i]
res[i,] <- table(key)
} else {
res[i,] <- table(key)
}
}
names(res) <- paste0("n_",levels(key))
df <- cbind(df, res)
return(df)
}
fn(df)
ID level previous_level n_0 n_1 n_2 n_3
1 1 0 0 4 0 0 0
2 2 0 0 4 0 0 0
3 1 1 0 3 1 0 0
4 3 2 0 2 1 1 0
5 4 1 0 1 2 1 0
6 1 2 1 1 1 2 0
7 2 3 0 0 1 2 1
8 3 0 2 1 1 1 1
library(dplyr)
library(magrittr)
n_states = 4L
state = vector(mode = 'numeric', length = n_states)
state[1L] = n_distinct(df$ID)
for (i in seq_len(nrow(df))) {
state[df[i, 'previous_level'] + 1] %<>% subtract(1)
state[df[i, 'level'] + 1] %<>% add(1)
df[i, paste0('n', seq_len(n_states) - 1L)] = state
}
# ID level previous_level n0 n1 n2 n3
# 1 1 0 0 4 0 0 0
# 2 2 0 0 4 0 0 0
# 3 1 1 0 3 1 0 0
# 4 3 2 0 2 1 1 0
# 5 4 1 0 1 2 1 0
# 6 1 2 1 1 1 2 0
# 7 2 3 0 0 1 2 1
# 8 3 0 2 1 1 1 1
Data:
df <- data.frame(
ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
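The running-count idea used in the answers above can also be sketched with base R only, without needing the previous_level column. This is a hedged illustration: the names current, counts and res are made up here, and it assumes that the possible levels are 0 to 3 and that every ID starts at level 0, as stated in the question.

```r
df <- data.frame(
  ID    = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0)
)

ids <- unique(df$ID)
lev <- 0:3  # assumed set of possible levels

# Current level of each ID, looked up by name; everyone starts at 0.
current <- setNames(rep(0, length(ids)), ids)

# One row of counts per observation, one column per level.
counts <- matrix(0L, nrow(df), length(lev),
                 dimnames = list(NULL, paste0("n_", lev)))

for (i in seq_len(nrow(df))) {
  current[as.character(df$ID[i])] <- df$level[i]             # update this ID
  counts[i, ] <- tabulate(current + 1, nbins = length(lev))  # recount all IDs
}

res <- cbind(df, counts)
res
```

Looking IDs up by name avoids the assumption that IDs are consecutive integers starting at 1.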

R - Function that dichotomizes certain columns of a data frame based on different thresholds

I am trying to create a function that dichotomizes certain defined columns of a data frame based on different values depending on the column.
For example, in the following data frame with conditions A, B, C and D:
A <- c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2)
B <- c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
C <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
D <- c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
Data <- data.frame(A, B, C, D)
I would like the function to dichotomize the conditions that I select [e.g. A, B, D] and dichotomize them based on thresholds that I assign [e.g. 2 for A, 1 for B, 3 for D].
I would like the dichotomized columns to be added to the data frame with different names [e.g. A_dicho, B_dicho, D_dicho].
The final data frame should look like this (you will notice B is already dichotomized, which is fine, it should just be treated equally and added):
A B C D A_dicho B_dicho D_dicho
1 0 0 0 0 0 0 0
2 2 1 0 0 1 1 0
3 1 1 0 3 0 1 1
4 0 1 1 1 0 1 0
5 2 0 1 2 1 0 0
6 1 0 1 1 0 0 0
7 0 0 1 4 0 0 1
8 0 1 1 0 0 1 0
9 1 1 1 3 0 1 1
10 2 0 1 0 1 0 0
Could someone help me? Many thanks in advance.
Make a little threshold vector specifying the values, then Map it to the columns:
thresh <- c("A"=2, "B"=1, "D"=3)
Data[paste(names(thresh), "dicho", sep="_")] <- Map(
\(d,th) as.integer(d >= th), Data[names(thresh)], thresh
)
Data
## A B C D A_dicho B_dicho D_dicho
##1 0 0 0 0 0 0 0
##2 2 1 0 0 1 1 0
##3 1 1 0 3 0 1 1
##4 0 1 1 1 0 1 0
##5 2 0 1 2 1 0 0
##6 1 0 1 1 0 0 0
##7 0 0 1 4 0 0 1
##8 0 1 1 0 0 1 0
##9 1 1 1 3 0 1 1
##10 2 0 1 0 1 0 0
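Since the question asks for a function, the Map approach above could be wrapped as follows. This is just one possible sketch: the function name dichotomize and its suffix argument are assumptions made for illustration, and the example data is a shortened version of the question's data.

```r
# Hypothetical wrapper around the Map idea: `thresh` is a named numeric
# vector whose names are the columns to dichotomize.
dichotomize <- function(data, thresh, suffix = "dicho") {
  stopifnot(all(names(thresh) %in% names(data)))
  new_names <- paste(names(thresh), suffix, sep = "_")
  data[new_names] <- Map(function(col, th) as.integer(col >= th),
                         data[names(thresh)], thresh)
  data
}

Data <- data.frame(A = c(0, 2, 1), B = c(0, 1, 1), D = c(0, 0, 3))
out <- dichotomize(Data, c(A = 2, B = 1, D = 3))
out
```

Columns not named in thresh are left untouched, and the original columns are kept alongside the new ones.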

Convert from long to wide format from categorical data

Having categorical data like this:
data.frame(id = c(1,2,3,4,5), stock1 = c(1,2,0,1,2), stock2 = c(0,1,0,1,1), end = c(0,1,3,0,3), start = c(2,3,0,1,0))
id stock1 stock2 end start
1 1 1 0 0 2
2 2 2 1 1 3
3 3 0 0 3 0
4 4 1 1 0 1
5 5 2 1 3 0
How can I convert this from long to wide format, so that each new column indicates (1 or 0) whether a specific value of the original column is present, with the column name combining the original name and that value?
Example of expected output:
data.frame(id = c(1,2,3,4,5), stock1_0 = c(0,0,1,0,0), stock1_1 = c(1,0,0,1,0), stock1_2 = c(0,1,0,0,1), stock2_0 = c(1,0,1,0,0), stock2_1 = c(0,1,0,0,0), end_0 = c(1,0,0,1,0), end_1 = c(0,1,0,0,0), end_3 = c(0,0,1,0,1), start_0 = c(0,0,1,0,1), start_1 = c(0,0,0,1,0), start_2 = c(1,0,0,0,0), start_3 = c(0,1,0,0,0))
id stock1_0 stock1_1 stock1_2 stock2_0 stock2_1 end_0 end_1 end_3 start_0 start_1 start_2 start_3
1 1 0 1 0 1 0 1 0 0 0 0 1 0
2 2 0 0 1 0 1 0 1 0 0 0 0 1
3 3 1 0 0 1 0 0 0 1 1 0 0 0
4 4 0 1 0 0 0 1 0 0 0 1 0 0
5 5 0 0 1 0 0 0 0 1 1 0 0 0
You could use model.matrix.
data.frame(dat[1],
do.call(cbind, lapply(seq(dat)[-1], function(x)
`colnames<-`(m <- model.matrix( ~ as.factor(dat[[x]]) - 1),
paste(names(dat[x]), seq_len(ncol(m)), sep="_")))))
# id stock1_1 stock1_2 stock1_3 stock2_1 stock2_2 end_1 end_2 end_3 start_1
# 1 1 0 1 0 1 0 1 0 0 0
# 2 2 0 0 1 0 1 0 1 0 0
# 3 3 1 0 0 1 0 0 0 1 1
# 4 4 0 1 0 0 1 1 0 0 0
# 5 5 0 0 1 0 1 0 0 1 1
# start_2 start_3 start_4
# 1 0 1 0
# 2 0 0 1
# 3 0 0 0
# 4 1 0 0
# 5 0 0 0
Data:
dat <- structure(list(id = c(1, 2, 3, 4, 5), stock1 = c(1, 2, 0, 1,
2), stock2 = c(0, 1, 0, 1, 1), end = c(0, 1, 3, 0, 3), start = c(2,
3, 0, 1, 0)), class = "data.frame", row.names = c(NA, -5L))
library(data.table)
setDT(df)
dcast(melt(df, 'id'),
id ~ paste0(variable, '_', value),
fun.aggregate = length)
# id end_0 end_1 end_3 start_0 start_1 start_2 start_3 stock1_0
# 1: 1 1 0 0 0 0 1 0 0
# 2: 2 0 1 0 0 0 0 1 0
# 3: 3 0 0 1 1 0 0 0 1
# 4: 4 1 0 0 0 1 0 0 0
# 5: 5 0 0 1 1 0 0 0 0
# stock1_1 stock1_2 stock2_0 stock2_1
# 1: 1 0 1 0
# 2: 0 1 0 1
# 3: 0 0 1 0
# 4: 1 0 0 1
# 5: 0 1 0 1
One way would be to get data in long format, combine column name with value and get the data back in wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -id) %>%
unite(name, name, value) %>%
mutate(value = 1) %>%
pivot_wider(values_fill = list(value = 0))
# A tibble: 5 x 13
# id stock1_1 stock2_0 end_0 start_2 stock1_2 stock2_1 end_1 start_3 stock1_0 end_3 start_0 start_1
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 1 1 0 0 0 0 0 0 0 0
#2 2 0 0 0 0 1 1 1 1 0 0 0 0
#3 3 0 1 0 0 0 0 0 0 1 1 1 0
#4 4 1 0 1 0 0 1 0 0 0 0 0 1
#5 5 0 0 0 0 1 1 0 0 0 1 1 0
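If no extra packages are available, the same indicator columns can be sketched in base R by comparing each column against its own sorted unique values. The helper name one_hot and the small data frame are assumptions for illustration. Unlike the model.matrix version above, this labels columns with the actual values (stock1_0, stock1_1, ...) and keeps the original column order.

```r
# Hypothetical toy data, a cut-down version of the question's layout.
dat <- data.frame(id = 1:3, stock1 = c(1, 2, 0), end = c(0, 1, 3))

# Build a 0/1 indicator matrix for one column, named <column>_<value>.
one_hot <- function(x, name) {
  vals <- sort(unique(x))
  m <- outer(x, vals, "==") * 1L
  colnames(m) <- paste(name, vals, sep = "_")
  m
}

# Apply to every column except id and bind the pieces side by side.
pieces <- unname(Map(one_hot, dat[-1], names(dat)[-1]))
wide <- data.frame(dat["id"], do.call(cbind, pieces))
wide
```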

Collecting same answer from different questions in one variable?

I am completely new to R, but running out of time here.
In my dataset, I have people from several countries answering who they voted for last. People from different countries got different questions, so in each column, only the ones from the country have an answer, the rest is NA.
I am trying to collect everyone who voted for a green party in one variable. So far I have succeeded in coding it into a separate dummy variable for each country using ifelse, but I can't seem to merge these variables. So now I have, e.g., a variable for Germany, where a green vote in the German election is 1 and everyone else is 0. The same goes for France, etc.
But how can I collect all this information in just one variable?
Appreciate your help.
Assuming your data set looks like this...
> ctry <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
> vote_ctry_1 <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
> vote_ctry_2 <- c(0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0)
> vote_ctry_3 <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
>
> dd <- data.frame(ctry, vote_ctry_1, vote_ctry_2, vote_ctry_3)
> dd
ctry vote_ctry_1 vote_ctry_2 vote_ctry_3
1 1 1 0 0
2 1 0 0 0
3 1 0 0 0
4 1 1 0 0
5 2 0 1 0
6 2 0 1 0
7 2 0 0 0
8 2 0 1 0
9 3 0 0 1
10 3 0 0 0
11 3 0 0 0
12 3 0 0 0
... then just add up the dummy variables:
> dd$vote_all <- vote_ctry_1 + vote_ctry_2 + vote_ctry_3
> dd
ctry vote_ctry_1 vote_ctry_2 vote_ctry_3 vote_all
1 1 1 0 0 1
2 1 0 0 0 0
3 1 0 0 0 0
4 1 1 0 0 1
5 2 0 1 0 1
6 2 0 1 0 1
7 2 0 0 0 0
8 2 0 1 0 1
9 3 0 0 1 1
10 3 0 0 0 0
11 3 0 0 0 0
12 3 0 0 0 0
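The addition above works when the dummies contain only 0 and 1. If the per-country dummies still hold NA for respondents from other countries, as the question describes for the original answer columns, a plain sum would give NA. A minimal sketch for that case, using made-up data and the assumed column prefix vote_ctry_:

```r
# Hypothetical data where each dummy is NA outside its own country.
dd <- data.frame(ctry        = c(1, 1, 2, 2),
                 vote_ctry_1 = c(1, 0, NA, NA),
                 vote_ctry_2 = c(NA, NA, 1, 0))

# Pick up all dummy columns by their shared prefix.
vote_cols <- grep("^vote_ctry_", names(dd), value = TRUE)

# 1 if any country dummy is 1, ignoring NA; otherwise 0.
dd$vote_all <- as.integer(rowSums(dd[vote_cols], na.rm = TRUE) > 0)
dd
```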

Replacing values in a list based on the number of consecutive values they are a part of

I have the following list:
my_list <- list(c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1), c(0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0), c(0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1))
> my_list
[[1]]
[1] 1 1 1 0 1 1 1 1 1 1 0 1 1
[[2]]
[1] 0 0 0 0 1 1 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 1
Wherever there are fewer than four consecutive 1s, I would like to replace those 1s with 0s. The resulting list should look like this:
> my_new_list
[[1]]
[1] 0 0 0 0 1 1 1 1 1 1 0 0 0
[[2]]
[1] 0 0 0 0 0 0 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 0
I believe I have to use the rle and inverse.rle functions, but I can't figure out how to do it. Thanks for your help.
For each list component, use rle and replace each values element with 0 if its lengths entry is less than or equal to max_n, which defaults to 3. Then apply inverse.rle to get back the resulting vector.
replace_zeros <- function(x, max_n = 3) {
r <- rle(x)
r$values[r$lengths <= max_n] <- 0
inverse.rle(r)
}
lapply(my_list, replace_zeros)
giving:
[[1]]
[1] 0 0 0 0 1 1 1 1 1 1 0 0 0
[[2]]
[1] 0 0 0 0 0 0 0 1 1 1 1 1 0
[[3]]
[1] 0 1 1 1 1 1 1 1 1 0 0 0 0
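To see what rle and inverse.rle are doing in the function above, here is a short walk-through on a single made-up vector. The only difference from the answer's function is that the condition here also checks that the run consists of 1s, which gives the same result for 0/1 data because short runs of 0 are already 0.

```r
x <- c(1, 1, 0, 1, 1, 1, 1)

r <- rle(x)
r$lengths  # 2 1 4  (run lengths)
r$values   # 1 0 1  (value of each run)

# Zero out runs of 1s that are shorter than four.
r$values[r$values == 1 & r$lengths < 4] <- 0

out <- inverse.rle(r)
out        # 0 0 0 1 1 1 1
```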
