R: Count frequencies of levels across whole time series

I've created this dummy dataframe that represents my real data. For simplicity, I've dropped the Time column:
library(tibble)
df <- tibble(ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
n_0 = 4,
n_1 = 0,
n_2 = 0,
n_3 = 0,
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2))
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 4 0 0 0 0
4 3 2 4 0 0 0 0
5 4 1 4 0 0 0 0
6 1 2 4 0 0 0 1
7 2 3 4 0 0 0 0
8 3 0 4 0 0 0 2
Some words to explain this structure. The actual data comprises only the ID and level columns. Each ID has exactly one level at any given time, but that level can change over time, and all IDs start at level 0. I want columns that track how many of my IDs (here 4 in total) are at levels 0, 1, 2 and 3, which is why I've already created the count columns. I also think a column with the previous level might be helpful.
The following table shows the result I'm expecting:
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 3 1 0 0 0
4 3 2 2 1 1 0 0
5 4 1 1 2 1 0 0
6 1 2 1 1 2 0 1
7 2 3 0 1 2 1 0
8 3 0 1 1 1 1 2
Is there a sneaky way to do so in R?

You may try
fn <- function(df){
  lvls <- sort(unique(df$level))     # all levels, in increasing order
  n_ids <- length(unique(df$ID))     # assumes IDs are coded 1, 2, ..., n_ids
  # one row of counts per observation, one column per level
  res <- as.data.frame(matrix(0, ncol = length(lvls), nrow = nrow(df)))
  # current level of every ID; all IDs start at level 0
  key <- factor(rep(0, n_ids), levels = lvls)
  for (i in seq_len(nrow(df))){
    key[df$ID[i]] <- df$level[i]      # update this ID's current level
    res[i, ] <- as.vector(table(key)) # number of IDs at each level
  }
  names(res) <- paste0("n_", levels(key))
  cbind(df, res)
}
fn(df)
ID level previous_level n_0 n_1 n_2 n_3
1 1 0 0 4 0 0 0
2 2 0 0 4 0 0 0
3 1 1 0 3 1 0 0
4 3 2 0 2 1 1 0
5 4 1 0 1 2 1 0
6 1 2 1 1 1 2 0
7 2 3 0 0 1 2 1
8 3 0 2 1 1 1 1
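A row-by-row loop is not strictly necessary. As a sketch (not taken from either answer), the count for level k at row i equals the starting count (all IDs at level 0) plus the number of rows so far that moved an ID into level k, minus the number that moved one out of it. This assumes previous_level is already filled in as in the Data block below and that the possible levels are 0 to 3.

```r
df <- data.frame(
  ID = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0),
  previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)

n_ids <- length(unique(df$ID))  # every ID starts at level 0
levels_all <- 0:3               # assumed set of possible levels

for (k in levels_all) {
  start <- if (k == 0) n_ids else 0
  # +1 each time an ID enters level k, -1 each time one leaves it;
  # rows where level == previous_level cancel out
  df[[paste0("n_", k)]] <- start +
    cumsum(df$level == k) - cumsum(df$previous_level == k)
}
df
```

The loop runs over the four levels only, so the work per level is a pair of cumsum() calls over all rows.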

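library(dplyr)
library(magrittr)  # for %<>%, add() and subtract()
# df must be a data.frame (see Data below), so that df[i, 'col'] is a number
n_states = 4L
# state[k] holds the number of IDs currently at level k - 1; all start at 0
state = vector(mode = 'numeric', length = n_states)
state[1L] = n_distinct(df$ID)
for (i in seq_len(nrow(df))) {
  # move one ID out of its previous level and into its new level
  state[df[i, 'previous_level'] + 1] %<>% subtract(1)
  state[df[i, 'level'] + 1] %<>% add(1)
  df[i, paste0('n', seq_len(n_states) - 1L)] = state
}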
# ID level previous_level n0 n1 n2 n3
# 1 1 0 0 4 0 0 0
# 2 2 0 0 4 0 0 0
# 3 1 1 0 3 1 0 0
# 4 3 2 0 2 1 1 0
# 5 4 1 0 1 2 1 0
# 6 1 2 1 1 1 2 0
# 7 2 3 0 0 1 2 1
# 8 3 0 2 1 1 1 1
Data:
df <- data.frame(
ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
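The previous_level column does not have to be typed by hand. A minimal base R sketch, assuming the rows are already in time order and that every ID starts at level 0, shifts each ID's levels down by one row with ave():

```r
df <- data.frame(
  ID = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0)
)

# Within each ID, the previous level is that ID's level one occurrence
# earlier; the first occurrence gets 0, the assumed starting level.
df$previous_level <- ave(df$level, df$ID,
                         FUN = function(v) c(0, head(v, -1)))
df
```

ave() returns the values in the original row order, so the new column lines up with the existing rows.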

Related

How can I create a new column with values 1/0, where the value in the new column is 1 only if values in two other columns are both 1?

I have two columns within a DF, "wet" and "cold", with values of 1 and 0, e.g.:
Wet Cold
1 1
0 1
0 1
1 0
1 1
0 0
I would like to create a new column, wet&cold, where wet&cold=1 only if wet=1 and cold=1. If either or both of them are 0, then wet&cold=0.
I tried to work around with grepl, but without success.
Base R solution
df$`wet&cold` <- df$Wet*df$Cold
df
Wet Cold wet&cold
1 1 1 1
2 0 1 0
3 0 1 0
4 1 0 0
5 1 1 1
6 0 0 0
dplyr solution
library(dplyr)
df %>%
mutate(`wet&cold`=Wet*Cold)
Wet Cold wet&cold
1 1 1 1
2 0 1 0
3 0 1 0
4 1 0 0
5 1 1 1
6 0 0 0
Another option is to check whether all values in a row equal 1 and convert the resulting TRUE/FALSE to 1/0 with as.integer, like this:
df$wet_cold = as.integer(rowSums(df == 1) == ncol(df))
df
#> Wet Cold wet_cold
#> 1 1 1 1
#> 2 0 1 0
#> 3 0 1 0
#> 4 1 0 0
#> 5 1 1 1
#> 6 0 0 0
Created on 2023-01-18 with reprex v2.0.2
The other solution works great thanks to the clever multiplication. Here's a perhaps more general solution using ifelse(), which works well for this two-condition case.
df <- data.frame(
wet = c(1, 0, 0, 1, 1, 0),
cold = c(1, 1, 1, 0, 1, 0)
)
df$wet_cold <- ifelse(df$wet == 1 & df$cold == 1, 1, 0)
df
# df
# wet cold wet_cold
# 1 1 1 1
# 2 0 1 0
# 3 0 1 0
# 4 1 0 0
# 5 1 1 1
# 6 0 0 0
You can use & to check whether both are 1 and + to convert TRUE or FALSE to 1 or 0.
DF["wet&cold"] <- +(DF$wet & DF$cold)
#DF
# wet cold wet&cold
#1 1 1 1
#2 0 1 0
#3 0 1 0
#4 1 0 0
#5 1 1 1
#6 0 0 0
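Two more general approaches, which work for more than two columns and can be adapted to conditions other than equality with 1 (selecting the columns explicitly keeps the new wet&cold column out of the check):
DF["wet&cold"] <- +(apply(DF[c("wet", "cold")] == 1, 1, all))
DF["wet&cold"] <- +(rowSums(DF[c("wet", "cold")] != 1) == 0)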
Data
DF <- data.frame(wet = c(1, 0, 0, 1, 1, 0), cold = c(1, 1, 1, 0, 1, 0))
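For an arbitrary set of columns, another option is to compare each selected column with 1 and chain the comparisons with Reduce() and &. This is a sketch; cols is a hypothetical name for the vector of columns that must all equal 1.

```r
DF <- data.frame(wet = c(1, 0, 0, 1, 1, 0), cold = c(1, 1, 1, 0, 1, 0))

cols <- c("wet", "cold")  # hypothetical: the columns that must all equal 1
# lapply gives one logical vector per column; Reduce combines them with &
DF["wet&cold"] <- +Reduce(`&`, lapply(DF[cols], function(col) col == 1))
DF
```

Changing the comparison inside lapply() (for example col > 0) adapts it to other conditions without touching the rest.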

R - Function that dichotomizes certain columns of a data frame based on different thresholds

I am trying to create a function that dichotomizes certain defined columns of a data frame based on different values depending on the column.
For example, in the following data frame with conditions A, B, C and D:
A <- c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2)
B <- c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
C <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
D <- c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
Data <- data.frame(A, B, C, D)
I would like the function to take the conditions that I select [e.g. A, B, D] and dichotomize them based on thresholds that I assign [e.g. 2 for A, 1 for B, 3 for D], so that values at or above the threshold become 1 and all others 0.
I would like the dichotomized columns to be added to the data frame with new names [e.g. A_dicho, B_dicho, D_dicho].
The final data frame should look like this (you will notice B is already dichotomized, which is fine, it should just be treated equally and added):
A B C D A_dicho B_dicho D_dicho
1 0 0 0 0 0 0 0
2 2 1 0 0 1 1 0
3 1 1 0 3 0 1 1
4 0 1 1 1 0 1 0
5 2 0 1 2 1 0 0
6 1 0 1 1 0 0 0
7 0 0 1 4 0 0 1
8 0 1 1 0 0 1 0
9 1 1 1 3 0 1 1
10 2 0 1 0 1 0 0
Could someone help me? Many thanks in advance.
Make a small named vector of thresholds, then Map() it over the matching columns (the \(d, th) lambda shorthand requires R 4.1 or later):
thresh <- c("A"=2, "B"=1, "D"=3)
Data[paste(names(thresh), "dicho", sep="_")] <- Map(
\(d,th) as.integer(d >= th), Data[names(thresh)], thresh
)
Data
## A B C D A_dicho B_dicho D_dicho
##1 0 0 0 0 0 0 0
##2 2 1 0 0 1 1 0
##3 1 1 0 3 0 1 1
##4 0 1 1 1 0 1 0
##5 2 0 1 2 1 0 0
##6 1 0 1 1 0 0 0
##7 0 0 1 4 0 0 1
##8 0 1 1 0 0 1 0
##9 1 1 1 3 0 1 1
##10 2 0 1 0 1 0 0
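Since the question asks for a function, the same Map() idea can be wrapped in one. This is a sketch: dichotomize() and its suffix argument are hypothetical names, and, as in the answer above, a value becomes 1 when it is greater than or equal to its threshold.

```r
dichotomize <- function(data, thresh, suffix = "_dicho") {
  # thresh is a named vector: names are column names, values are cut-offs
  new_cols <- paste0(names(thresh), suffix)
  data[new_cols] <- Map(function(d, th) as.integer(d >= th),
                        data[names(thresh)], thresh)
  data
}

Data <- data.frame(
  A = c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2),
  B = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0),
  C = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  D = c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
)
res <- dichotomize(Data, c(A = 2, B = 1, D = 3))
res
```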

How to loop ifelse function through a grouped variable with dplyr

I'm trying to apply a rule within each group of IDs: after the first row where one variable (D) equals 1, another variable (PREV) should equal 1 in all subsequent rows of that group.
Essentially, here is what I am trying to do:
I have:
ID D
1 1
1 0
1 0
2 0
2 0
3 1
3 0
3 0
4 1
4 0
4 1
4 1
4 1
4 0
I want:
ID D PREV
1 1 0
1 0 1
1 0 1
2 0 0
2 0 0
3 1 0
3 0 1
3 0 1
4 1 0
4 0 1
4 1 1
4 1 1
4 1 1
4 0 1
I'm trying to use dplyr to iterate through a series of grouped rows, in each one applying an ifelse function. My code looks like this:
data$prev = 0
data <-
data %>%
group_by(id)%>%
mutate(prev = if_else(lag(prev) == 1 | lag(d) == 1, 1, 0))
But for some reason, this is not applying the ifelse function over the whole group, resulting in data that looks something like this:
ID D PREV
1 1 0
1 0 1
1 0 0
2 0 0
2 0 0
3 1 0
3 0 1
3 0 0
4 1 0
4 0 1
4 1 0
4 1 1
4 0 1
Can anyone help me with this?
What about this:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(prev = +(cumsum(c(0, D[-length(D)])) > 0)) %>%
ungroup()
#> # A tibble: 14 x 3
#> ID D prev
#> <int> <int> <int>
#> 1 1 1 0
#> 2 1 0 1
#> 3 1 0 1
#> 4 2 0 0
#> 5 2 0 0
#> 6 3 1 0
#> 7 3 0 1
#> 8 3 0 1
#> 9 4 1 0
#> 10 4 0 1
#> 11 4 1 1
#> 12 4 1 1
#> 13 4 1 1
#> 14 4 0 1
To explain what it does, let's take a simple vector.
The calculation is the same for each group.
Let x be our vector:
x <- c(0,0,0,1,1,0,0,2,3,4)
Do the cumulative sum over x
cumsum(x)
#> [1] 0 0 0 1 2 2 2 4 7 11
You are only interested in values above zero, therefore:
cumsum(x)>0
#> [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
You want numeric values, not logical ones. A unary + does the trick:
+(cumsum(x)>0)
#> [1] 0 0 0 1 1 1 1 1 1 1
However, you want the 1s delayed by one position. Thus, we add a zero at the start of x:
+(cumsum(c(0,x))>0)
#> [1] 0 0 0 0 1 1 1 1 1 1 1
We need to keep the same length, so we remove the last value of x.
+(cumsum(c(0, x[-length(x)])) > 0)
#> [1] 0 0 0 0 1 1 1 1 1 1
And that does the trick.
We can use lag
library(dplyr)
df %>%
group_by(ID) %>%
mutate(prev = as.numeric(lag(cumsum(D) > 0, default = FALSE)))
Output:
# A tibble: 14 x 3
# Groups: ID [4]
# ID D prev
# <dbl> <dbl> <dbl>
# 1 1 1 0
# 2 1 0 1
# 3 1 0 1
# 4 2 0 0
# 5 2 0 0
# 6 3 1 0
# 7 3 0 1
# 8 3 0 1
# 9 4 1 0
#10 4 0 1
#11 4 1 1
#12 4 1 1
#13 4 1 1
#14 4 0 1
data
df <- data.frame(
ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
D = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
)
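Without dplyr, the same shift-then-cumsum idea can be sketched in base R with ave(), which applies a function within each ID and returns the results in the original row order (this assumes the rows are sorted by time within each ID):

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  D = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
)

# For each ID: shift D down one row (padding with 0), take the running sum,
# and flag every row where a 1 has already been seen in an earlier row.
df$prev <- ave(df$D, df$ID,
               FUN = function(d) as.numeric(cumsum(c(0, head(d, -1))) > 0))
df
```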
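You can use dplyr::group_modify() to apply a function to each group. The function below sets prev to 1 in every row after the first row where D == 1 (and 0 elsewhere), so it also works when a group's first 1 is not in its first row:
library(dplyr)
df <- data.frame(
ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
D = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
)
df %>% group_by(ID) %>% group_modify(
  function(x, y){
    # x holds the rows of one group, y its key (not needed here)
    first_one <- match(1, x$D)  # position of the first D == 1, NA if none
    x$prev <- 0
    if (!is.na(first_one)) x$prev[seq_len(nrow(x)) > first_one] <- 1
    x
  }
)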
# A tibble: 14 x 3
# Groups: ID [4]
ID D prev
<dbl> <dbl> <dbl>
1 1 1 0
2 1 0 1
3 1 0 1
4 2 0 0
5 2 0 0
6 3 1 0
7 3 0 1
8 3 0 1
9 4 1 0
10 4 0 1
11 4 1 1
12 4 1 1
13 4 1 1
14 4 0 1
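Why the question's attempt stops after one row: inside mutate(), lag(prev) is computed from the prev column as it existed before the call (all zeros), so a row cannot see the value just assigned to the row above it. The recursive rule itself is fine when it is evaluated row by row. A base R sketch of that, assuming the rows are sorted by ID and then by time:

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  D = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
)

prev <- numeric(nrow(df))  # every row starts at 0
for (i in seq_len(nrow(df))[-1]) {
  same_id <- df$ID[i] == df$ID[i - 1]
  # the question's rule, now reading the prev value already updated above
  if (same_id && (prev[i - 1] == 1 || df$D[i - 1] == 1)) prev[i] <- 1
}
df$prev <- prev
df
```

The vectorized cumsum() answers above are usually preferable; the loop only shows that the original logic was correct but needed sequential evaluation.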

How to construct this binary variable in R?

The aim is to check whether the value at index i is 1 and, if so, set the previous six entries to 1 as well.
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
## Required output
y <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
## Attempt
for(j in seq_along(x)){
if(x[j] == 1){
for(i in (j-6):j)
x[i] = 1
}}
Could you help me fix this, or suggest a better approach?
Thanks.
A fully vectorized solution using stats::filter() (written with the stats:: prefix so that dplyr::filter() cannot mask it):
as.integer(                    # turn logical values into numbers
  as.logical(                  # coerce to logical: 0 becomes FALSE, everything else TRUE
    rev(                       # reverse back to the original order
      stats::filter(           # linear filtering
        c(rep(0, 6),           # pad with zeros at the beginning to avoid NAs
          rev(x)),             # reverse the order of the input vector
        rep(1, 7), sides = 1   # y_i = x_i * 1 + x_(i-1) * 1 + ... + x_(i-6) * 1
      )[-(1:6)])))             # remove the NA values
#[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
You could try the following options (remember to reinitialize x before trying each one, as each option overwrites it):
indx <- mapply(function(x, y) x:y, which(x == 1) - 6 , which(x == 1))
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or even simpler
indx <- sapply(which(x == 1) - 6, function(x) x:(x + 6))
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or
indx <- apply(cbind(which(x == 1) - 6 , which(x == 1)), 1, function(x) x[1]:x[2])
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Or
indx <- seq_len(6)
indx <- sapply(which(x == 1), function(x) x - indx)
x[indx[indx > 0]] <- 1
x
## [1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Using a 'for' loop:
ddf = data.frame(x, y = 0)
for (i in seq_len(nrow(ddf))) {
  if (ddf[i, 'x'] == 1) {
    j = max(i - 6, 1)   # six entries back, but not before the first row
    ddf[j:i, 'y'] = 1
  }
}
ddf
x y
1 0 1
2 0 1
3 0 1
4 1 1
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 1
15 0 1
16 0 1
17 1 1
18 0 1
19 1 1
y = ddf$y
y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
y<-x
y[unlist(sapply(which(x==1),
function(val){
val:(max(val-6,1))
}
)
)
]<-1
> y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
Explanation:
I first find the indices where x == 1 with which(x==1). Then, for each of these indices, sapply(...) builds the sequence from that index back to six positions before it (clamped at 1 by max(val-6,1)). Finally, unlist() flattens the result into a single vector of indices at which y must be 1.
I then assign 1 to the corresponding y values.
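To make the explanation concrete, here are the intermediate values for the x from the question. Because the windows have different lengths here (the first one is clamped at index 1), sapply() returns a list, which unlist() then flattens:

```r
x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)

ones <- which(x == 1)  # positions of the 1s: 4, 17 and 19
# one descending sequence per 1, from its index back six places (min 1)
windows <- sapply(ones, function(val) val:(max(val - 6, 1)))
windows
ind <- unlist(windows)  # all indices that must become 1
y <- x
y[ind] <- 1
y
```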
Another way to write it, in two steps:
y<-x
ind<-unlist(sapply(which(x==1),function(val){val:(max(val-6,1))}))
y[ind]<-1
> y
[1] 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
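If the window length may change, the same idea can be wrapped in a small function. mark_preceding() and its k argument are hypothetical names. This sketch marks each 1 plus the k entries before it and, unlike the loop in the question, writes into a separate result vector, so newly set 1s are never re-read and cannot cascade:

```r
mark_preceding <- function(x, k = 6) {
  ones <- which(x == 1)
  y <- integer(length(x))  # start from all zeros
  for (i in ones) {
    # clamp the window at the start of the vector
    y[max(i - k, 1):i] <- 1L
  }
  y
}

x <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
mark_preceding(x)
mark_preceding(x, k = 2)
```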

Creating special matrix in R

I have a matrix as follows.
dat = matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6), ncol=4)
colnames(dat)=c("m1","m2","m3","m4")
dat
m1 m2 m3 m4
1 0 1 0 2
2 0 0 0 3
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
I would like to create four 5 x 4 matrices, one per column, by multiplying every column of dat element-wise by that column: res1 = (m1*m1, m1*m2, m1*m3, m1*m4), res2 = (m1*m2, m2*m2, m2*m3, m2*m4), res3 = (m1*m3, m2*m3, m3*m3, m4*m3), res4 = (m1*m4, m2*m4, m3*m4, m4*m4), such as:
res1
1 0 0 0 0
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res2
1 1 1 0 2
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res3
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 1 1 5
5 1 1 1 6
res4
1 0 2 0 4
2 0 0 0 9
3 4 4 0 16
4 5 5 5 25
5 6 6 6 36
How can I do it efficiently in R?
Running
res <- lapply(1:ncol(dat), function(i) dat * dat[,i])
will work thanks to the recycling of the element-wise multiplication. If you multiply by one column, those values will repeat over the entire matrix. And lapply will return them all in a list. You can get them out individually as res[[1]], res[[2]], etc.
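As a sketch building on that line, the list can be named to match res1 to res4, and sapply(..., simplify = "array") stacks the same products into a single 5 x 4 x 4 array in which the third index selects the column used as the multiplier:

```r
dat <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6),
              ncol = 4)
colnames(dat) <- c("m1", "m2", "m3", "m4")

# dat[, i] has length nrow(dat), so it is recycled down every column of dat
res <- lapply(seq_len(ncol(dat)), function(i) dat * dat[, i])
names(res) <- paste0("res", seq_len(ncol(dat)))
res$res4

# The same products stacked into one array: res_arr[, , i] equals res[[i]]
res_arr <- sapply(seq_len(ncol(dat)), function(i) dat * dat[, i],
                  simplify = "array")
dim(res_arr)
```

The recycling only lines up because the multiplier's length equals nrow(dat); a vector of any other length would be recycled differently.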
test<-NULL
for (i in 1:ncol(dat)){
x<-dat*dat[,i]
test[i]<-list(x)
}
This is the same as @Mrflick's comment:
test[[2]]
m1 m2 m3 m4
[1,] 0 1 0 2
[2,] 0 0 0 0
[3,] 1 1 0 4
[4,] 1 1 1 5
[5,] 1 1 1 6
