I am working on panel data that looks like this:
d <- data.frame(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"),
time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
iz = c(0,1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1))
id time iz
1 a 1 0
2 a 2 1
3 a 3 1
4 a 4 0
5 a 5 0
6 b 1 0
7 b 2 0
8 b 3 0
9 b 4 0
10 b 5 1
11 c 1 0
12 c 2 0
13 c 3 0
14 c 4 1
15 c 5 1
Here iz is an indicator for an event or a treatment (iz = 1). What I need is a variable that counts the periods before and after an event or the distance to and from an event. This variable would look like this:
id time iz nvar
1 a 1 0 -1
2 a 2 1 0
3 a 3 1 0
4 a 4 0 1
5 a 5 0 2
6 b 1 0 -4
7 b 2 0 -3
8 b 3 0 -2
9 b 4 0 -1
10 b 5 1 0
11 c 1 0 -3
12 c 2 0 -2
13 c 3 0 -1
14 c 4 1 0
15 c 5 1 0
I have tried working with the answers given here and here but can't make it work in my case.
I would really appreciate any ideas how to approach this problem. Thank you in advance for all ideas and suggestions.
1) rleid This code applies rleid from data.table within each id to number the runs of iz. Within each run it generates a negative reverse sequence if the run is a leading run of 0's (the only run for which (1-iz) * r equals 1) and a forward sequence otherwise, i.e. we assume that a forward positive sequence should be used except before the first run of ones. Finally, the positions where iz is 1 are zeroed out. There can be any number of runs in an id and it also supports id's with only 0's or only 1's. It assumes that time has no gaps.
library(data.table)
Seq <- function(x, s = seq_along(x)) if (x[1] == 1) -rev(s) else s
nvar <- function(iz, r = rleid(iz)) ave((1-iz) * r, r, FUN = Seq)
transform(d, nvar = (1-iz) * ave(iz, id, FUN = nvar))
giving:
id time iz nvar
1 a 1 0 -1
2 a 2 1 0
3 a 3 1 0
4 a 4 0 1
5 a 5 0 2
6 b 1 0 -4
7 b 2 0 -3
8 b 3 0 -2
9 b 4 0 -1
10 b 5 1 0
11 c 1 0 -3
12 c 2 0 -2
13 c 3 0 -1
14 c 4 1 0
15 c 5 1 0
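To see what (1) does inside a single id, here is a small sketch for id a. It is only an illustration: r is built with cumsum and diff as a base R stand-in for rleid so that the snippet runs without data.table, and the names x and res are mine, not part of the answer.

```r
# iz values of id "a" from the question
iz <- c(0, 1, 1, 0, 0)

# run ids (base R stand-in for data.table::rleid, assumed equivalent here)
r <- cumsum(c(1, diff(iz) != 0))

# equals 1 only in a leading run of zeros, 0 in runs of ones
x <- (1 - iz) * r

# negative reverse sequence for the leading run, forward sequence otherwise
Seq <- function(x, s = seq_along(x)) if (x[1] == 1) -rev(s) else s

# zero out the positions where iz is 1
res <- (1 - iz) * ave(x, r, FUN = Seq)
print(res)
```

Here r is 1 2 2 3 3 and x is 1 0 0 3 3, so only the first run gets the negative sequence.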
2) base This code uses only base R. It assumes that every id has at most one run of ones. There is no restriction on whether there are any zeros. Also it supports gaps in time. It applies nvar to the row numbers of each id. First it calculates the range rng of the times of the ones and then calculates the signed distance in the last line of nvar. The output is identical to that shown in (1). If we could assume that every id has exactly one run of 1's the if statement could be omitted.
nvar <- function(ix) with(d[ix, ], {
if (all(iz == 0)) return(iz)
rng <- range(time[iz == 1])
(time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])
})
transform(d, nvar = ave(1:nrow(d), id, FUN = nvar))
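As a quick check of the signed distance formula in (2), here is a sketch on made-up values with gaps in time. The vectors below are hypothetical and not taken from the question.

```r
# hypothetical times with gaps; the event runs from time 3 to time 4
time <- c(1, 3, 4, 7, 10)
iz   <- c(0, 1, 1, 0, 0)

# first and last time at which iz is 1
rng <- range(time[iz == 1])

# negative distance to the start of the run, positive distance from its end
res <- (time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])
print(res)
```

Because the distance is computed from time itself rather than from row positions, the gaps show up in the result (3 and 6 after the event instead of 1 and 2).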
2a) This variation of (2) passes time and iz to nvar by encoding them as the real and imaginary parts of a complex vector in order to avoid having to deal with row numbers but it is otherwise the same as (2). We have omitted the if statement in (2) but it could be added back in if any id's have no ones.
nvar <- function(x, time = Re(x), iz = Im(x), rng = range(time[iz == 1]))
(time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])
transform(d, nvar = Re(ave(time + iz * 1i, id, FUN = nvar)))
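The complex number encoding in (2a) may look odd at first, so here is a minimal sketch of the round trip for a single id. The variable names z and res are only for illustration.

```r
# values of id "a" from the question
time <- c(1, 2, 3, 4, 5)
iz   <- c(0, 1, 1, 0, 0)

# pack both columns into one vector: real part = time, imaginary part = iz
z <- time + iz * 1i

# unpack again inside the function, as the default arguments of nvar do
nvar <- function(x, time = Re(x), iz = Im(x), rng = range(time[iz == 1]))
  (time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])

res <- nvar(z)
print(res)
```

ave keeps the type of its first argument, which is why the full solution wraps the call in Re().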
Here is a solution that is a (tiny) bit more complex than the one from G.Grothendieck, but it will be able to handle non-sequential times.
library( data.table )
#make d a data.table
setDT(d)
#you can remove the trailing [], they are just for passing the output to the console...
#nvar = 0 where iz = 1
d[ iz == 1, nvar := 0 ][]
#calculate nvar for iz == 0 BEFORE iz == 1, using a forward rolling join
#create subsets for readability
d1 <- d[ iz == 1, ]
d0 <- d[ iz == 0, ]
d[ iz == 0, nvar := time - d1[ d0, x.time, on = .(id, time), roll = -Inf ] ][]
#calculate nvar for iz == 0 AFTER iz == 1, using a backward rolling join
#create subsets for readability
d1 <- d[ iz == 1, ]
d0 <- d[ iz == 0 & is.na( nvar ), ]
d[ iz == 0 & is.na(nvar) , nvar := time - d1[ d0, x.time, on = .(id, time), roll = Inf ] ][]
# id time iz nvar
# 1: a 1 0 -1
# 2: a 2 1 0
# 3: a 3 1 0
# 4: a 4 0 1
# 5: a 5 0 2
# 6: b 1 0 -4
# 7: b 2 0 -3
# 8: b 3 0 -2
# 9: b 4 0 -1
# 10: b 5 1 0
# 11: c 1 0 -3
# 12: c 2 0 -2
# 13: c 3 0 -1
# 14: c 4 1 0
# 15: c 5 1 0
One dplyr and purrr option could be:
library(dplyr)
library(purrr)

d %>%
group_by(id) %>%
mutate(nvar = map_dbl(.x = seq_along(iz), ~ min(abs(.x - which(iz == 1)))),
nvar = if_else(cumsum(iz) == 0, -nvar, nvar))
id time iz nvar
<fct> <dbl> <dbl> <dbl>
1 a 1 0 -1
2 a 2 1 0
3 a 3 1 0
4 a 4 0 1
5 a 5 0 2
6 b 1 0 -4
7 b 2 0 -3
8 b 3 0 -2
9 b 4 0 -1
10 b 5 1 0
11 c 1 0 -3
12 c 2 0 -2
13 c 3 0 -1
14 c 4 1 0
15 c 5 1 0
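For readers who want to follow the logic without the tidyverse, the per-group computation can be sketched in base R. signed_dist is a hypothetical helper name; the sketch assumes that every id has at least one iz == 1 and, like the code above, it measures distance in row positions rather than in time values.

```r
# distance (in rows) to the nearest event, negative before the first event
signed_dist <- function(iz) {
  ev <- which(iz == 1)                                   # positions of the events
  d  <- sapply(seq_along(iz), function(i) min(abs(i - ev)))
  ifelse(cumsum(iz) == 0, -d, d)                         # flip sign before first event
}

res_a <- signed_dist(c(0, 1, 1, 0, 0))   # id "a"
res_c <- signed_dist(c(0, 0, 0, 1, 1))   # id "c"
print(res_a)
print(res_c)
```

It could then be applied per id with something like ave(d$iz, d$id, FUN = signed_dist).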
I'm trying to set up two new variables to incorporate into an existing data.frame which should be a running value starting at 1 (0) if a condition is met with respect to the IDs in the data.frame. So the data.frame is of similar structure to this:
ID Var1
1 0
1 2
1 5
1 12
2 0
2 2
2 NA
2 11
and I want to get to:
ID Var1 start stop
1 0 0 0
1 2 0 1
1 5 1 2
1 12 2 3
2 0 0 0
2 2 0 1
2 NA 1 2
2 11 2 3
Start should be a running value, starting once Var1 > 0 for the first time, and stop should operate the same way. Start's starting value should be 0 and stop's starting value should be 1. It should further continue running if Var1 takes on NA or 0 again in the course of the data.frame. I have tried doing the following:
df %>%
group_by(ID) %>%
mutate(stop = ifelse(Var1 > 0,
0:nrow(df), 0))
But the variable it returns doesn't start with 0, but with the number of the row the condition is first met in.
Sorry, I don't speak dplyr but you can easily adapt this, since data.table is only used for group-by.
DF <- read.table(text = "ID Var1
1 0
1 2
1 5
1 12
2 0
2 2
2 NA
2 11", header = TRUE)
foo <- function(x) {
#quantify leading zeros:
x[is.na(x)] <- 0
lead0 <- cumsum(x > 0)
nlead0 <- sum(lead0 == 0)
#create result using sequence:
list(c(rep.int(0, nlead0), sequence(length(x) - nlead0) - 1),
c(rep.int(0, nlead0), sequence(length(x) - nlead0)))
}
library(data.table)
setDT(DF)
DF[, c("start", "stop") := foo(Var1), by = ID]
# ID Var1 start stop
#1: 1 0 0 0
#2: 1 2 0 1
#3: 1 5 1 2
#4: 1 12 2 3
#5: 2 0 0 0
#6: 2 2 0 1
#7: 2 NA 1 2
#8: 2 11 2 3
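The same idea can also be written with two cumsum calls. This is only a sketch under the same assumption as foo above (NA is treated as 0); start_stop is a hypothetical helper name, not part of any package.

```r
start_stop <- function(x) {
  x[is.na(x)] <- 0                 # treat NA like 0, as foo() does
  started <- cumsum(x > 0) > 0     # TRUE from the first positive value onwards
  stop    <- cumsum(started)       # 1, 2, 3, ... once started
  start   <- pmax(stop - 1, 0)     # same counter, one step behind
  data.frame(start = start, stop = stop)
}

# Var1 values of ID 2 from the question
res <- start_stop(c(0, 2, NA, 11))
print(res)
```

Applied per ID (for example inside a grouped mutate or with split/lapply) this should reproduce the start and stop columns shown above.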
Here is a base R option using ave + replace:
transform(df,
Start = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = function(x) cumsum(c(0, x))[-(length(x) + 1)]),
Stop = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = cumsum)
)
or
transform(df,
Start = ave(ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum), ID, FUN = cumsum) > 1, ID, FUN = cumsum),
Stop = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = cumsum)
)
which gives
ID Var1 Start Stop
1 1 0 0 0
2 1 2 0 1
3 1 5 1 2
4 1 12 2 3
5 2 0 0 0
6 2 2 0 1
7 2 NA 1 2
8 2 11 2 3
I have the following data frame in R:
Row number A B C D E F G H I J
1 NA 1 1 0 0 1 0 0 1 1
2 NA NA 1 0 0 0 1 0 0 1
3 NA 1 0 0 0 1 0 0 1 1
I am trying to calculate the number of times the number changes between 1 and 0 excluding the Nulls
The result I am expecting is this
Row Number No of changes
---------- --------------
1 4
2 4
3 4
An explanation for row 1
In row 1, A has a null so we exclude that.
B and C have 1 which is our first set of values.
D and E have 0 which is our second set of values. Now Change = 1
F has our third set of values which is 1. Now Change = 1+1
G and H have 0 which is our fourth set of values. Now Change = 1+1+1
I and J have 1 which is our fifth set of values. Now Change = 1+1+1+1 = 4
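The counting described for row 1 can be checked with a couple of lines of base R. This is just a sketch for that single row; x, n_runs and n_changes are illustrative names.

```r
# row 1 of the example, with the null in column A as NA
x <- c(NA, 1, 1, 0, 0, 1, 0, 0, 1, 1)

# drop the nulls, then count runs of equal values
runs <- rle(x[!is.na(x)])
n_runs <- length(runs$lengths)

# every new run after the first one is one change
n_changes <- n_runs - 1
print(n_changes)
```

There are five sets of values (1 1, 0 0, 1, 0 0, 1 1), hence four changes.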
Here's a tidyverse approach.
I gather into longer format (from tidyr::pivot_longer), then add a helper column noting when we have a change from 0 to 1 or from 1 to 0, and then sum those by row.
library(tidyverse)
df %>%
  # before tidyr 1.0, this would be gather(col, value, -1)
  pivot_longer(-1, names_to = "col") %>%
  group_by(Row.number) %>%
  # TRUE when the value flips between 0 and 1; NA values never count
  mutate(chg = value == 1 & lag(value) == 0 |
               value == 0 & lag(value) == 1) %>%
  summarize(no_chgs = sum(chg, na.rm = TRUE))
# A tibble: 3 x 2
Row.number no_chgs
<int> <int>
1 1 4
2 2 4
3 3 4
Sample data:
df <- read.table(
header = T,
stringsAsFactors = F,
text = "'Row number' A B C D E F G H I J
1 NA 1 1 0 0 1 0 0 1 1
2 NA NA 1 0 0 0 1 0 0 1
3 NA 1 0 0 0 1 0 0 1 1")
Here's a data.table solution:
library(data.table)
dt <- as.data.table(df)
dt[,
no_change := max(rleid(na.omit(t(.SD)))) - 1,
by = Row.number  # column name as created by read.table() in the sample data
]
dt
Alternatively, here's a base version:
apply(df[, -1],
1,
function(x) {
complete_case = complete.cases(x)
if (sum(complete_case) > 0) {
return(length(rle(x[complete_case])$lengths) - 1)
} else {
return (0)
}
}
)
Hi I would really appreciate some help for this, I really couldn't find the solution in previous questions.
I have a tibble in long format (rows grouped by id and arranged by time).
I want to create a variable "eleg" based on "varx". The condition would be that "eleg" = 1 if "varx" in the previous 3 rows == 0 and in the current row varx == 1, if not = 0, for each ID. If possible using dplyr.
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3)
time <- c(1,2,3,4,5,6,7,1,2,3,4,5,6,1,2,3,4)
varx <- c(0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1)
eleg <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1)
table <- data.frame(id, time, varx, eleg)
In my real dataset the condition is "in the previous 24 rows" and the same ID could have eleg == 1 more than one time if it suits the condition.
Thank you.
One approach could be:
library(dplyr)
m <- 3 #number of times previous rows are looked back
df %>%
group_by(id) %>%
mutate(eleg = ifelse(rowSums(sapply(1:m, function(k) lag(varx, n = k, order_by = id, default = 1) == 0)) == m & varx == 1,
1,
0)) %>%
data.frame()
which gives
id time varx eleg
1 1 1 0 0
2 1 2 0 0
3 1 3 0 0
4 1 4 0 0
5 1 5 1 1
6 1 6 1 0
7 1 7 0 0
8 2 1 0 0
9 2 2 1 0
10 2 3 1 0
11 2 4 1 0
12 2 5 1 0
13 2 6 1 0
14 3 1 0 0
15 3 2 0 0
16 3 3 0 0
17 3 4 1 1
Sample data:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3), time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6,
1, 2, 3, 4), varx = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
0, 0, 0, 1)), .Names = c("id", "time", "varx"), row.names = c(NA,
-17L), class = "data.frame")
library(dplyr)
library(data.table)
# note: shift() is applied to the whole column here, not within each id
df %>%
mutate(elegnew = ifelse(Reduce("+", shift(df$varx, 1:3)) == 0 & df$varx == 1, 1, 0))
id time varx eleg elegnew
1 1 1 0 0 0
2 1 2 0 0 0
3 1 3 0 0 0
4 1 4 0 0 0
5 1 5 1 1 1
6 1 6 1 0 0
7 1 7 0 0 0
8 2 1 0 0 0
9 2 2 1 0 0
10 2 3 1 0 0
11 2 4 1 0 0
12 2 5 1 0 0
13 2 6 1 0 0
14 3 1 0 0 0
15 3 2 0 0 0
16 3 3 0 0 0
17 3 4 1 1 1
Here's another approach, using dplyr and zoo:
library(dplyr)
library(zoo)
df %>%
group_by(id) %>%
mutate(elegnew = as.integer(varx == 1 &
rollsum(varx == 1, k = 4, align = "right", fill = 0) == 1))
# # A tibble: 17 x 5
# # Groups: id [3]
# id time varx eleg elegnew
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 1. 1. 0. 0. 0
# 2 1. 2. 0. 0. 0
# 3 1. 3. 0. 0. 0
# 4 1. 4. 0. 0. 0
# 5 1. 5. 1. 1. 1
# 6 1. 6. 1. 0. 0
# 7 1. 7. 0. 0. 0
# 8 2. 1. 0. 0. 0
# 9 2. 2. 1. 0. 0
# 10 2. 3. 1. 0. 0
# 11 2. 4. 1. 0. 0
# 12 2. 5. 1. 0. 0
# 13 2. 6. 1. 0. 0
# 14 3. 1. 0. 0. 0
# 15 3. 2. 0. 0. 0
# 16 3. 3. 0. 0. 0
# 17 3. 4. 1. 1. 1
The idea is to group by id and then check a) whether varx is 1 and b) whether the sum of varx=1 events in the previous 3 plus current row (k=4) is 1 (which means all previous 3 must be 0). I assume that varx is either 0 or 1.
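If zoo is not available, the same check can be sketched in base R with a cumulative sum. prev_all_zero is a hypothetical helper, and like the answer above the sketch assumes that varx only takes the values 0 and 1.

```r
# TRUE where the n preceding rows exist and are all 0
prev_all_zero <- function(x, n) {
  cs  <- c(0, cumsum(x))                 # cs[i] = sum of x[1..(i-1)]
  idx <- seq_along(x)
  prev_sum <- cs[idx] - cs[pmax(idx - n, 1)]   # sum of the n rows before i
  idx > n & prev_sum == 0
}

id   <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3)
varx <- c(0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1)

# apply within each id; n = 3 here, n = 24 for the real data
elegnew <- as.integer(ave(varx, id,
                          FUN = function(v) v == 1 & prev_all_zero(v, 3)))
print(elegnew)
```

Rows with fewer than n predecessors inside their id are set to 0, which matches the effect of fill = 0 in the rollsum call.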
You have asked for a dplyr solution, preferably.
The following is a base R one, with a function that you can adapt to "in the previous 24 rows", just pass n = 24 to the function.
fun <- function(DF, crit = "varx", new = "eleg", n = 3){
DF[[new]] <- 0
for(i in seq_len(nrow(DF))[-seq_len(n)]){
if(all(DF[[crit]][(i - n):(i - 1)] == 0) && DF[[crit]][i] == 1)
DF[[new]][i] <- 1
}
DF
}
sp <- split(table[-4], table[-4]$id)
new_df <- do.call(rbind, lapply(sp, fun))
row.names(new_df) <- NULL
identical(table, new_df)
#[1] TRUE
Note that if you are creating a new column, eleg, you would probably not need to split table[-4], just table since the 4th column wouldn't exist yet.
You could do do.call(rbind, lapply(sp, fun, n = 24)) and the rest would be the same.
I want to identify (not eliminate) duplicates in a data frame and add a 0/1 variable accordingly (whether a row is a duplicate or not), using the R dplyr package.
Example:
| A B C D
1 | 1 0 1 1
2 | 1 0 1 1
3 | 0 1 1 1
4 | 0 1 1 1
5 | 1 1 1 1
Clearly, row 1 and 2 are duplicates, so I want to create a new variable (with mutate?), say E, that is equal to 1 in row 1,2,3 and 4 since row 3 and 4 are also identical.
Moreover, I want to add another variable, F, that is equal to 1 if there is a duplicate differing only by one column. That is, F in row 1,2 and 5 would be equal to 1 since they only differ in the B column.
I hope it is clear what I want to do and I hope that dplyr offers a smooth solution to this problem. This is of course possible in "base" R but I believe (hope) that there exists a smoother solution.
You can use dist() to compute the differences, and then a search in the resulting distance object can give the needed answers (E, F, etc.). Here is an example code, where X is the original data.frame:
W=as.matrix(dist(X, method="manhattan"))
X$E = as.integer(sapply(1:ncol(W), function(i,D){any(W[-i,i]==D)}, D=0))
X$F = as.integer(sapply(1:ncol(W), function(i,D){any(W[-i,i]==D)}, D=1))
Just change D= for the number of different columns needed.
It's all base R though. Using plyr::laply instead of sapply has the same effect. dplyr looks like overkill here.
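As a small worked example, here is the dist() approach run on the five rows from the question. The construction of X is my own transcription of the example table; the new columns are named E and F as in the question.

```r
# the example data from the question
X <- data.frame(A = c(1, 1, 0, 0, 1),
                B = c(0, 0, 1, 1, 1),
                C = c(1, 1, 1, 1, 1),
                D = c(1, 1, 1, 1, 1))

# for 0/1 data the Manhattan distance is the number of differing columns
W <- as.matrix(dist(X, method = "manhattan"))

# flag rows that have another row at distance D
flag <- function(D) as.integer(sapply(1:ncol(W), function(i) any(W[-i, i] == D)))
E <- flag(0)   # exact duplicate exists
F <- flag(1)   # a row differing in exactly one column exists
print(cbind(X, E, F))
```

On this data F comes out as 1 for every row, because rows 3 and 4 are also exactly one column away from row 5.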
Here is a data.table solution that is extendable to an arbitrary case (1..n columns the same); not sure if someone can convert it to dplyr for you. I had to change your dataset a bit to show your desired F column: in your example all rows would get a 1 because 3 and 4 are one column different from 5 as well.
library(data.table)
DT <- data.frame(A = c(1,1,0,0,1), B = c(0,0,1,1,1), C = c(1,1,1,1,1), D = c(1,1,1,1,1), E = c(1,1,0,0,0))
DT
A B C D E
1 1 0 1 1 1
2 1 0 1 1 1
3 0 1 1 1 0
4 0 1 1 1 0
5 1 1 1 1 0
setDT(DT)
DT_ncols <- length(DT)
base <- data.table(t(combn(1:nrow(DT), 2)))
setnames(base, c("V1","V2"),c("ind_x","ind_y"))
DT[, ind := .I]
DT_melt <- melt(DT, id.var = "ind", variable.name = "column")
base <- merge(base, DT_melt, by.x = "ind_x", by.y = "ind", allow.cartesian = TRUE)
base <- merge(base, DT_melt, by.x = c("ind_y", "column"), by.y = c("ind", "column"))
base <- base[, .(common_cols = sum(value.x == value.y)), by = .(ind_x, ind_y)]
This gives us a data.frame that looks like this:
base
ind_x ind_y common_cols
1: 1 2 5
2: 1 3 2
3: 2 3 2
4: 1 4 2
5: 2 4 2
6: 3 4 5
7: 1 5 3
8: 2 5 3
9: 3 5 4
10: 4 5 4
This says that rows 1 and 2 have 5 common columns (duplicates). Rows 3 and 5 have 4 common columns, and 4 and 5 have 4 common columns. We can now use a fairly extendable format to flag any combination we want:
base <- melt(base, id.vars = "common_cols")
# Unique - common_cols == DT_ncols
DT[, F := ifelse(ind %in% unique(base[common_cols == DT_ncols, value]), 1, 0)]
# Same save 1 - common_cols == DT_ncols - 1
DT[, G := ifelse(ind %in% unique(base[common_cols == DT_ncols - 1, value]), 1, 0)]
# Same save 2 - common_cols == DT_ncols - 2
DT[, H := ifelse(ind %in% unique(base[common_cols == DT_ncols - 2, value]), 1, 0)]
This gives:
A B C D E ind F G H
1: 1 0 1 1 1 1 1 0 1
2: 1 0 1 1 1 2 1 0 1
3: 0 1 1 1 0 3 1 1 0
4: 0 1 1 1 0 4 1 1 0
5: 1 1 1 1 0 5 0 1 1
Instead of manually selecting, you can append all combinations like so:
# run after base <- melt(base, id.vars = "common_cols")
base <- unique(base[,.(ind = value, common_cols)])
base[, common_cols := factor(common_cols, 1:DT_ncols)]
merge(DT, dcast(base, ind ~ common_cols, fun.aggregate = length, drop = FALSE), by = "ind")
ind A B C D E 1 2 3 4 5
1: 1 1 0 1 1 1 0 1 1 0 1
2: 2 1 0 1 1 1 0 1 1 0 1
3: 3 0 1 1 1 0 0 1 0 1 1
4: 4 0 1 1 1 0 0 1 0 1 1
5: 5 1 1 1 1 0 0 0 1 1 0
Here is a dplyr solution:
library(dplyr)

test%>%mutate(flag = (A==lag(A)&
B==lag(B)&
C==lag(C)&
D==lag(D)))%>%
mutate(twice = lead(flag)==T)%>%
mutate(E = ifelse(flag == T | twice ==T,1,0))%>%
mutate(E = ifelse(is.na(E),0,1))%>%
mutate(FF = ifelse( ( (A +lag(A)) + (B +lag(B)) + (C+lag(C)) + (D + lag(D))) == 7,1,0))%>%
mutate(FF = ifelse(is.na(FF)| FF == 0,0,1))%>%
select(A,B,C,D,E,FF)
Result:
A B C D E FF
1 1 0 1 1 1 0
2 1 0 1 1 1 0
3 0 1 1 1 1 0
4 0 1 1 1 1 0
5 1 1 1 1 0 1
I have a loop that begins when a certain condition exists. It begins when the value is below -1.0. The loop exits when the value is over 1.0.
#Loop for long entry and exit signal
entry <- -1.0 #Input value for entry
exit <- 1.0 #Input value for exit
mydf$sig_long[[1]] = ifelse(mydf$roll.z.score.n3[[1]] < (entry), 1, 0)
for (i in 2:nrow(mydf)){
mydf$sig_long[i] = ifelse(mydf$roll.z.score.n3[i] < (entry), 1,
ifelse(mydf$roll.z.score.n3[i] > (exit), 0,
mydf$sig_long[i-1]))
}
I want to learn how to use the same loop but, instead of exiting on values > 1.0, exit after a set number of lines. If, for example, I set an nline variable to 5, it would loop and print 1 for a total of 5 lines after the initial entry. An example data frame is below:
roll.z.score.n3
1 0
2 0
3 0.651651537
4 -1.153593891
5 -0.926552368
6 -0.369016769
7 0.65405595
8 -1.139305279
9 0.358231351
10 1.135314685
11 0.944997472
12 -0.293105191
13 -1.146659778
14 -0.66246734
15 -1.131901741
16 -0.600480649
17 -1.152333435
18 1.1025176
19 -0.144684006
20 -0.678000883
21 -1.146875039
22 -1.132235788
23 0.115583229
24 0.645489447
25 1.148754398
26 0.988193418
27 -0.818892395
After the script has run I would use zoo to +1 line the sig_long column.
new.df$sig_long <- Lag(new.df$sig_long,1)
Code Testing
OK, so I see the issue. Because the first part of the loop enters and exits between -1.0 and 1.0, it prints 1's between those two criteria in the new.df$sig_long column. The issue arises if I set nlines to 5. The entry/exit might exit the trade after 3 lines in the sig_long column. If that is the case, rollsum has no 1's left to count in that column: even if I want to hold for 5 lines, with only 3 1's from entry it cannot compute the 5-line hold time. Perhaps we can use the first part of the loop like this in order to print a 1 at the entry condition:
new.df$sig_long <- ifelse(new.df$roll.z.score.n3 < -1.0 , 1, 0) #Set 1 at entry criteria
That sets the 1 at each < -1.0 value. Next would be how to count +5 lines from that point forward. Counting until...1,2,3,4,5, else 0... 0, until next +1.... count until 1,2,3,4,5, else 0, 0 until next +1....
I can see the logic... if newdf$sig_long == 1, count until nlines 5, else 0, repeat...
Then the 2nd part of the loop would work I think for the new_sig_long column
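One possible reading of that logic, written as a plain loop, is sketched below. It rests on assumptions that are mine and not stated in the question: the entry line itself counts as the first of the nlines, and new entry signals that occur while a hold is still running are ignored. hold_signal is a hypothetical function name.

```r
hold_signal <- function(z, entry = -1, nlines = 5) {
  sig <- integer(length(z))
  remaining <- 0                       # lines still to hold after the current one
  for (i in seq_along(z)) {
    if (remaining > 0) {               # inside a hold: keep printing 1
      sig[i] <- 1L
      remaining <- remaining - 1
    } else if (!is.na(z[i]) && z[i] < entry) {
      sig[i] <- 1L                     # new entry starts a hold of nlines lines
      remaining <- nlines - 1
    }
  }
  sig
}

# made-up z-scores, hold for 3 lines
z <- c(0, -1.2, 0, 0, -1.5, 0, 0, 0, 0, -1.1)
res <- hold_signal(z, entry = -1, nlines = 3)
print(res)
```

If entries during a hold should instead restart the count, the first branch would need to check the entry condition as well.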
Made many changes. This should now work:
zz = '
roll.z.score.n3 sig_long
1 NA 0
2 NA 0
3 -1.135974424 0
4 0.193311168 1
5 0.714285714 1
6 -1.148753543 1
7 -0.942160394 1
8 -0.695763683 1
9 1.140646864 1
10 0.985196899 0
11 -0.768766574 0
12 -1.011293858 0
13 -0.516703612 1
14 -1.120897077 1
15 1.091089451 1
16 0.968364052 0
17 0.872871561 0
18 1.099524999 0
19 0.918397948 0
'
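library(zoo)      # rollsumr()
library(quantmod) # Lag() is not base R; quantmod is assumed to be its source here
df <- read.table(text = zz, header = TRUE)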
df = na.omit(df)
df$sig_long[[1]] = ifelse(df$roll.z.score.n3[[1]] < (-1), 1, 0)
for (i in 2:nrow(df)){
df$sig_long[i] = ifelse(df$roll.z.score.n3[i] < (-1), 1,
ifelse(df$roll.z.score.n3[i] > 1, 0,
df$sig_long[i-1]))
}
df$sig_long <- Lag(df$sig_long, 1)
colnames(df[, 2]) = "sig_long"
df = na.omit(df)
nlines = 5
df$rollsum = rollsumr(df$sig_long, k = nlines, fill = 0)
colnames(df[, 3]) = "rollsum"
df$new_sig_long[[1]] = ifelse(df$roll.z.score.n3[[1]] < (-1), 1, 0)
for (i in 2:nrow(df)){
df$new_sig_long[i] = ifelse(df$roll.z.score.n3[i] < (-1) & df$rollsum[i] < 5, 1,
ifelse(df$roll.z.score.n3[i] > 1 | df$rollsum[i] >= 5, 0,
df$sig_long[i-1]))
}
df
Output:
> df
roll.z.score.n3 sig_long rollsum new_sig_long
# 4 0.1933112 1 0 0
# 5 0.7142857 1 0 1
# 6 -1.1487535 1 0 1
# 7 -0.9421604 1 0 1
# 8 -0.6957637 1 5 0
# 9 1.1406469 1 5 0
# 10 0.9851969 0 4 1
# 11 -0.7687666 0 3 0
# 12 -1.0112939 0 2 1
# 13 -0.5167036 1 2 0
# 14 -1.1208971 1 2 1
# 15 1.0910895 1 3 0
# 16 0.9683641 0 3 1
# 17 0.8728716 0 3 0
# 18 1.0995250 0 2 0
# 19 0.9183979 0 1 0
Then you can take the lag of the new_sig_long if you wish.