How to count consecutive zeros in the last run?


I want to count the number of consecutive zeros in the last run of an atomic vector, provided that the last run consists of zeros.
For example:
a <- c(1, 0, 0, 0)
Here the number of consecutive zeros in the last run is 3.
If the last run does not consist of zeros, the answer must be zero. For example:
a <- c(0, 1, 1, 0, 0, 1)
Here the answer is zero because the last run consists of a one, not zeros.
I do not want to use any external package. I managed to write a function that uses a loop, but I think a more efficient method must exist.
czero <- function(a) {
k = 0
for(i in 1:length(a)){
if(a[i] == 0) {
k = k + 1
} else k = 0
}
return(k)
}

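Reverse a and then compute its cumulative sum. The trailing zeros of a become the leading zeros of the reversed vector, and they are the only elements whose cumulative sum is still 0 (assuming a contains no negative values). Negating with ! gives TRUE for each of those elements and FALSE for all others, so the sum of that is the desired number.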
sum(!cumsum(rev(a)))
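To see why the one-liner works, it can help to look at each intermediate step. The sketch below traces it on the two vectors from the question. The helper name trailing_zeros and its `!= 0` comparison are illustrative additions, not part of the answer; they show one possible way to drop the no-negative-values assumption.

```r
a <- c(1, 0, 0, 0)
b <- c(0, 1, 1, 0, 0, 1)

rev(a)                 # 0 0 0 1
cumsum(rev(a))         # 0 0 0 1
!cumsum(rev(a))        # TRUE TRUE TRUE FALSE
sum(!cumsum(rev(a)))   # 3
sum(!cumsum(rev(b)))   # 0

# Hypothetical variant: accumulate "is non-zero" flags instead of the values,
# so negative entries cannot bring the running sum back to 0.
trailing_zeros <- function(v) sum(cumsum(rev(v) != 0) == 0)
trailing_zeros(c(1, -1, 0))   # 1
```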

The simplest improvement is to start your loop from the end of the vector and work backwards, instead of starting from the front. You can then save time by exiting the loop at the first non-zero element, instead of looping through the whole vector.
I've checked this against the given vectors, and a much longer vector with a small number of zeros at the end, to show a case where looping from the start takes a lot of time.
a <- c(1, 0, 0, 0)
b <- c(0, 1, 1, 0, 0, 1)
long <- rep(c(0, 1, 0, 1, 0), c(4, 6, 5, 10000, 3))
czero is the original function, f1 is the solution by akrun that uses rle, fczero starts the loop from the end, and revczero reverses the vector, then starts from the front.
czero <- function(a) {
k = 0
for(i in 1:length(a)){
if(a[i] == 0) {
k = k + 1
} else k = 0
}
return(k)
}
f1 <- function(vec){
pmax(0, with(rle(vec), lengths[values == 0 &
seq_along(values) == length(values)])[1], na.rm = TRUE)
}
fczero <- function(vec) {
k <- 0L
# walk backwards; rev(seq_along(vec)) is also safe for an empty vector
for (i in rev(seq_along(vec))) {
if (vec[i] != 0) break
k <- k + 1L
}
return(k)
}
revczero <- function(vec) {
revd <- rev(vec)
k <- 0L
for (i in 1:length(vec)) {
if (revd[i] != 0) break
k <- k + 1L
}
return(k)
}
Time benchmarks are below. EDIT: I've also added Grothendieck's version.
microbenchmark::microbenchmark(czero(a), f1(a), fczero(a), revczero(a), sum(!cumsum(rev(a))), times = 1000)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# czero(a) 0 514 621.035 514 515 21076 1000
# f1(a) 21590 23133 34455.218 27245 30843 3211826 1000
# fczero(a) 0 514 688.892 514 515 28274 1000
# revczero(a) 2570 3085 4626.047 3599 4626 112064 1000
# sum(!cumsum(rev(a))) 2056 2571 3879.630 3085 3599 62201 1000
microbenchmark::microbenchmark(czero(b), f1(b), fczero(b), revczero(b), sum(!cumsum(rev(b))), times = 1000)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# czero(b) 0 514 809.691 514 515 29815 1000
# f1(b) 22104 23647 29372.227 24675 26217 1319583 1000
# fczero(b) 0 0 400.502 0 514 26217 1000
# revczero(b) 2056 2571 3844.176 3085 3599 99727 1000
# sum(!cumsum(rev(b))) 2056 2570 3592.281 3084 3598.5 107952 1000
microbenchmark::microbenchmark(czero(long), f1(long), fczero(long), revczero(long), sum(!cumsum(rev(long))), times = 1000)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# czero(long) 353156 354699 422077.536 383486 443631.0 1106250 1000
# f1(long) 112579 119775 168408.616 132627 165269.5 2068050 1000
# fczero(long) 0 514 855.444 514 1028.0 43695 1000
# revczero(long) 24161 27245 35890.991 29301 36498.0 149591 1000
# sum(!cumsum(rev(long))) 49350 53462 71035.486 56546 71454 2006363 1000
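The idea of looking only at the end of the vector can also be written without an explicit loop. The sketch below is a different technique from the functions benchmarked above and was not part of that benchmark: it locates the last non-zero element with which() and subtracts its position from the length. The name czero_which is made up for illustration, and the function assumes the vector has no NA values.

```r
czero_which <- function(a) {
  nz <- which(a != 0)                        # positions of all non-zero elements
  if (length(nz) == 0L) return(length(a))    # all zeros (or empty vector)
  length(a) - nz[length(nz)]                 # elements after the last non-zero one
}

a <- c(1, 0, 0, 0)
b <- c(0, 1, 1, 0, 0, 1)
long <- rep(c(0, 1, 0, 1, 0), c(4, 6, 5, 10000, 3))

czero_which(a)      # 3
czero_which(b)      # 0
czero_which(long)   # 3
```

Unlike fczero, this always scans the whole vector once, so it is not expected to beat the early-exit loop when the trailing run is short.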

We can use rle
f1 <- function(vec){
pmax(0, with(rle(vec), lengths[values == 0 &
seq_along(values) == length(values)])[1], na.rm = TRUE)
}
f1(a)
#[1] 3
In the second case,
b <- c(0, 1, 1, 0, 0, 1)
f1(b)
#[1] 0
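For readers unfamiliar with rle, the following sketch shows what it returns for the example vector and how the last run can be picked out directly. The helper last_zero_run is a hypothetical, more explicit rewrite for illustration, not the f1 function above.

```r
a <- c(1, 0, 0, 0)
b <- c(0, 1, 1, 0, 0, 1)

r <- rle(a)
r$lengths   # 1 3  (run lengths)
r$values    # 1 0  (value of each run)

# Hypothetical helper: length of the last run if that run is zeros, else 0
last_zero_run <- function(vec) {
  r <- rle(vec)
  n <- length(r$values)
  if (n > 0 && r$values[n] == 0) r$lengths[n] else 0
}

last_zero_run(a)   # 3
last_zero_run(b)   # 0
```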
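Or another option is to create a function with which and cumsum:
f2 <- function(vec) {
i1 <- which(!vec)
# no zeros at all, or the last zero is not the last element: the answer is 0
if(length(i1) == 0 || i1[length(i1)] != length(vec)) 0 else {
sum(!cumsum(rev(c(TRUE, diff(i1) != 1)))) + 1
}
}
f2(a)
#[1] 3
f2(b)
#[1] 0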

With data.table:
library(data.table)
ifelse(last(a) == 0,
sum(rleid(a) == last(rleid(a))),
0)
rleid assigns an id to each run:
> rleid(a)
[1] 1 2 2 2
so, if the last value is 0, the result is the length of the last group.

Related

For a dataset of 0's and 1's, set all but the first 1 in each row to 0's

I have a data.frame of 1,480 rows and 1,400 columns like:
1 2 3 4 5 6 ..... 1399 1400
1 0 0 0 1 0 0 ..... 1 0 #first occurrence would be at 4
2 0 0 0 0 0 1 ..... 0 1
3 1 0 0 1 0 0 ..... 0 0
## and etc
Each row contains a series of 0's and 1's - predominantly 0's. For each row, I want to find at which column the first 1 shows up and set the remaining values to 0's.
My current implementation can efficiently find the occurrence of the first 1, but I've only figured out how to zero out the remaining values iteratively by row. In repeated simulations, this iterative process is taking too long.
Here is the current implementation:
N <- length(df[which(df$arm == 0), "pt_id"]) # of patients
M <- max_days
#
# df is like the data frame shown above
#
df[which(df$arm == 0), 5:length(colnames(df))] <- unlist(lapply(matrix(data = rep(pbo_hr, M*N), nrow=N, ncol = M), rbinom, n=1, size = 1))
event_day_post_rand <- apply(df[,5:length(colnames(df))], MARGIN = 1, FUN = function(x) which (x>0)[1])
df <- add_column(df, "event_day_post_rand" = event_day_post_rand, .after = "arm_id")
##
## From here trial days start on column 6 for df
##
#zero out events that occurred after the first event, since each patient can only have 1 max event which will be taken as the earliest event
for (pt_id in df[which(!is.na(df$event_day_post_rand)),"pt_id"]){
event_idx = df[which(df$pt_id == pt_id), "event_day_post_rand"]
df[which(df$pt_id == pt_id), as.character(5+event_idx+1):"1400"] <- 0
}

We can do
mat <- as.matrix(df) ## data frame to matrix
j <- max.col(mat, ties.method = "first")
mat[] <- 0
mat[cbind(1:nrow(mat), j)] <- 1
df <- data.frame(mat) ## matrix to data frame
I also suggest storing these values in a matrix rather than a data frame. Because the result has a single 1 per row, it is sparse, so I recommend
library(Matrix)
sparseMatrix(i = 1:nrow(mat), j = j, x = rep(1, length(j)))
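One thing to keep in mind with max.col is that a row containing no 1 at all still gets a column index (with ties.method = "first", column 1), so it would receive a spurious 1. The sketch below, on a small made-up matrix, shows one possible way to guard against that; the names has_one, rows and out are illustrative only.

```r
mat <- rbind(c(0, 0, 1, 0, 1),
             c(0, 0, 0, 0, 0),   # a row with no event at all
             c(1, 0, 0, 1, 0))

j <- max.col(mat, ties.method = "first")   # 3 1 1 (row 2 ties at column 1)
has_one <- rowSums(mat) > 0                # TRUE FALSE TRUE

out <- matrix(0, nrow(mat), ncol(mat))
rows <- which(has_one)
out[cbind(rows, j[rows])] <- 1             # set only the first 1 of rows that have one
out
```

Here row 2 stays all zero, while rows 1 and 3 keep only their first 1.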

We can get a little more performance by zeroing only those 1 elements whose row has already been seen, that is, every 1 after the first one in its row.
Since the OP is open to starting with a matrix rather than a data.frame, I'll do the same.
# dummy data
m <- matrix(sample(0:1, 1480L*1400L, TRUE, c(0.9, 0.1)), 1480L, 1400L)
# proposed solution
f1 <- function(m) {
ones <- which(m == 1L)
m[ones[duplicated((ones - 1L) %% nrow(m), nmax = nrow(m))]] <- 0L
m
}
# Zheyuan Li's solution
f2 <- function(m) {
j <- max.col(m, ties.method = "first")
m[] <- 0L
m[cbind(1:nrow(m), j)] <- 1L
m
}
microbenchmark::microbenchmark(f1 = f1(m),
f2 = f2(m),
check = "identical")
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1 9.1457 11.45020 12.04258 11.9011 12.3529 37.6716 100
#> f2 12.8424 14.92955 17.31811 15.3251 16.0550 43.6314 100
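The f1 approach relies on R storing matrices in column-major order: which() returns linear indices sorted by column, and (index - 1) %% nrow(m) recovers the zero-based row of each index. A minimal trace on a made-up 2 x 3 matrix (the names row0 and out are illustrative):

```r
m <- matrix(c(0L, 1L,    # column 1
              1L, 1L,    # column 2
              1L, 0L),   # column 3
            nrow = 2)

ones <- which(m == 1L)           # 2 3 4 5 (linear indices, column by column)
row0 <- (ones - 1L) %% nrow(m)   # 1 0 1 0 (zero-based row of each 1)
duplicated(row0)                 # FALSE FALSE TRUE TRUE

out <- m
out[ones[duplicated(row0)]] <- 0L   # zero every 1 after the first in its row
out
```

Because the indices are visited left to right, the first occurrence of each row value is the leftmost 1 in that row, which is the one that is kept.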
Zheyuan Li's suggestion to go with a sparse matrix is a good idea.
# convert to a memory-efficient nsparseMatrix
library(Matrix)
m1 <- as(Matrix(f1(m), dimnames = list(NULL, NULL), sparse = TRUE), "nsparseMatrix")
object.size(m)
#> 8288216 bytes
object.size(m1)
#> 12864 bytes
# proposed function to go directly to a sparse matrix
f3 <- function(m) {
n <- nrow(m)
ones <- which(m == 1L) - 1L
i <- ones %% n
idx <- which(!duplicated(i, nmax = n))
sparseMatrix(i[idx], ones[idx] %/% n, dims = dim(m), index1 = FALSE, repr = "C")
}
# going directly to a sparse matrix using Zheyuan Li's solution
f4 <- function(m) {
sparseMatrix(1:nrow(m), max.col(m, ties.method = "first"), dims = dim(m), repr = "C")
}
identical(m1, f3(m))
#> [1] TRUE
identical(m1, f4(m))
#> [1] TRUE
microbenchmark::microbenchmark(f1 = f1(m),
f3 = f3(m),
f4 = f4(m))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1 9.1719 9.30715 11.12569 9.52300 11.92740 83.8518 100
#> f3 7.4330 7.59875 12.62412 7.69610 11.08815 84.8291 100
#> f4 8.9607 9.31115 14.01477 9.49415 11.44825 87.1577 100

In R fastest way to find the first element of each row that satisfies a condition

Here is a matrix where each row is the realization of an MA(1) process.
V <- matrix(rnorm(1000*10, mean = -1, sd = 1),
nrow = 1000, ncol = 10)
H <- matrix(nrow = 1000, ncol = 10)
H[,1] <- -.5
for (t in seq_len(9)) {
H[,t+1] <- V[,t+1] + .9 * V[,t]
}
I would like to find the first element of each row that is larger than 0. If no elements in the row are larger than 0, then I would like to return Inf.
Here are three different ways I have tried.
find.min <- function (x) {
e <- which(x >= 0)
if (length(e) > 0) {
return(min(e))
} else {
return(Inf)
}
}
find.min2 <- function (x) {
e <- Position(function(a) (a >= 0), x)
if (is.na(e)) {
return(Inf)
} else {
return(e)
}
}
library("purrr")
find.min3 <- function (x) {
e <- detect_index(x, function(a) (a >= 0))
if (e == 0) {
return(Inf)
} else {
return(e)
}
}
And the results of benchmarking:
library("microbenchmark")
> microbenchmark(apply(H, 1, find.min),
+ apply(H, 1, find.min2),
+ apply(H, 1, find.min3))
Unit: milliseconds
expr min lq mean median uq max neval
apply(H, 1, find.min) 1.361127 1.490270 1.732356 1.588842 1.668582 6.824049 100
apply(H, 1, find.min2) 3.587456 3.791422 4.460397 4.172458 4.265154 10.003467 100
apply(H, 1, find.min3) 29.167030 31.414109 34.781083 33.170437 35.959100 135.689261 100
Based on this, I have three questions:
Why is find.min2 slower than find.min? I had thought that Position would be faster since it quits after finding the first TRUE.
Is there a faster way than find.min?
find.min was my first try, and I started benchmarking because profiling revealed that this was the bottleneck in a Monte Carlo simulation that essentially computes the mean of a column of H across all rows where the first positive in H was column 4 (say). I am a bit new to R, but I was surprised that just determining the set of rows to average over could be the bottleneck. Am I doing something wrong?
UPDATE
Based on a comment, here's updated benchmark:
library("microbenchmark")
> microbenchmark(apply(H, 1, find.min),
+ apply(H, 1, find.min2),
+ apply(H, 1, find.min3),
+ replace(y <- max.col(H > 0, ties.method = "first"), y == 1, Inf),
+ unit = "ms")
Unit: milliseconds
expr min lq mean median uq max neval
apply(H, 1, find.min) 1.321301 1.4468300 1.71449367 1.5661085 1.630354 7.575459 100
apply(H, 1, find.min2) 3.669489 3.7948605 4.40715776 4.1524930 4.322261 10.553739 100
apply(H, 1, find.min3) 29.390980 31.4460145 34.21487510 33.5333335 37.028033 44.427120 100
replace(y <- max.col(H > 0, ties.method = "first"), y == 1, Inf) 0.053014 0.0854115 0.09416856 0.0996835 0.107846 0.213543 100
So the suggested approach looks much faster. But why? Is apply slow?
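The max.col call in the benchmark works on the whole logical matrix at once instead of calling an R function once per row. The sketch below illustrates that idea on a small made-up matrix. It marks "no positive element" with rowSums rather than with y == 1, which is an assumption-free alternative for data where the first column could itself be positive; the names pos and first are illustrative.

```r
H <- rbind(c(-0.5, -1.2,  0.3,  0.8),
           c(-0.5, -0.1, -2.0, -0.7),   # no positive element
           c(-0.5,  1.1, -0.4,  2.0))

pos <- H > 0                                   # logical matrix
first <- max.col(pos, ties.method = "first")   # first TRUE per row (1 if there is none)
first[rowSums(pos) == 0] <- Inf                # rows without any positive element
first   # 3 Inf 2
```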

Fill all entries between two specified values

I have a long vector, thousands of entries, which has elements 0, 1, 2 in it sporadically. 0 means "no signal", 1 means "signal on", and 2 means "signal off". I am trying to find the runs from 1 to the next occurrence of 2 and fill the space with 1s. I also need to do the same thing between a 2 and the next occurrence of 1 but fill the space with 0s.
I currently have a solution for this issue using loops but it's slow and incredibly inefficient:
example vector:
exp = c(1,1,1,0,0,1,2,0,2,0,1,0,2)
desired result:
1,1,1,1,1,1,2,0,0,0,1,1,2
Thank you

You could use rle & shift from the data.table-package in the following way:
library(data.table)
# example vector from the question
x <- c(1,1,1,0,0,1,2,0,2,0,1,0,2)
# create the run-length object
rl <- rle(x)
# create indexes of the spots in the run-length object that need to be replaced
idx1 <- rl$values == 0 & shift(rl$values, fill = 0) == 1 & shift(rl$values, fill = 0, type = 'lead') %in% 1:2
idx0 <- rl$values == 2 & shift(rl$values, fill = 0) == 0 & shift(rl$values, fill = 2, type = 'lead') %in% 0:1
# replace these values
rl$values[idx1] <- 1
rl$values[idx0] <- 0
Now you will get the desired result by using inverse.rle:
> inverse.rle(rl)
[1] 1 1 1 1 1 1 2 0 0 0 1 1 2
As an alternative for the shift-function, you could also use the lag and lead functions from dplyr.
If you want to assess the speed of both approaches, the microbenchmark-package is a useful tool. Below you'll find 3 benchmarks, each for a different vector size:
# create functions for both approaches (john() relies on the helper f() from the Reduce-based answer)
jaap <- function(x) {
rl <- rle(x)
idx1 <- rl$values == 0 & shift(rl$values, fill = 0) == 1 & shift(rl$values, fill = 0, type = 'lead') %in% 1:2
idx0 <- rl$values == 2 & shift(rl$values, fill = 0) == 0 & shift(rl$values, fill = 2, type = 'lead') %in% 0:1
rl$values[idx1] <- 1
rl$values[idx0] <- 0
inverse.rle(rl)
}
john <- function(x) {
Reduce(f, x, 0, accumulate = TRUE)[-1]
}
Execute the benchmarks:
# benchmark on the original data
> microbenchmark(jaap(x), john(x), times = 100)
Unit: microseconds
expr min lq mean median uq max neval cld
jaap(x) 58.766 61.2355 67.99861 63.8755 72.147 143.841 100 b
john(x) 13.684 14.3175 18.71585 15.7580 23.902 50.705 100 a
# benchmark on a somewhat larger vector
> x2 <- rep(x, 10)
> microbenchmark(jaap(x2), john(x2), times = 100)
Unit: microseconds
expr min lq mean median uq max neval cld
jaap(x2) 69.778 72.802 84.46945 76.9675 87.3015 184.666 100 a
john(x2) 116.858 121.058 127.64275 126.1615 130.4515 223.303 100 b
# benchmark on a very large vector
> x3 <- rep(x, 1e6)
> microbenchmark(jaap(x3), john(x3), times = 20)
Unit: seconds
expr min lq mean median uq max neval cld
jaap(x3) 1.30326 1.337878 1.389187 1.391279 1.425186 1.556887 20 a
john(x3) 10.51349 10.616632 10.689535 10.670808 10.761191 10.918953 20 b
From this you can conclude that the rle approach has the advantage once the vector is longer than roughly 100 elements (which is probably nearly always the case).

You could also use Reduce with the following function:
f <- function(x,y){
if(x == 1){
if(y == 2) 2 else 1
}else{
if(y == 1) 1 else 0
}
}
Then:
> x <- c(1,1,1,0,0,1,2,0,2,0,1,0,2)
> Reduce(f, x, 0, accumulate = TRUE)[-1]
[1] 1 1 1 1 1 1 2 0 0 0 1 1 2

Find elements in vector in R

I have a matrix with exactly 2 rows and n columns, for example:
c(0,0,0,0,1,0,2,0,1,0,1,1,1,0,2)->a1
c(0,2,0,0,0,0,2,1,1,0,0,0,0,2,0)->a2
rbind(a1,a2)->matr
For a specific column (in this example column 9, which has 1 in both rows) I need to find, to the left and to the right, the first column containing 2/0 or 0/2. In this example the one to the left is column 2 and the one to the right is column 14.
The elements of every row can only be 0, 1 or 2. Is there a fast way to do this on large matrices (with 2 rows)? I need to do it 600k times, so speed is a consideration.

library(compiler)
myfun <- cmpfun(function(m, cl) {
li <- ri <- cl
nc <- ncol(m)
repeat {
li <- li - 1
if(li == 0 || ((m[1, li] != 1) && (m[1, li] + m[2, li] == 2))) {
l <- li
break
}
}
repeat {
ri <- ri + 1
if(ri == nc || ((m[1, ri] != 1) && (m[1, ri] + m[2, ri] == 2))) {
r <- ri
break
}
}
c(l, r)
})
and, after taking into account @Martin Morgan's observations,
set.seed(1)
N <- 1000000
test <- rbind(sample(0:2, N, replace = TRUE),
sample(0:2, N, replace = TRUE))
library(microbenchmark)
microbenchmark(myfun(test, N / 2), fun(test, N / 2), foo(test, N / 2),
AWebb(test, N / 2), RHertel(test, N / 2))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# myfun(test, N/2) 4.658 20.033 2.237153e+01 22.536 26.022 85.567 100 a
# fun(test, N/2) 36685.750 47842.185 9.762663e+04 65571.546 120321.921 365958.316 100 b
# foo(test, N/2) 2622845.039 3009735.216 3.244457e+06 3185893.218 3369894.754 5170015.109 100 d
# AWebb(test, N/2) 121504.084 142926.590 1.990204e+05 193864.670 209918.770 489765.471 100 c
# RHertel(test, N/2) 65998.733 76805.465 1.187384e+05 86089.980 144793.416 385880.056 100 b
set.seed(123)
test <- rbind(sample(0:2, N, replace = TRUE, prob = c(5, 90, 5)),
sample(0:2, N, replace = TRUE, prob = c(5, 90, 5)))
microbenchmark(myfun(test, N / 2), fun(test, N / 2), foo(test, N / 2),
AWebb(test, N / 2), RHertel(test, N / 2))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# myfun(test, N/2) 81.805 103.732 121.9619 106.459 122.36 307.736 100 a
# fun(test, N/2) 26362.845 34553.968 83582.9801 42325.755 106303.84 403212.369 100 b
# foo(test, N/2) 2598806.742 2952221.561 3244907.3385 3188498.072 3505774.31 4382981.304 100 d
# AWebb(test, N/2) 109446.866 125243.095 199204.1013 176207.024 242577.02 653299.857 100 c
# RHertel(test, N/2) 56045.309 67566.762 125066.9207 79042.886 143996.71 632227.710 100 b

I was slower than @Laterow, but anyhow, this is a similar approach
foo <- function(mtr, targetcol) {
matr1 <- colSums(mtr)
matr2 <- apply(mtr, 2, function(x) x[1]*x[2])
cols <- which(matr1 == 2 & matr2 == 0) - targetcol
left <- cols[cols < 0]
right <- cols[cols > 0]
c(ifelse(length(left) == 0, NA, targetcol + max(left)),
ifelse(length(right) == 0, NA, targetcol + min(right)))
}
foo(matr,9) #2 14

Combine the information by squaring the rows and adding them. Since the entries can only be 0, 1 or 2, the sum of squares is 4 exactly for the 0/2 and 2/0 columns. Then, simply find the last valid column that is smaller than 9 (rev(which())[1]) and the first valid column that is larger than 9 (which()[1]).
fun <- function(matr, col){
valid <- which((matr[1,]^2 + matr[2,]^2) == 4)
if (length(valid) == 0) return(c(NA,NA))
left <- valid[rev(which(valid < col))[1]]
right <- valid[which(valid > col)[1]]
c(left,right)
}
fun(matr,9)
# [1] 2 14
fun(matr,1)
# [1] NA 2
fun(matrix(0,nrow=2,ncol=100),9)
# [1] NA NA
Benchmark
set.seed(1)
test <- rbind(sample(0:2,1000000,replace=T),
sample(0:2,1000000,replace=T))
microbenchmark::microbenchmark(fun(test,9))
# Unit: milliseconds
# expr min lq mean median uq max neval
# fun(test, 9) 22.7297 27.21038 30.91314 27.55106 28.08437 51.92393 100
Edit: Thanks to @MatthewLundberg for pointing out a lot of mistakes.

If you are doing this many times, precompute all the locations
loc <- which((a1==2 & a2==0) | (a1==0 & a2==2))
You can then find the first to the left and right with findInterval
i <- findInterval(9, loc)
loc[c(i, i + 1)]
# [1] 2 14
Note that findInterval is vectorized should you care to specify multiple target columns.
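As a sketch of the vectorized use, the code below looks up several target columns at once. It goes slightly beyond the answer: left.open = TRUE is used for the left neighbour so that a target which is itself a 0/2 column does not return itself, and missing neighbours are reported as NA. The names targets, i_left, i_right, left and right are illustrative.

```r
a1 <- c(0,0,0,0,1,0,2,0,1,0,1,1,1,0,2)
a2 <- c(0,2,0,0,0,0,2,1,1,0,0,0,0,2,0)
loc <- which((a1 == 2 & a2 == 0) | (a1 == 0 & a2 == 2))   # 2 14 15

targets <- c(1, 9, 15)

# number of qualifying columns strictly to the left of each target
i_left <- findInterval(targets, loc, left.open = TRUE)
i_left[i_left == 0] <- NA            # nothing to the left
left <- loc[i_left]

# first qualifying column strictly to the right of each target
i_right <- findInterval(targets, loc) + 1
right <- loc[i_right]                # indexing past the end gives NA

left    # NA  2 14
right   #  2 14 NA
```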

That is an interesting question. Here's how I would address it.
First a vector is defined which contains the product of each column:
a3 <- matr[1,]*matr[2,]
Then we can find the columns with pairs of (0/2) or (2/0) rather easily, since we know that the matrix can only contain the values 0, 1, and 2:
the02s <- which(colSums(matr)==2 & a3==0)
Next we want to find the pairs of (0/2) or (2/0) that are closest to a given column number, on the left and on the right of that column. The column number could be 9, for instance:
thecol <- 9
Now we have basically all we need to find the index (the column number in the matrix) of a combination of (0/2) or (2/0) that is closest to the column thecol. We just need to use the output of findInterval():
pos <- findInterval(thecol,the02s)
pos <- c(pos, pos+1)
pos[pos==0] <- NA # output NA if no column was found on the left
And the result is:
the02s[pos]
# 2 14
So the indices of the closest columns on either side of thecol fulfilling the required condition would be 2 and 14 in this case, and we can confirm that these column numbers both contain one of the relevant combinations:
matr[,14]
#a1 a2
# 0 2
matr[,2]
#a1 a2
# 0 2
Edit: I changed the answer such that NA is returned in the case where no column exists on the left and/or on the right of thecol in the matrix that fulfills the required condition.

R: recode previous/following n observations

I have a dataframe of 0/1 dummy variables. Each dummy variable takes the value 1 only once. For each column, I want to set the n observations before and after the observation with the value 1 to a particular value (say 1).
So for single vector, with n=1:
c(0, 0, 1, 0, 0)
I would want to get
c(0, 1, 1, 1, 0)
What would be a good general approach with n columns and allowing for a different number of preceding/following observations to replace (e.g n-1 before & n after)?
Thanks for help!

x <- c(0, 0, 1, 0, 0)
n <- 1 # number of observations to recode on each side
ind <- which(x == 1)
x[max(1, ind - n):min(length(x), ind + n)] <- 1

Another option:
f <- function(x, pre, post) {
idx <- which.max(x)
x[max(1, (idx-pre)):min(length(x), (idx+post))] <- 1
x
}
Sample data:
df <- data.frame(x = c(0, 0, 1, 0, 0), y = c(0, 1, 0, 0, 0))
Application:
df[] <- lapply(df, f, pre=2, post=1)
#df
# x y
#1 1 1
#2 1 1
#3 1 1
#4 1 0
#5 0 0
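The question also asks for a different number of preceding and following observations per column. One possible way to do that, sketched under the assumption that each column contains at most one 1, is to pass per-column window sizes through Map. The function name widen and the vectors pre and post are hypothetical and chosen for illustration.

```r
widen <- function(x, pre, post) {
  idx <- which(x == 1)[1]       # position of the (single) 1, NA if there is none
  if (is.na(idx)) return(x)     # leave columns without a 1 unchanged
  x[max(1, idx - pre):min(length(x), idx + post)] <- 1
  x
}

df <- data.frame(x = c(0, 0, 1, 0, 0, 0),
                 y = c(0, 0, 0, 1, 0, 0))
pre  <- c(x = 2, y = 0)   # observations to recode before the 1, per column
post <- c(x = 1, y = 2)   # observations to recode after the 1, per column

df[] <- Map(widen, df, pre, post)
df$x   # 1 1 1 1 0 0
df$y   # 0 0 0 1 1 1
```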

What you can do is the following:
vec <- c(0, 0, 1, 0, 0)
sapply(1:length(vec), function(i) {
minval <- max(0, i - 1)
maxval <- min(i + 1, length(vec))
return(sum(vec[minval:maxval]))
})
# [1] 0 1 1 1 0
Or to put it in a function (same code but a bit more compact)
f <- function(vec){
sapply(1:length(vec), function(i)
sum(vec[max(0, i-1):min(i+1, length(vec))]))
}
f(vec)
# [1] 0 1 1 1 0
Speedtest
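To compare the different solutions, I quickly ran a benchmark using microbenchmark. At first glance the winner is @Shenglin's code, but note that fShenglin is passed to microbenchmark without being called, so its near-zero timing (and the warning at the end of the output) measures only the lookup of the function object, not its run time. Only the fHeroka and fDavid rows are directly comparable.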
fDavid <- function(vec){
sapply(1:length(vec), function(i)
sum(vec[max(0, i-1):min(i+1, length(vec))]))
}
fHeroka <- function(vec){
res <- vec
test <- which(vec==1)
#create indices to be replaced
n=1 #variable n
replace_indices <- c(test+(1:n),test-(1:n))
#filter out negatives (may happen with larger n)
replace_indices <- replace_indices[replace_indices>0]
#replace items in 'res' that need to be replaced with 1
res[replace_indices] <- 1
}
fShenglin <- function(vec){
ind<-which(vec==1)
vec[(ind-1):(ind+x)]<-1
}
vect <- sample(0:1, size = 1000, replace = T)
library(microbenchmark)
microbenchmark(fHeroka(vect), fDavid(vect), fShenglin)
# Unit: nanoseconds
# expr min lq mean median uq max neval cld
# fHeroka(vect) 38929 42999 54422.57 49546 61755.5 145451 100 a
# fDavid(vect) 2463805 2577935 2875024.99 2696844 2849548.5 5994596 100 b
# fShenglin 0 0 138.63 1 355.0 1063 100 a
# Warning message:
# In microbenchmark(fHeroka(vect), fDavid(vect), fShenglin) :
# Could not measure a positive execution time for 30 evaluations.

This might be a start:
myv <- c(0, 0, 1, 0, 0)
#make a copy
res <- myv
#check where the ones are
test <- which(myv==1)
#create indices to be replaced
n=1 #variable n
replace_indices <- c(test+(1:n),test-(1:n))
#filter out negatives (may happen with larger n)
replace_indices <- replace_indices[replace_indices>0]
#replace items in 'res' that need to be replaced with 1
res[replace_indices] <- 1
> res
[1] 0 1 1 1 0

This could be a solution:
dat<-data.frame(x=c(0,0,1,0,0,0),y=c(0,0,0,1,0,0),z=c(0,1,0,0,0,0))
which_to_change<-data.frame(prev=c(2,2,1),foll=c(1,1,3))
for(i in 1:nrow(which_to_change)){
dat[(which(dat[,i]==1)-which_to_change[i,1]):(which(dat[,i]==1)+which_to_change[i,2]),i]<-1
}
