I'm writing a function that needs to apply a function g, passed as a parameter, to each element of a list, iteratively.
I'm wondering how to make this as fast as possible. I can achieve an acceptable speed using Rcpp and a specific kind of g (writing everything in C++), but I can't figure out whether I can reach a similar speed passing an R function as the argument.
While doing some tests to figure out why R is slower, I found some really unexpected results:
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
Testing with some simple functions to invert signs.
f0 <- function(x) {
  sapply(x, minus)
}
f1 <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- -x[i]
  }
  x
}
f2 <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- minus(x[i])
  }
  x
}
I got the following results:
a <- 1:10^5
library(rbenchmark)
benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))[,c(1,4)]
test relative
1 f0(a) 454.842
2 f1(a) 25.579
3 f2(a) 178.211
4 minus_vec(a) 523.789
5 minus(a) 1.000
I would like some explanation on the following points:
Why don't f1 and f2 have the same speed? Should writing -x[i] inline versus calling minus(x[i]) really make such a difference when they do the exact same thing?
Why is f0 slower than f2? I always thought the apply functions were more efficient than for loops, but never really understood why, and now I have even found a counter-example.
Can I make a function as fast as f1 that uses the function minus?
Why did vectorizing minus (unnecessary here since - is already vectorized, but that might not always be the case) make it so much worse?
Not a full answer, but here are a few notes.
1 minus(x) vs -x: Doing nothing is better than doing something
Your function minus calls `-` itself, so the added call adds computation time. I honestly do not know the specifics, in other words I couldn't say how much extra computation time ought to be expected.
Here is an example highlighting it: we have four functions, all squaring numbers
fa <- function (n) n^2
fb <- function (n) fa(n)
fc <- function (n) fb(n)
fd <- function (n) fc(n)
Fa <- function (n) {
  for (i in seq_along(n)) n[i] <- fa(i)
  n
}
Fb <- function (n) {
  for (i in seq_along(n)) n[i] <- fb(i)
  n
}
Fc <- function (n) {
  for (i in seq_along(n)) n[i] <- fc(i)
  n
}
Fd <- function (n) {
  for (i in seq_along(n)) n[i] <- fd(i)
  n
}
And here are the benchmarking results
n <- 1:10^4
b <- benchmark(Fa(n),Fb(n),Fc(n),Fd(n), replications = 1000L)
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 Fa(n) 1000 3.93 1.000 3.85 0.00 NA NA
# 2 Fb(n) 1000 7.08 1.802 6.94 0.02 NA NA
# 3 Fc(n) 1000 10.16 2.585 9.94 0.06 NA NA
# 4 Fd(n) 1000 13.68 3.481 13.56 0.00 NA NA
# looks rather even
diff(b$elapsed)
# [1] 3.15 3.08 3.52
Now back to your minus function:
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))
b$elapsed[b$test == 'f2(a)'] - b$elapsed[b$test == 'f1(a)']
# [1] 3.39
2 apply vs for vs Vectorize:
@NavyCheng provided some good material on the topic. My understanding is that the apply family (just like Vectorize) loops in R (whereas, if I'm not mistaken, the looping for `-` is done in C).
Again, I do not know the exact details, but if apply/Vectorize use R loops then, in theory (and often in practice), it is possible to write a proper for loop that performs as well or better.
3 A Function as fast as f1:
Ad hoc, the closest I came was by cheating, using the Rcpp package (cheating, since one writes the function in C++ first).
In C++
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector minusCpp(NumericVector x) {
  for (int k = 0; k < x.length(); ++k) {
    x[k] = -x[k];
  }
  return x;
}
Now to the benchmarks in R:
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a), minusCpp(a))
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f0(a) 100 9.47 NA 9.22 0.01 NA NA
# 2 f1(a) 100 0.53 NA 0.54 0.00 NA NA
# 3 f2(a) 100 4.23 NA 4.24 0.00 NA NA
# 5 minus(a) 100 0.00 NA 0.00 0.00 NA NA
# 4 minus_vec(a) 100 10.42 NA 10.39 0.02 NA NA
# 6 minusCpp(a) 100 0.05 NA 0.04 0.00 NA NA
Ignoring -x[i] vs minus(x[i]), I'll summarize the four questions into two:
Why is the apply family slower than a for loop?
Why is Vectorize slower than the apply family?
For the 1st question:
The apply functions are designed to be convenient and clear to read,
not necessarily fast.
and the apply family does more work than a for loop.
Also, sapply first uses as.vector(unlist(...)) to convert anything to a vector, and at the end tries to simplify the answer into a suitable form.
You can read here and here for more detail.
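As a rough sketch of that extra work (my_sapply is a hypothetical name, and this glosses over details such as USE.NAMES), sapply is essentially lapply plus a simplification step:
my_sapply <- function(x, f) {
  out <- lapply(x, f)   # the R-level loop
  simplify2array(out)   # the extra simplification step
}
a <- 1:10
identical(my_sapply(a, function(v) -v), sapply(a, function(v) -v))
# [1] TRUE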
For the 2nd question: it's because Vectorize is a wrapper around mapply; if you type Vectorize in RStudio, you'll see its code. You can read this for more help.
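To convince yourself of the mapply connection, here is a small sketch reusing the minus from the question:
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
# The Vectorize()d function just dispatches to mapply(), so both
# carry the same R-level looping overhead:
identical(minus_vec(1:5), mapply(minus, 1:5))
# [1] TRUE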
This question already has answers here:
Subtract every element of vector A from every element of vector B
I need to take the difference between every pair of elements from two vectors.
If A<-c(1,2) and B<-c(3,4) then my result R should be c(3-1,3-2,4-1,4-2).
With this snippet
myfunction <- function(N)
{
  A = runif(N)
  B = runif(N)
  R = c()
  for (a in A) {
    for (b in B) {
      R = c(b - a, R)
    }
  }
  R
}
print(system.time(result <- myfunction(300)))
I get this time
user system elapsed
14.27 0.01 14.39
Is there any faster way to do it?
The fastest base solution is the use of outer:
as.vector(outer(B,A,"-"))
At first glance map2_dbl seemed to be quite a bit faster than outer, but that is only because map2_dbl is not calculating every combination of values in A and B:
test elapsed relative
3 CP(A, B) 7.54 47.125 # using expand.grid
2 JL(A, B) 0.16 1.000 # using map2_dbl
1 JM(A, B) 3.13 19.563 # using outer
But:
> A <- 1:3
> B <- 3:1
> JL(A,B)
[1] -2 0 2
> JM(A,B)
[1] 2 1 0 1 0 -1 0 -1 -2
This is for two vectors of length 1000, and with 100 replications. I didn't include your own solution because that one is ridiculously slow for two reasons:
for loops in R are quite a bit faster than in the old days, but still not as optimal as using functions that have their loops coded in C or equivalent. That's the case for the functions used in the tested code here.
you "grow" your result object. Every loop through the code, that R becomes one value larger, so R has to look for a new place in the memory to store it. That's actually the biggest bottleneck in your code. Try to avoid that kind of construct at all costs, because it's one of the most important causes of terribly slow code.
The benchmark code:
library(tidyverse)
JM <- function(A, B) {
  as.vector(outer(B, A, "-"))
}
JL <- function(A, B) {
  map2_dbl(.x = A,
           .y = B,
           .f = ~ c(.x - .y))
}
CP <- function(A, B) {
  as.data.frame(expand.grid(A, B)) %>%
    mutate(Var3 = Var2 - Var1)
}
library(rbenchmark)
A <- runif(1000)
B <- runif(1000)
benchmark(JM(A,B),
JL(A,B),
CP(A,B),
replications = 100,
columns = c("test","elapsed","relative"))
You can use expand.grid to vectorize the approach:
A <- runif(300)
B <- runif(300)
library(dplyr)
R <- as.data.frame(expand.grid(A,B)) %>%
mutate(Var3 = Var2-Var1)
The first 5 lines of output:
Var1 Var2 Var3
1 0.8516676 0.325261 -0.5264066246
2 0.2126453 0.325261 0.1126156694
3 0.5394620 0.325261 -0.2142010126
4 0.1364876 0.325261 0.1887734290
5 0.3248651 0.325261 0.0003958747
This took:
user system elapsed
0.02 0.00 0.02
Your function took:
user system elapsed
42.39 0.43 42.90
Using purrr::map:
library(tidyverse)
N = 300
A = runif(N)
B = runif(N)
R = c()
print(
system.time(
result <- map(
.x = A,
.f = ~ c(.x - B)) %>% unlist
)
)
Time taken:
user system elapsed
0.02 0 0.02
If I've got your attention now, check out this repo for a nice walkthrough of purrr.
I have a big time series full in one dataframe and a list of timestamps in a different dataframe test. I need to subset full with data points surrounding the timestamps in test. My first instinct (as an R noob) was to write the line below, which was wrong:
subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))
Looking at the result I realized that R recycles the shorter vector and compares the two element-wise, giving the wrong result. My alternative is to write a loop like the one below:
subs<-data.frame()
for (j in test$dt)
subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))
I feel that there might be a better way to do the loop, and this article implores us to avoid R loops as much as possible. The other reason is that I might be hitting performance issues, as this will be at the heart of an optimization algorithm. Any suggestions from gurus would be greatly appreciated.
EDIT:
Here is some reproducible code that shows the wrong approach as well as the approach that works but could be better.
#create a times series
full <- data.frame(seq(1:200),rnorm(200,0,1))
colnames(full)<-c("dt","val")
#my smaller array of points of interest
test <- data.frame(seq(5,200,by=23))
colnames(test)<-c("dt")
# my range around the points of interset
i<-3
#the wrong approach
subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))
#this works, but not sure this is the best way to go about it
subs<-data.frame()
for (j in test$dt)
subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))
EDIT:
I updated the values to better reflect my use case, and I see @mrdwab's solution pulling ahead unexpectedly and by a wide margin.
I am using the benchmark code from @mrdwab, and the initialization is as follows:
set.seed(1)
full <- data.frame(
dt = 1:15000000,
val = floor(rnorm(15000000,0,1))
)
test <- data.frame(dt = floor(runif(24,1,15000000)))
i <- 500
The benchmarks are:
test replications elapsed relative
2 mrdwab 2 1.31 1.00000
3 spacedman 2 69.06 52.71756
1 andrie 2 93.68 71.51145
4 original 2 114.24 87.20611
Totally unexpected. Mind = blown. Can someone please shed some light on this dark corner and enlighten us as to what is happening?
Important: As @mrdwab notes below, his solution works only if the vectors are integers. If not, @Spacedman has the right solution.
Here's a real R way to do it. Functionally. No loops...
Starting with Andrie's example data.
First, an interval comparison function. (The force() calls matter: because R evaluates arguments lazily, they pin down l and u at the moment each closure is created, rather than when it is first called.)
> cf = function(l,u){force(l);force(u);function(x){x>l & x<u}}
An OR composition function:
> OR = function(f1,f2){force(f1);force(f2);function(x){f1(x)|f2(x)}}
Now there's sort of a loop here, to construct a list of those comparison functions:
> funs = mapply(cf,test$dt-i,test$dt+i)
Now combine all those into one function:
> anyF = Reduce(OR,funs)
And now we apply the OR composition to our interval testing functions:
> head(full[anyF(full$dt),])
dt val
3 3 -0.83562861
4 4 1.59528080
5 5 0.32950777
6 6 -0.82046838
7 7 0.48742905
26 26 -0.05612874
What you've got now is a function of a single variable that tests if the value is in the ranges you defined.
> anyF(1:10)
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
I don't know if this is faster, or better, or what. Someone do some benchmarks!
I don't know if it's any more efficient, but I would think you could also do something like this to get what you want:
subs <- apply(test, 1, function(x) c((x-2):(x+2)))
full[which(full$dt %in% subs), ]
I had to adjust your "3" to "2" since x would be included both ways.
Benchmarking (just for fun)
@Spacedman leads the way!
First, the required data and functions.
## Data
set.seed(1)
full <- data.frame(
dt = 1:200,
val = rnorm(200,0,1)
)
test <- data.frame(dt = seq(5,200,by=23))
i <- 3
## Spacedman's functions
cf = function(l,u){force(l);force(u);function(x){x>l & x<u}}
OR = function(f1,f2){force(f1);force(f2);function(x){f1(x)|f2(x)}}
funs = mapply(cf,test$dt-i,test$dt+i)
anyF = Reduce(OR,funs)
Second, the benchmarking.
## Benchmarking
require(rbenchmark)
benchmark(andrie = do.call(rbind,
lapply(test$dt,
function(j) full[full$dt > (j-i) &
full$dt < (j+i), ])),
mrdwab = {subs <- apply(test, 1,
function(x) c((x-(i-1)):(x+(i-1))))
full[which(full$dt %in% subs), ]},
spacedman = full[anyF(full$dt),],
original = {subs <- data.frame()
for (j in test$dt)
subs <- rbind(subs,
subset(full, full$dt > (j-i) &
full$dt < (j+i)))},
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
# test replications elapsed relative
# 3 spacedman 100 0.064 1.000000
# 2 mrdwab 100 0.105 1.640625
# 1 andrie 100 0.520 8.125000
# 4 original 100 1.080 16.875000
There is nothing inherently wrong with your code. To achieve your aim, you need a loop of some sort around a vectorised subset operation.
But here is a more R-ish way to do it, which might well be faster:
do.call(rbind,
lapply(test$dt, function(j)full[full$dt > (j-i) & full$dt < (j+i), ])
)
PS: You can significantly simplify your reproducible example:
set.seed(1)
full <- data.frame(
dt = 1:200,
val = rnorm(200,0,1)
)
test <- data.frame(dt = seq(5,200,by=23))
i <- 3
xx <- do.call(rbind,
lapply(test$dt, function(j)full[full$dt > (j-i) & full$dt < (j+i), ])
)
head(xx)
dt val
3 3 -0.83562861
4 4 1.59528080
5 5 0.32950777
6 6 -0.82046838
7 7 0.48742905
26 26 -0.05612874
One more way, using data.table:
library(data.table)
{
  temp <- data.table(x = unique(c(full$dt, (test$dt - i), (test$dt + i))), key = "x")
  temp[, index := 1:nrow(temp)]
  startpoints <- temp[J(test$dt - i), index]$index
  endpoints <- temp[J(test$dt + i), index]$index
  allpoints <- as.vector(mapply(FUN = function(x, y) x:y, x = startpoints, y = endpoints))
  setkey(x = temp, index)
  ans <- temp[J(allpoints)]$x
}
benchmarks:
number of rows in test: 9
number of rows in full: 10000
test replications elapsed relative
1 spacedman 100 0.406 1.000
2 new 100 1.179 2.904
number of rows in full: 100000
test replications elapsed relative
2 new 100 2.374 1.000
1 spacedman 100 3.753 1.581
Say I have a data frame like this:
Df <- data.frame(
V1 = c(1,2,3,NA,5),
V2 = c(1,2,NA,4,5),
V3 = c(NA,2,NA,4,NA)
)
Now I want to count the number of valid observations for every combination of two variables. For that, I wrote a function sharedcount:
sharedcount <- function(x, ...) {
  nx <- names(x)
  alln <- combn(nx, 2)
  out <- apply(alln, 2,
               function(y) sum(complete.cases(x[y])))
  data.frame(t(alln), out)
}
This gives the output:
> sharedcount(Df)
X1 X2 out
1 V1 V2 3
2 V1 V3 1
3 V2 V3 2
All fine, but the function itself takes pretty long on big data frames (600 variables and about 10000 observations). I have the feeling I'm overlooking an easier approach, especially since cor(..., use='pairwise') still runs a whole lot faster even though it has to do something similar:
> require(rbenchmark)
> benchmark(sharedcount(TestDf),cor(TestDf,use='pairwise'),
+ columns=c('test','elapsed','relative'),
+ replications=1
+ )
test elapsed relative
2 cor(TestDf, use = "pairwise") 0.25 1.0
1 sharedcount(TestDf) 1.90 7.6
Any tips are appreciated.
Note: using Vincent's trick, I wrote a function that returns the same data frame. The code is in my answer below.
The following is slightly faster than your sharedcount (though, as the timings below show, still slower than cor):
x <- !is.na(Df)
t(x) %*% x
# test elapsed relative
# cor(Df) 12.345 1.000000
# t(x) %*% x 20.736 1.679708
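To see why the trick works: x is a logical matrix with TRUE wherever Df is observed, and entry (i, j) of t(x) %*% x sums x[, i] * x[, j] over the rows, i.e. it counts the rows where both variable i and variable j are non-missing. A small sketch using the Df from the question:
Df <- data.frame(
  V1 = c(1, 2, 3, NA, 5),
  V2 = c(1, 2, NA, 4, 5),
  V3 = c(NA, 2, NA, 4, NA)
)
x <- !is.na(Df)
crossprod(x)  # same result as t(x) %*% x
#    V1 V2 V3
# V1  4  3  1
# V2  3  4  2
# V3  1  2  2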
I thought Vincent's looked really elegant, not to mention faster than my sophomoric for loop, except that it seems to need an extraction step, which I added below. This is just an example of the heavy overhead of the apply method when used with data frames.
shrcnt <- function(Df) {
  Comb <- t(combn(1:ncol(Df), 2))
  shrd <- 1:nrow(Comb)
  for (i in seq_along(shrd)) {
    shrd[i] <- sum(complete.cases(Df[, Comb[i, 1]], Df[, Comb[i, 2]]))
  }
  return(shrd)
}
x <- !is.na(Df)  # from Vincent's answer above
benchmark(
  shrcnt(Df), sharedcount(Df), {prs <- t(x) %*% x; prs[lower.tri(prs)]},
cor(Df,use='pairwise'),
columns=c('test','elapsed','relative'),
replications=100
)
#--------------
test elapsed relative
3 { 0.008 1.0
4 cor(Df, use = "pairwise") 0.020 2.5
2 sharedcount(Df) 0.092 11.5
1 shrcnt(Df) 0.036 4.5
Based on the lovely trick of Vincent and the additional lower.tri() suggestion of DWin, I came up with the following function, which gives the same output (i.e. a data frame) as my original one and runs a whole lot faster:
sharedcount2 <- function(x, stringsAsFactors = FALSE, ...) {
  counts <- crossprod(!is.na(x))
  id <- lower.tri(counts)
  count <- counts[id]
  X1 <- colnames(counts)[col(counts)[id]]
  X2 <- rownames(counts)[row(counts)[id]]
  data.frame(X1, X2, count, stringsAsFactors = stringsAsFactors)
}
Note the use of crossprod(): it gives a small improvement compared to %*%, but it does exactly the same thing.
The timings:
> benchmark(sharedcount(TestDf),sharedcount2(TestDf),
+ replications=5,
+ columns=c('test','replications','elapsed','relative'))
test replications elapsed relative
1 sharedcount(TestDf) 5 10.00 90.90909
2 sharedcount2(TestDf) 5 0.11 1.00000
Note: I supplied TestDf in the question, as I noticed that the timings differ depending on the size of the data frame. As shown here, the time increase is a lot more dramatic than a comparison on a small data frame would suggest.
I'm trying to speed up/vectorize some calculations in a time series.
Can I vectorize a calculation in a for loop which can depend on results from an earlier iteration? For example:
z <- c(1,1,0,0,0,0)
zi <- 2:6
for (i in zi) {z[i] <- ifelse (z[i-1]== 1, 1, 0) }
uses the z[i] values updated in earlier steps:
> z
[1] 1 1 1 1 1 1
In my effort at vectorizing this
z <- c(1,1,0,0,0,0)
z[zi] <- ifelse( z[zi-1] == 1, 1, 0)
the element-by-element operations don't use results updated in the operation:
> z
[1] 1 1 1 0 0 0
So this vectorized operation works in 'parallel' rather than in an iterative fashion. Is there a way I can write/vectorize this to get the same results as the for loop?
ifelse is vectorized and there's a bit of a penalty if you're using it on one element at a time in a for-loop. In your example, you can get a pretty good speedup by using if instead of ifelse.
fun1 <- function(z) {
  for (i in 2:NROW(z)) {
    z[i] <- ifelse(z[i-1] == 1, 1, 0)
  }
  z
}
fun2 <- function(z) {
  for (i in 2:NROW(z)) {
    z[i] <- if (z[i-1] == 1) 1 else 0
  }
  z
}
z <- c(1,1,0,0,0,0)
identical(fun1(z),fun2(z))
# [1] TRUE
system.time(replicate(10000, fun1(z)))
# user system elapsed
# 1.13 0.00 1.32
system.time(replicate(10000, fun2(z)))
# user system elapsed
# 0.27 0.00 0.26
You can get some additional speed gains out of fun2 by compiling it.
library(compiler)
cfun2 <- cmpfun(fun2)
system.time(replicate(10000, cfun2(z)))
# user system elapsed
# 0.11 0.00 0.11
So there's a 10x speedup without vectorization. As others have said (and some have illustrated) there are ways to vectorize your example, but that may not translate to your actual problem. Hopefully this is general enough to be applicable.
The filter function may be useful to you as well, if you can figure out how to express your problem in terms of an autoregressive or moving average process.
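For instance, here is a small sketch (only applicable when the recursion is linear, which the 0/1 rule above is not in general): stats::filter() with method = "recursive" computes y[i] = x[i] + phi * y[i-1] with the loop done in C:
x <- c(1, 0, 0, 0, 0, 0)
# y[i] = x[i] + 1 * y[i-1]: propagates the leading 1 down the vector
stats::filter(x, filter = 1, method = "recursive")
# [1] 1 1 1 1 1 1  (printed as a ts object)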
This is a nice and simple example where Rcpp can shine.
So let us first recast functions 1 and 2 and their compiled counterparts:
library(inline)
library(rbenchmark)
library(compiler)
fun1 <- function(z) {
  for (i in 2:NROW(z)) {
    z[i] <- ifelse(z[i-1] == 1, 1, 0)
  }
  z
}
fun1c <- cmpfun(fun1)
fun2 <- function(z) {
  for (i in 2:NROW(z)) {
    z[i] <- if (z[i-1] == 1) 1 else 0
  }
  z
}
fun2c <- cmpfun(fun2)
We can write an Rcpp variant very easily:
funRcpp <- cxxfunction(signature(zs="numeric"), plugin="Rcpp", body="
  Rcpp::NumericVector z = Rcpp::NumericVector(zs);
  int n = z.size();
  for (int i=1; i<n; i++) {
    z[i] = (z[i-1]==1.0 ? 1.0 : 0.0);
  }
  return(z);
")
This uses the inline package to compile, load and link the five-liner on the fly.
Now we can define our test data, which we make a little longer than the original (as running the original too few times results in unmeasurable times):
R> z <- rep(c(1,1,0,0,0,0), 100)
R> all(sapply(list(fun2(z), fun1c(z), fun2c(z), funRcpp(z)), identical, fun1(z)))
[1] TRUE
R>
All answers are seen as identical.
Finally, we can benchmark:
R> res <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications", "elapsed",
+ "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=1000)
R> print(res)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 1000 0.005 1.0 0.01 0
4 fun2c(z) 1000 0.466 93.2 0.46 0
2 fun2(z) 1000 1.918 383.6 1.92 0
3 fun1c(z) 1000 10.865 2173.0 10.86 0
1 fun1(z) 1000 12.480 2496.0 12.47 0
The compiled version wins by a factor of almost 400 against the best R version, and almost 100 against its byte-compiled variant. For function 1, the byte compilation matters much less and both variants trail the C++ by a factor of well over two-thousand.
It took about one minute to write the C++ version. The speed gain suggests it was a minute well spent.
For comparison, here is the result for the original short vector called more often:
R> z <- c(1,1,0,0,0,0)
R> res2 <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications",
+ "elapsed", "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=10000)
R> print(res2)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 10000 0.046 1.000000 0.04 0
4 fun2c(z) 10000 0.132 2.869565 0.13 0
2 fun2(z) 10000 0.271 5.891304 0.27 0
3 fun1c(z) 10000 1.045 22.717391 1.05 0
1 fun1(z) 10000 1.202 26.130435 1.20 0
The qualitative ranking is unchanged: the Rcpp version dominates, and function 2 is second-best, with the byte-compiled version being about twice as fast as the plain R variant, but still almost three times slower than the C++ version. And the relative differences are lower: relatively speaking, the function call overhead matters less and the actual looping matters more; C++ gets a bigger advantage on the actual loop operations in the longer vectors. That is an important result, as it suggests that on more real-life-sized data, the compiled version may reap a larger benefit.
Edited to correct two small oversights in the code examples. And edited again with thanks to Josh to catch a setup error relative to fun2c.
I think this is cheating and not generalizable, but: according to the rules you have above, any occurrence of 1 in the vector will make all subsequent elements 1 (by recursion: z[i] is set to 1 if z[i-1] equals 1; therefore z[i] will be set to 1 if z[i-2] equals 1; and so forth). Depending on what you really want to do, there may be such a recursive solution available if you think carefully about it ...
z <- c(1,1,0,0,0,0)
first1 <- min(which(z==1))
z[seq_along(z)>first1] <- 1
edit: this is wrong, but I'm leaving it up to admit my mistakes. Based on a little bit of playing (and less thinking), I think the actual solution to this recursion is more symmetric and even simpler:
rep(z[1],length(z))
Test cases:
z <- c(1,1,0,0,0,0)
z <- c(0,1,1,0,0,0)
z <- c(0,0,1,0,0,0)
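A quick sketch checking rep(z[1], length(z)) against the original loop on those test cases (loop_version is a hypothetical helper, not from the answer):
loop_version <- function(z) {
  for (i in 2:length(z)) z[i] <- if (z[i - 1] == 1) 1 else 0
  z
}
for (z in list(c(1,1,0,0,0,0), c(0,1,1,0,0,0), c(0,0,1,0,0,0))) {
  print(identical(loop_version(z), rep(z[1], length(z))))
}
# [1] TRUE
# [1] TRUE
# [1] TRUE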
Check out the rollapply function in zoo.
I'm not super familiar with it, but I think this does what you want:
> c( 1, rollapply(z,2,function(x) x[1]) )
[1] 1 1 1 1 1 1
I'm sort of kludging it by using a window of 2 and then only using the first element of that window.
For more complicated examples you could perform some calculation on x[1] and return that instead.
Sometimes you just need to think about it totally differently. What you're doing is creating a vector where every element equals 1 if the first element is 1, and 0 otherwise.
z <- c(1,1,0,0,0,0)
if (z[1] != 1) z[1] <- 0
z[2:length(z)] <- z[1]
There is a function that does this particular calculation: cumprod (cumulative product)
> cumprod(z[zi])
[1] 1 0 0 0 0
> cumprod(c(1,2,3,4,0,5))
[1] 1 2 6 24 0 0
Otherwise, vectorize with Rcpp as other answers have shown.
It's also possible to do this with apply, using the original vector and a lagged version of the vector as the constituent columns of a data frame.
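A minimal sketch of that idea (note it reproduces the 'parallel' ifelse behaviour from the question, not the recursive loop, because each row only sees the original lagged value):
z <- c(1, 1, 0, 0, 0, 0)
dd <- data.frame(cur = z[-1], prev = z[-length(z)])  # each value and its lag
c(z[1], apply(dd, 1, function(r) if (r["prev"] == 1) 1 else 0))
# [1] 1 1 1 0 0 0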