Surprisingly Slow Standard Deviation in R

I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation. This seems like a fairly straightforward thing to do that should be relatively fast. However, it takes a lot longer than you might think (~45 seconds). Am I missing something here? In Matlab this is quite fast.
t0 <- proc.time()[[3]]
z <- rep(0, 8000)   # pre-allocate the full result; positions 1000:8000 get filled
x <- rnorm(8000)
for (i in 1000:8000) {
  ## print(i)
  z[i] <- sd(x[1:i])
}
print(proc.time()[[3]] - t0)

You might also try an algorithm that updates the standard deviation (well, actually, the sum of squares of differences from the mean) as you go. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.
n <- length(x)
m <- cumsum(x)/(1:n)                      # running mean
m1 <- c(NA, m[1:(n-1)])                   # running mean lagged by one observation
ssd <- (x - m)*(x - m1)                   # Welford-style increments to the sum of squares
v <- c(0, cumsum(ssd[-1])/(1:(n-1)))      # sample variance at each point
z <- sqrt(v)                              # expanding-window standard deviation
See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.
Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in r
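As a quick sanity check (a sketch, using a small vector so the direct loop stays cheap), the cumulative formula can be compared element-wise against sd() on the expanding window:
set.seed(1)
x <- rnorm(200)
n <- length(x)
m <- cumsum(x)/(1:n)
m1 <- c(NA, m[1:(n-1)])
ssd <- (x - m)*(x - m1)
v <- c(0, cumsum(ssd[-1])/(1:(n-1)))
z_fast <- sqrt(v)
z_direct <- sapply(2:n, function(i) sd(x[1:i]))
all.equal(z_fast[-1], z_direct)   # TRUE, up to floating-point error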

This takes ~1.3 seconds on my machine:
t0 <- proc.time()[[3]]
x <- rnorm(8000)
z <- sapply(1000:8000,function(y){sd(x[seq_len(y)])})
print(proc.time()[[3]]- t0)
and I'd be willing to bet there are even faster ways of doing this. Avoid explicit for loops!

When a somewhat similar question about a cumulative variance and a cumulative kurtosis operation came up on R-help a few days ago, here is what I offered:
daily <- rnorm(1000000)
mbar <- mean(daily)   # overall mean (not needed for the cumulative versions below)
cumvar <- cumsum((daily - cumsum(daily)/1:length(daily))^2)
cumskew <- cumsum((daily - cumsum(daily)/1:length(daily))^3)/cumvar^(3/2)
It's certainly faster than the sapply method but may be comparable to Aaron's.
system.time( cumvar <- cumsum( (daily-cumsum(daily)/1:length(daily) )^2) )
user system elapsed
0.037 0.026 0.061
system.time(cumsd <- sqrt(cumvar) )
user system elapsed
0.009 0.005 0.013

Related

Monte Carlo simulation of correlation between two Brownian motions (continuous random walks)

y <- cumsum(rnorm(100, 0, 1))   # random walk: cumulative sum of N(0,1) steps
y.ts <- ts(y)
x <- cumsum(rnorm(100, 0, 1))   # a second, independent random walk
x.ts <- ts(x)
ts.plot(y.ts, x.ts, ty = "l")   # plot the two random walks
Regression.Q1 <- lm(y ~ x)
summary(Regression.Q1)
t.test1 <- summary(Regression.Q1)$coef[2, 3]   # t-statistic of the slope
y[t] = y[t-1] + epsilon[t]
epsilon[t] ~ N(0,1)
set.seed(1)
t=1000
epsilon=sample(c(-1,1), t, replace = 1) # Generate k random walks across time {0, 1, ... , T}
N=T=1e3
y=t(apply(matrix(sample(c(-1,1),N*T,rep=TRUE),ncol=T),1,cumsum))
y[1]<-0
for (i in 2:t) {
  y[i] <- y[i-1] + epsilon[i]
}
I need to:
Repeat the process 1,000 times (Monte Carlo simulation), i.e. build a loop around the previous program and save the t statistic each time. You will have a sequence of 1,000 t-tests: S = (t-test1, t-test2, ..., t-test1000). Count the number of times the absolute value of the 1,000 t-tests exceeds 1.96, the critical value at the 5% significance level. If the series were I(0) you would find roughly 5%. That won't be the case here (spurious regression).
What do I need to add to save the respective coefficients?
Your posted code related to y[t] = y[t-1] + epsilon[t] is not real working code, but I can see that you are trying to store all 1000 * 2 random walks. There is actually no need to do this: we only care about the t-scores, not the realizations of the random walks themselves.
For this kind of problem, where we aim to replicate a procedure many times, it is handy to first write a function that carries out the procedure once. You already have good working code for this; we just need to wrap it in a function (removing unnecessary parts such as the plot):
sim <- function() {
  y <- cumsum(rnorm(100, 0, 1))
  x <- cumsum(rnorm(100, 0, 1))
  coef(summary(lm(y ~ x)))[2, 3]
}
This function takes no input; it only returns the t-score for one experiment.
Now, we are going to repeat this 1000 times. We could write a for loop, but the function replicate is easier (read ?replicate if necessary):
S <- replicate(1000, sim())
Note that this will take some time, much longer than it should for such a simple task, because both lm and summary.lm are slow. A much faster version is shown below.
Now, S is a vector with 1000 values: the "sequence of 1,000 t-tests" you want. To get "the number of times the absolute value of the 1,000 t-tests > 1.96", we can just use
sum(abs(S) > 1.96)
# [1] 756
The result 756 is just what I got; yours will differ because the simulation is random, but it will always be quite a large number, as expected.
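For contrast, here is a minimal sketch of the same experiment with stationary (I(0)) series instead of random walks; the rejection rate should then sit near the nominal 5%, which is exactly the spurious-regression point made above (sim_iid is an illustrative name):
sim_iid <- function() {
  y <- rnorm(100)   # white noise instead of a random walk
  x <- rnorm(100)
  coef(summary(lm(y ~ x)))[2, 3]
}
S0 <- replicate(1000, sim_iid())
mean(abs(S0) > 1.96)   # roughly 0.05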
A faster version of sim:
fast_sim <- function() {
  y <- cumsum(rnorm(100, 0, 1))
  x <- cumsum(rnorm(100, 0, 1))
  y <- y - mean(y)   # centre both series so no intercept is needed
  x <- x - mean(x)
  xty <- crossprod(x, y)[1]
  xtx <- crossprod(x)[1]
  b <- xty / xtx                              # OLS slope
  sigma <- sqrt(sum((y - x * b) ^ 2) / 98)    # residual standard error, n - 2 = 98 df
  b * sqrt(xtx) / sigma                       # t-score of the slope
}
This function computes simple linear regression without lm, and t-score without summary.lm.
S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 778
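As a quick check (a sketch on a single draw), the manual t-score can be compared with the one reported by summary.lm; centering both series removes the intercept, and 98 is the n - 2 residual degrees of freedom:
set.seed(123)
y <- cumsum(rnorm(100, 0, 1))
x <- cumsum(rnorm(100, 0, 1))
yc <- y - mean(y); xc <- x - mean(x)
b <- crossprod(xc, yc)[1] / crossprod(xc)[1]
sigma <- sqrt(sum((yc - xc * b)^2) / 98)
c(manual = b * sqrt(crossprod(xc)[1]) / sigma,
  lm     = coef(summary(lm(y ~ x)))[2, 3])   # the two values agree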
An alternative way is to use cor.test:
fast_sim2 <- function() {
  y <- cumsum(rnorm(100, 0, 1))
  x <- cumsum(rnorm(100, 0, 1))
  unname(cor.test(x, y)[[1]])
}
S <- replicate(1000, fast_sim2())
sum(abs(S) > 1.96)
# [1] 775
Let's have a benchmark:
system.time(replicate(1000, sim()))
# user system elapsed
# 1.860 0.004 1.867
system.time(replicate(1000, fast_sim()))
# user system elapsed
# 0.088 0.000 0.090
system.time(replicate(1000, fast_sim2()))
# user system elapsed
# 0.308 0.004 0.312
cor.test is much faster than lm + summary.lm, but manual computation is even faster!
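As an aside (a sketch), cor.test reports the same t statistic because, for a simple regression, t = r * sqrt(n - 2) / sqrt(1 - r^2):
set.seed(42)
y <- cumsum(rnorm(100)); x <- cumsum(rnorm(100))
r <- cor(x, y)
c(identity = r * sqrt(98) / sqrt(1 - r^2),
  cor.test = unname(cor.test(x, y)$statistic))   # the two values match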

Stock Price Simulation R code - Slow - Monte Carlo

I need to perform a stock price simulation using R code. The problem is that the code is a little bit slow.
Basically I need to simulate the stock price for each time step (daily) and store it in a matrix.
An example assuming the stock process is Geometric Brownian Motion
for (j in 1:100000) {
  for (i in 1:252) {
    S[i] <- S[i-1]*exp((r - v^2/2)*dt + v*sqrt(dt)*rnorm(1))
  }
  U[j,] <- S
}
Any suggestion to improve and speed up the code?
Assuming S[0] = 1, you can build U as follows:
Ncols <- 252
Nrows <- 100000
U <- matrix(exp((r-v^2/2)*dt+v*sqrt(dt)*rnorm(Ncols*Nrows)), ncol=Ncols, nrow=Nrows)
U <- do.call(rbind, lapply(1:Nrows, function(j)cumprod(U[j,])))
EDIT: using Joshua's and Ben's suggestions:
product version:
U <- matrix(exp((r-v^2/2)*dt+v*sqrt(dt)*rnorm(Ncols*Nrows)), ncol=Ncols, nrow=Nrows)
U <- t(apply(U, 1, cumprod))
sum version:
V <- matrix((r-v^2/2)*dt+v*sqrt(dt)*rnorm(Ncols*Nrows), ncol=Ncols, nrow=Nrows)
V <- exp( t(apply(V, 1, cumsum)) )
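A quick equivalence check (a sketch with assumed example parameter values): feeding the same normal draws to both formulations gives identical paths, since exp(cumsum(a)) equals cumprod(exp(a)):
set.seed(1)
r <- 0.01; v <- 0.2; dt <- 1/252          # assumed example parameters
Nrows <- 5; Ncols <- 252                  # small matrix, just for the check
Z <- matrix(rnorm(Nrows*Ncols), nrow=Nrows, ncol=Ncols)
U1 <- t(apply(exp((r - v^2/2)*dt + v*sqrt(dt)*Z), 1, cumprod))
U2 <- exp(t(apply((r - v^2/2)*dt + v*sqrt(dt)*Z, 1, cumsum)))
all.equal(U1, U2)                         # TRUE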
EDIT: as suggested by @Paul:
Execution time for each proposal (using 10000 rows instead of 10^5):
Using apply + cumprod
user system elapsed
0.61 0.01 0.62
Using apply + cumsum
user system elapsed
0.61 0.02 0.63
Using OP's original code
user system elapsed
67.38 0.00 67.52
Notes: the times shown above are from the third run of system.time; the first two runs for each piece of code were discarded. I used r <- sqrt(2), v <- sqrt(3) and dt <- pi. In the original code, I also replaced S[i-1] with ifelse(i==1, 1, S[i-1]) and preallocated U.

Speeding up time series simulation (for bootstrap)

I need to run a bootstrap on a time series with non-standard dependence. To do this I need a function that simulates the time series by making adjustments period by period.
testing <- function(){
  sampleData <- as.zoo(data.frame(index=1:1000, vol=(rnorm(1000))^2, x=NA))
  sampleData[,"x"] <- sampleData[,"vol"] + rnorm(1000) # treat this as completely exogenous and unknown in connection to vol
  sampleData <- cbind(sampleData, mean=rollmean(sampleData[,"vol"], k=3, align="right"))
  sampleData <- cbind(sampleData, vol1=lag(sampleData[,"vol"], k=-1), x1=lag(sampleData[,"x"], k=-1), mean1=lag(sampleData[,"mean"], k=-1))
  # get estimate
  mod <- lm(vol~vol1+x1+mean1, data=sampleData)
  res <- mod$residuals
  for(i in 5:1000){
    # recursively estimate
    sampleData[i,"vol"] <- as.numeric(predict(mod, newdata=data.frame(sampleData[i-1,]))) + res[i-3]
    # now update the other parameters
    # first our rolling average
    sampleData[i,"mean"] <- mean(sampleData[(i-3):i,"vol"])
    # re-update our lagged variables
    sampleData[i,"vol1"] <- sampleData[i-1,"vol"]
    sampleData[i,"mean1"] <- sampleData[i-1,"mean"]
  }
  lm(vol~vol1+x1+mean1, data=sampleData)
}
When I run this code and measure the run time I get
system.time(testing())
user system elapsed
2.711 0.201 2.915
This is a slight problem for me as I will be integrating this code to construct a bootstrap. That means any time taken here is multiplied by about 100 for each step, and I am updating this a few thousand times, so a single run will take hours (or days).
Is there anyway to speed this code up?
Kind regards,
Matthew
Here's how to avoid the overhead of predict.lm. Also note that I used a matrix instead of a zoo object; zoo would be a tiny bit slower. You can see just how much it slowed down your code; that's the price you pay for convenience.
testing.jmu <- function() {
  if(!require(xts)) stop("xts package not installed")
  set.seed(21)  # for reproducibility
  sampleData <- .xts(data.frame(vol=(rnorm(1000))^2, x=NA), 1:1000)
  sampleData$x <- sampleData$vol + rnorm(1000)
  sampleData$mean <- rollmean(sampleData$vol, k=3, align="right")
  sampleData$vol1 <- lag(sampleData$vol, k=1)
  sampleData$x1 <- lag(sampleData$x, k=1)
  sampleData$mean1 <- lag(sampleData$mean, k=1)
  sampleMatrix <- na.omit(cbind(as.matrix(sampleData), constant=1))
  mod.fit <- lm.fit(sampleMatrix[,c("constant","vol1","x1","mean1")],
                    sampleMatrix[,"vol"])
  res.fit <- mod.fit$residuals
  for(i in 5:nrow(sampleMatrix)){
    sampleMatrix[i,"vol"] <-
      sum(sampleMatrix[i-1,c("constant","vol1","x1","mean1")] *
          mod.fit$coefficients) + res.fit[i-3]
    sampleMatrix[i,"mean"] <- mean(sampleMatrix[(i-3):i,"vol"])
    sampleMatrix[i,c("vol1","mean1")] <- sampleMatrix[i-1,c("vol","mean")]
  }
  lm.fit(sampleMatrix[,c("constant","vol1","x1","mean1")], sampleMatrix[,"vol"])
}
system.time(out <- testing.jmu())
# user system elapsed
# 0.05 0.00 0.05
coef(out)
# constant vol1 x1 mean1
# 1.08787779 -0.06487441 0.03416802 -0.02757601
Add the set.seed(21) call to your function and you'll see that my function returns the same coefficients as yours.
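A sketch of how this might slot into a replication loop of the kind the question describes (the set.seed(21) call inside testing.jmu would have to be removed first, otherwise every replication is identical; the names below are illustrative):
B <- 100                                             # number of replications
boot_coefs <- t(replicate(B, coef(testing.jmu())))   # assumes set.seed removed
colnames(boot_coefs) <- c("constant", "vol1", "x1", "mean1")
apply(boot_coefs, 2, sd)                             # spread of each coefficient (illustrative)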

Is there an efficient way to parallelize mapply?

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)
After some research, I noticed that only one core is currently used and parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem because too much overhead is produced (see, for example, SO). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one after I checked the help page.
However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallelized but takes multiple vectors (or a data.frame), just like mapply does. Is there anything out there that I missed?
Here is a small example of what I want to do. (Note that I include a plyr example because it can be an alternative to the base mapply function and it has a parallelization option. However, it is slower in my implementation, and internally it calls foreach to parallelize, so I think it won't help. Is that correct?)
library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))
get_uniroot <- function(P, B0, CF1, CF2, CF3) {
  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3},
          lower = 1,
          upper = 10,
          tol = 0.00001)$root
}
system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
#user system elapsed
#0.91 0.00 0.90
system.time(x2 <- mdply(df, get_uniroot))
#user system elapsed
#5.85 0.00 5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
get_uniroot(P, B0, CF1, CF2, CF3)})
#user system elapsed
# 10.30 0.00 10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3) #TRUE
Also, I tried to implement Ryan Thompson's function chunkapply from the SO link above (I only removed the doMC part, because I couldn't install it; his example still works after adjusting his function), but couldn't get it to work for my case. However, since it uses foreach, I thought the same arguments mentioned above apply, so I didn't try for too long.
#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"
PS: I know that I could just increase tol to reduce the number of steps that are necessary to find a uniroot. However, I already set tol as big as possible.
I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:
dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
function(x) do.call(get_uniroot,as.list(dfm[x,])),
mc.cores=4L
)
unlist(result)
This is basically doing the same thing mapply does, but in a parallel way.
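A quick correctness check (a sketch, using the serial mapply result x1 computed in the question):
x_par <- unlist(result)
all.equal(x_par, x1)   # expected TRUE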
But...
Mind you that parallelization always comes with some overhead as well. As I explained in the question you linked to, going parallel only pays off if the inner function takes significantly longer to compute than the overhead involved. In your case, the uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks, combining both mapply and mclapply. A possible way to do this is:
ncores <- 4
id <- floor(quantile(0:nrow(df), 1 - (0:ncores)/ncores))  # chunk boundaries
idm <- embed(id, 2)   # each row holds the lower and upper row boundary of one chunk
mapply_uniroot <- function(id){
  tmp <- df[(id[1]+1):id[2],]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
result <- mclapply(nrow(idm):1,   # reverse order so unlist() restores the original row order
                   function(x) mapply_uniroot(idm[x,]),
                   mc.cores=ncores)
final <- unlist(result)
This might need some tweaking, but it essentially breaks your df into exactly as many chunks as there are cores and runs the mapply on each core. To show this works:
> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
It's an old topic, but FYI: you now have parallel::mcmapply (see ?mcmapply). Don't forget to set mc.cores. I usually use mc.cores = parallel::detectCores() - 1 to leave one CPU free for OS operations.
x4 <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,mc.cores=parallel::detectCores()-1)
This isn't exactly a best practices suggestion, but considerable speed-up can be had by identifying the root for all parameters in a 'vectorized' fashion. For instance,
bisect <-
  function(f, interval, ..., lower=min(interval), upper=max(interval),
           f.lower=f(lower, ...), f.upper=f(upper, ...), maxiter=20)
{
  nrow <- length(f.lower)
  bounds <- matrix(c(lower, upper), nrow, 2, byrow=TRUE)
  for (i in seq_len(maxiter)) {
    ## move lower or upper bound to mid-point, preserving opposite signs
    mid <- rowSums(bounds) / 2
    updt <- ifelse(f(mid, ...) > 0, 0L, nrow) + seq_len(nrow)
    bounds[updt] <- mid
  }
  rowSums(bounds) / 2
}
and then
> system.time(x2 <- with(df, {
+ f <- function(x, PB0, CF1, CF2, CF3)
+ PB0 + CF1/x + CF2/x^2 + CF3/x^3
+ bisect(f, c(1, 10), PB0, CF1, CF2, CF3)
+ }))
user system elapsed
0.180 0.000 0.181
> range(x1 - x2)
[1] -6.282406e-06 6.658593e-06
versus about 1.3s for application of uniroot separately to each. This also combined P and B0 into a single value ahead of time, since that is how they enter the equation.
The bounds on the final value are +/- diff(interval) * (.5 ^ maxiter) or so. A fancier implementation would replace bisection with linear or quadratic interpolation (as in the reference cited in ?uniroot), but then uniformly efficient convergence (and, in all cases, error handling) would be trickier to arrange.
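Working that bound out for the example above (a small check): diff(interval) * 0.5^maxiter with interval c(1, 10) and maxiter = 20 is about 8.6e-06, consistent with the observed range(x1 - x2) of roughly +/- 7e-06:
diff(c(1, 10)) * 0.5^20
# [1] 8.583069e-06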

Computing rolling sums of stretches of a vector with R

I have a long vector x, and another v, which contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially this is performing sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I might have to. It does need to be very, very fast. Can anyone think of anything simpler than using a sparse matrix package?
Example -
x <- c(1,1,3,4,5)
v <- c(2,3)
y <- myFunc(x,v)
y should be c(2,12)
I am open to any pre-processing - e.g, storing in v the starting indexes of each stretch.
y <- cumsum(x)[cumsum(v)]
y <- c(y[1], diff(y))
This looks like it's doing extra work because it's computing the cumsum for the whole vector, but it's actually faster than the other solutions so far, for both small and large numbers of groups.
Here's how I simulated the data
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N,0,100),1)
v <- as.vector(table(sample(n, N, replace=TRUE)))
On my machine the timings with n <- 10 are:
Brandon Bertelsen (for loop): 0.017
Ramnath (rowsum): 0.057
John (split/apply): 0.280
Aaron (cumsum): 0.008
changing to n <- 1e5 the timings are:
Brandon Bertelsen (for loop): 2.181
Ramnath (rowsum): 0.226
John (split/apply): 0.852
Aaron (cumsum): 0.015
I suspect this is faster than doing matrix multiplication, even with a sparse matrix package, because one doesn't have to form the matrix or do any multiplication. If more speed is needed, I suspect it could be sped up by writing it in C; not hard to do with the inline and rcpp packages, but I'll leave that to you.
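For reference, here is a minimal sketch of that C route using Rcpp's cppFunction (Rcpp being one of the packages mentioned; the function name groupSums is made up for illustration):
library(Rcpp)
cppFunction('
NumericVector groupSums(NumericVector x, IntegerVector v) {
  int k = v.size();
  NumericVector y(k);
  int pos = 0;
  for (int i = 0; i < k; ++i) {
    double s = 0.0;                       // running sum for the current stretch
    for (int j = 0; j < v[i]; ++j) s += x[pos++];
    y[i] = s;
  }
  return y;
}')
groupSums(c(1, 1, 3, 4, 5), c(2L, 3L))
# [1]  2 12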
You can do this using rowsum. It should be reasonably fast as it uses C code in the background.
y <- rowsum(x, rep(1:length(v), v))
Here's a slightly different tack.
s <- rep(1:length(v), v)
l <- split(x, s)
y <- sapply(l, sum)
Try something like:
cv <- cumsum(v)            # end index of each stretch
y <- numeric(length(v))
for (i in seq_along(v)) {
  start <- if (i > 1) cv[i-1] + 1 else 1
  y[i] <- sum(x[start:cv[i]])
}
