I am working on a research project where I want to determine the equivalence of two distributions. I am currently using the Mann-Whitney Test for Equivalence, and the code I am running (below) was provided with the book Testing Statistical Hypotheses of Equivalence and Noninferiority by Stefan Wellek (2010). Before running my data I am testing this code with random normal distributions that have the same mean and standard deviation.

My problem is that there are three nested for loops, and with larger distribution sizes (as in the example below) the code takes forever to run. If I only had to run it once that would not be such a problem, but I am doing a simulation study and creating power curves, so I need many iterations of this code (around 10,000). At the moment, depending on how I alter the distribution sizes, it takes days to run 10,000 iterations.
Any help in a way to increase the performance of this would be greatly appreciated.
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
alpha <- 0.05
m <- length(x)
n <- length(y)
eps1_ <- 0.2 #0.1382 default
eps2_ <- 0.2 #0.2602 default
eqctr <- 0.5 + (eps2_-eps1_)/2
eqleng <- eps1_ + eps2_
wxy <- 0
pihxxy <- 0
pihxyy <- 0
for (i in 1:m)
  for (j in 1:n)
    wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
for (i in 1:m)
  for (j1 in 1:(n-1))
    for (j2 in (j1+1):n)
      pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
for (i1 in 1:(m-1))
  for (i2 in (i1+1):m)
    for (j in 1:n)
      pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
wxy <- wxy / (m*n)
pihxxy <- pihxxy*2 / (m*(m-1)*n)
pihxyy <- pihxyy*2 / (n*(n-1)*m)
sigmah <- sqrt((wxy-(m+n-1)*wxy**2+(m-1)*pihxxy+(n-1)*pihxyy)/(m*n))
crit <- sqrt(qchisq(alpha,1,(eqleng/2/sigmah)**2))
if (abs((wxy-eqctr)/sigmah) >= crit) rej <- 1
if (abs((wxy-eqctr)/sigmah) < crit) rej <- 0
if (is.na(sigmah) || is.na(crit)) rej <- 1
MW_Decision <- rej
cat(" ALPHA =",alpha," M =",m," N =",n," EPS1_ =",eps1_," EPS2_ =",eps2_,
"\n","WXY =",wxy," SIGMAH =",sigmah," CRIT =",crit," REJ=",MW_Decision)
See edit below for an even better suggestion
One simple suggestion to get a bit of a speed boost is to byte compile your code.
For example, I wrapped your code into a function starting from the alpha <- 0.05 line and ran it on my laptop. Simply byte compiling your current code makes it run twice as fast.
set.seed(1234)
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
# f1 <- function(x,y){ ...your code...}
system.time(f1(x, y))
# user system elapsed
# 33.249 0.008 33.278
library(compiler)
f2 <- cmpfun(f1)
system.time(f2(x, y))
# user system elapsed
# 17.162 0.002 17.170
EDIT
I should add, this is the type of thing that a different language would handle much better than R. Have you looked at the Rcpp and inline packages?
I've been curious to learn how to use them so I figured this was a good chance.
Here's a tweak of your code using the inline package and Fortran (since I'm more comfortable with that than C). It wasn't hard at all (provided you know Fortran or C); I just followed the examples listed in cfunction.
First, let's re-write your loops and compile them:
library(inline)
# Fortran code for first loop
loop1code <- "
integer i, j1, j2
real*8 tmp
do i = 1, m
  do j1 = 1, n-1
    do j2 = j1+1, n
      tmp = x(i) - max(y(j1),y(j2))
      if (tmp > 0.) pihxyy = pihxyy + 1
    end do
  end do
end do
"
# Compile the code and turn loop into a function
loop1fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxyy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop1code, language="F95")
# Fortran code for second loop
loop2code <- "
integer i1, i2, j
real*8 tmp
do i1 = 1, m-1
  do i2 = i1+1, m
    do j = 1, n
      tmp = min(x(i1), x(i2)) - y(j)
      if (tmp > 0.) pihxxy = pihxxy + 1
    end do
  end do
end do
"
# Compile the code and turn loop into a function
loop2fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxxy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop2code, language="F95")
Now let's create a new function that uses these. So it's not too long, I'll just sketch the key parts I modified from your code:
f3 <- function(x, y){
  # ... code ...
  # Remove old loop
  ## for (i in 1:m)
  ##   for (j1 in 1:(n-1))
  ##     for (j2 in (j1+1):n)
  ##       pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
  # Call new function from compiled code instead
  pihxyy <- loop1fun(x, y, pihxyy, m, n)$pihxyy
  # Remove second loop
  ## for (i1 in 1:(m-1))
  ##   for (i2 in (i1+1):m)
  ##     for (j in 1:n)
  ##       pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
  # Call new compiled function for second loop
  pihxxy <- loop2fun(x, y, pihxxy, m, n)$pihxxy
  # ... code ...
}
And now we run it and voila, we get a huge speed boost! :)
system.time(f3(x, y))
# user system elapsed
#  0.12    0.00    0.12
I did check that it got the same results as your code, but you probably want to run some additional tests just in case.
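For instance, here is one quick extra check, a sketch I'm adding here (it assumes f1 and f3 are written to return their statistics rather than only cat()ing them):

# Added sketch: confirm the pure-R and compiled versions agree
# across several random data sets.
for (s in 1:5) {
  set.seed(s)
  x <- rnorm(125, mean = 3, sd = 1)
  y <- rnorm(500, mean = 3, sd = 1)
  stopifnot(isTRUE(all.equal(f1(x, y), f3(x, y))))
}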
You can use outer instead of the first double loop:
set.seed(42)
f1 <- function(x,y) {
  wxy <- 0
  for (i in 1:m)
    for (j in 1:n)
      wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
  wxy
}
f2 <- function(x,y) sum(outer(x,y, function(x,y) trunc(0.5*(sign(x-y)+1))))
f1(x,y)
[1] 32041
f2(x,y)
[1] 32041
You get roughly 50x speedup:
library(microbenchmark)
microbenchmark(f1(x,y),f2(x,y))
Unit: milliseconds
expr min lq median uq max neval
f1(x, y) 138.223841 142.586559 143.642650 145.754241 183.0024 100
f2(x, y) 1.846927 2.194879 2.677827 3.141236 21.1463 100
The other loops are trickier.
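That said, all three statistics are plain counts, so for continuous data (where ties have probability zero) they can be computed with no explicit loops at all. This is my own sketch, not part of the original answer: findInterval() with left.open = TRUE counts, for each x[i], how many y values lie strictly below it, and similarly in the other direction.

# Added sketch: x[i] > max(y[j1], y[j2]) means both y values lie below
# x[i]; min(x[i1], x[i2]) > y[j] means both x values lie above y[j].
k <- findInterval(x, sort(y), left.open = TRUE)  # y's strictly below each x[i]
l <- m - findInterval(y, sort(x))                # x's strictly above each y[j]
wxy_fast    <- sum(k)             # pairs with x[i] > y[j]
pihxyy_fast <- sum(choose(k, 2))  # i with pairs j1 < j2, x[i] > max(y[j1], y[j2])
pihxxy_fast <- sum(choose(l, 2))  # j with pairs i1 < i2, min(x[i1], x[i2]) > y[j]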
Related
I wrote some R code for simulating random samples from a Poisson distribution, based on the description of an algorithm (see attached image). But my code does not seem to work correctly: the samples it generates follow a different pattern from those produced by R's built-in rpois() function. Can anybody tell me what I did wrong and how to fix my function?
r.poisson <- function(n, l=0.5)
{
  U <- runif(n)
  X <- rep(0,n)
  p = exp(-l)
  F = p
  for(i in 1:n)
  {
    if(U[i] < F)
    {
      X[i] <- i
    } else
    {
      p = p*l/(i+1)
      F = F+p
      i = i+1
    }
  }
  return(X)
}
r.poisson(50)
The output is very different from that of rpois(50, lambda = 0.5). The algorithm I followed is the inverse-CDF method with recursive computation of the Poisson probabilities (from the attached image, not reproduced here).
(Thank you for your question. Now I know how a Poisson random variable is simulated.)
You had a misunderstanding. The inverse CDF method (with recursive computation) you referenced is used to generate a single Poisson random sample. So you need to fix this function to produce a single number. Here is the correct function, commented to help you follow each step.
rpois1 <- function (lambda) {
  ## step 1
  U <- runif(1)
  ## step 2
  i <- 0
  p <- exp(-lambda)
  F <- p
  ## you need an "infinite" loop
  ## no worry, it will "break" at some time
  repeat {
    ## step 3
    if (U < F) {
      X <- i
      break
    }
    ## step 4
    i <- i + 1
    p <- lambda * p / i ## I have incremented i, so it is `i` not `i + 1` here
    F <- F + p
    ## back to step 3
  }
  return(X)
}
Now to get n samples, you need to call this function n times. R has a nice function called replicate to repeat a function many times.
r.poisson <- function (n, lambda) {
  ## use `replicate()` to call `rpois1` n times
  replicate(n, rpois1(lambda))
}
Now we can make a reasonable comparison with R's own rpois.
x1 <- r.poisson(1000, lambda = 0.5)
x2 <- rpois(1000, lambda = 0.5)
## set breaks reasonably when making a histogram
xmax <- max(x1, x2) + 0.5
par(mfrow = c(1, 2))
hist(x1, main = "proof-of-concept-implementation", breaks = seq.int(-0.5, xmax))
hist(x2, main = "R's rpois()", breaks = seq.int(-0.5, xmax))
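If you also want a numeric check beyond eyeballing the histograms (my addition, not part of the original answer), compare the empirical frequencies against dpois() for the first few counts:

# Added check: empirical vs. theoretical Poisson(0.5) probabilities
rbind(empirical   = table(factor(x1, levels = 0:4)) / length(x1),
      theoretical = dpois(0:4, lambda = 0.5))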
Remark:
Credit to jblood94 for demonstrating how to find vectorization opportunities in an R loop without converting everything to C/C++. R's rpois is coded in C; that is why it is fast.
A vectorized version will run much faster than a non-vectorized function using replicate. The idea is to iteratively drop each uniform sample from consideration as soon as its Poisson value is determined, while i is incremented.
r.poisson1 <- function(n, l = 0.5) {
  U <- runif(n)
  i <- 0L
  X <- integer(n)
  p <- exp(-l)
  F <- p
  idx <- 1:n
  while (length(idx)) {
    bln <- U < F
    X[idx[bln]] <- i
    p <- l*p/(i <- i + 1L)
    F <- F + p
    idx <- idx[!bln]
    U <- U[!bln]
  }
  X
}
# Zheyuan Li's non-vectorized functions:
rpois1 <- function (lambda) {
  ## step 1
  U <- runif(1)
  ## step 2
  i <- 0
  p <- exp(-lambda)
  F <- p
  ## you need an "infinite" loop
  ## no worry, it will "break" at some time
  repeat {
    ## step 3
    if (U < F) {
      X <- i
      break
    }
    ## step 4
    i <- i + 1
    p <- lambda * p / i ## I have incremented i, so it is `i` not `i + 1` here
    F <- F + p
    ## back to step 3
  }
  return(X)
}
r.poisson2 <- function (n, lambda) {
  ## use `replicate()` to call `rpois1` n times
  replicate(n, rpois1(lambda))
}
Benchmark:
microbenchmark::microbenchmark(r.poisson1(1e5),
r.poisson2(1e5, 0.5),
rpois(1e5, 0.5))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> r.poisson1(1e+05) 3.063202 3.129151 3.782200 3.225402 3.734600 18.377700 100
#> r.poisson2(1e+05, 0.5) 217.631002 244.816601 269.692648 267.977001 287.599251 375.910601 100
#> rpois(1e+05, 0.5) 1.519901 1.552300 1.649026 1.579551 1.620451 7.531401 100
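As a further correctness check (my addition): both implementations consume exactly one uniform draw per observation, in the same order, so with the same seed they should produce the same sample. The return types differ (integer versus double), so compare values rather than using identical():

# Added check: same seed should give the same sample from both versions
set.seed(42); a <- r.poisson1(1e4)
set.seed(42); b <- r.poisson2(1e4, 0.5)
all(a == b)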
I need to speed up my R code. My bottleneck is a function that needs to use the choose function. It looks like this:
P_ni <- function(Pn, Pi, eta1, eta2, p, d=NA)
{
  if(is.na(d)) d <- 1-p
  if(Pn==Pi) output <- p^Pn
  else
  {
    if(Pi==1) seq1 <- seq_len(Pn-1)
    if(Pi>1)  seq1 <- seq_len(Pn-1)[-seq_len(Pi-1)]
    output <- sum(choose((Pn-Pi-1), c(seq1-Pi)) * choose(Pn, seq1) *
                  (eta1/(eta1+eta2))^c(seq1-Pi) *
                  (eta2/(eta1+eta2))^c(Pn-seq1) * p^seq1 * d^c(Pn-seq1)
                  )
  }
  return(output)
}
This function needs to be called many times with different values of Pn and Pi. The problem is that Pn and Pi can only be single numbers and do not work as vectors. This is caused by how the choose() function is used here.
I do this with a for-loop at the moment and it works perfectly, but it is slow.
The for-loop looks like this:
for(i in 1:nrow(n_k_matrix_p))
{
  n_k_matrix_p[i,4] <- P_ni(n_k_matrix_p[i,1], n_k_matrix_p[i,2], eta1, eta2, p)
}
To make it reproducible:
eta1 <- 10
eta2 <- 5
p <- 0.4
n_k_matrix <- expand.grid(c(1:20),c(1:20))
n_k_matrix <- n_k_matrix[n_k_matrix[,1] >=n_k_matrix[,2],]
n_k_matrix <- n_k_matrix[order(n_k_matrix[,1]),]
The n_k_matrix contains my numbers for Pn and Pi.
Unfortunately the loop is still faster than using apply.
Does anyone have any idea how to speed things up?
You can regroup or precompute some computations.
P_ni2 <- function(n, eta1, eta2, p, d = 1 - p) {
  res <- matrix(0, n, n)
  diag(res) <- p^seq_len(n)
  C1 <- eta1 / eta2 * p / d
  C2 <- eta2 / (eta1 + eta2) * d
  C3 <- eta1 / (eta1 + eta2)
  C2_n <- C2^seq_len(n)
  C3_n <- C3^seq_len(n)
  precomputed <- outer(0:n, 0:n, choose)
  for (j in seq_len(n)) {
    for (i in seq_len(j - 1)) {
      seq1 <- seq(i, j - 1)
      res[i, j] <- sum(
        precomputed[j-i, seq1-i+1] * precomputed[j+1, seq1+1] * C1^seq1
      ) * C2_n[j] / C3_n[i]
    }
  }
  res
}
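To see where C1, C2 and C3 come from (my own derivation of the regrouping, worth double-checking): write a = eta1/(eta1+eta2) and b = eta2/(eta1+eta2). Each summand in the original P_ni contains a^(seq1-Pi) * b^(Pn-seq1) * p^seq1 * d^(Pn-seq1), which regroups as (a*p/(b*d))^seq1 * (b*d)^Pn / a^Pi = C1^seq1 * C2^Pn / C3^Pi, and the binomial coefficients are cached once in the precomputed matrix.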
Verification:
> system.time({
+ n_k_matrix[[3]] <- sapply(1:nrow(n_k_matrix), function(i) {
+ P_ni(n_k_matrix[i,1], n_k_matrix[i,2], eta1, eta2, p)
+ })
+ })
   user  system elapsed
11.799 0.000 11.797
> system.time({
+ test <- P_ni2(400, eta1, eta2, p)
+ n_k_matrix[[4]] <- test[as.matrix(n_k_matrix[, 2:1])]
+ })
   user  system elapsed
2.328 0.003 2.341
> all.equal(n_k_matrix[[3]], n_k_matrix[[4]])
[1] TRUE
Note that I first store the results in the upper triangle of a square matrix. Then I convert it to your data frame format (which you call a matrix, by the way).
This solution is 5 times faster for n = 400. I think you could improve it by recoding the double-loop (only) in Rcpp.
The pricing of the Asian option is approximated, using Monte Carlo simulation, by:
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- S0
for(i in 1:n) {
  for(j in 1:m) {
    W <- rnorm(1)
    Si <- S[length(S)]*exp((r-0.5*sigma^2)*delta + sigma*sqrt(delta)*W)
    S <- c(S, Si)
  }
  Si.bar <- mean(S[-1])
  Ci <- exp(-r*T)*max(Si.bar - K, 0)
}
mean(Ci)
The for(j in 1:m) loop runs perfectly, I think... But when I run it n times with for(i in 1:n), the values in S get smaller and smaller as n grows, decreasing to almost zero. This leads to a mean (Si.bar <- mean(S[-1])) well below the strike price, K = 100.
I can't figure out what is wrong with the last two lines of code. I'm getting a value of 0 for the Asian call option, due to the payoff function. The correct value for this option is approximately 7 (mean(Ci)).
There are a couple of issues with your code. Firstly, it's inefficient in R to build a vector by repeated concatenation; instead, you should allocate the vector up front and then assign to its elements.
Secondly, as I understand it, the aim is to repeat the inner loop n times, storing the output in the elements of a vector C before taking the mean. That's not what you're doing at the moment: each iteration of the outer loop makes S longer and overwrites Ci, so the last statement, mean(Ci), is meaningless.
Here's an amended version of the code. I've used plyr partly to make the code neater, and partly for its progress bar functionality.
library(plyr)
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- numeric(m + 1)
S[1] <- S0
asian_price <- function() {
  for(j in 1:m) {
    W <- rnorm(1)
    S[j + 1] <- S[j] * exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W)
  }
  Si.bar <- mean(S[-1])
  exp(-r * T) * max(Si.bar - K, 0)
}
C <- raply(n, asian_price(), .progress = "text")
mean(C)
# [1] 7.03392
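As an aside (my addition, not part of the original answer): the inner loop can be vectorized away too, since the log-price path is just a cumulative sum of i.i.d. normal increments:

# Added sketch: build each GBM path without the inner loop
asian_price_vec <- function() {
  increments <- (r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * rnorm(m)
  S_path <- S0 * exp(cumsum(increments))
  exp(-r * T) * max(mean(S_path) - K, 0)
}
C2 <- raply(n, asian_price_vec(), .progress = "text")
mean(C2)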
Trying to wrap my mind around vectorizing and trying to make some simulations faster, I found this very basic epidemic simulation. The code is from the book http://www.amazon.com/Introduction-Scientific-Programming-Simulation-Using/dp/1420068725/ref=sr_1_1?ie=UTF8&qid=1338069156&sr=8-1
#program spuRs/resources/scripts/SIRsim.r
SIRsim <- function(a, b, N, T) {
  # Simulate an SIR epidemic
  # a is infection rate, b is removal rate
  # N initial susceptibles, 1 initial infected, simulation length T
  # returns a matrix size (T+1)*3 with columns S, I, R respectively
  S <- rep(0, T+1)
  I <- rep(0, T+1)
  R <- rep(0, T+1)
  S[1] <- N
  I[1] <- 1
  R[1] <- 0
  for (i in 1:T) {
    S[i+1] <- rbinom(1, S[i], (1 - a)^I[i])
    R[i+1] <- R[i] + rbinom(1, I[i], b)
    I[i+1] <- N + 1 - R[i+1] - S[i+1]
  }
  return(matrix(c(S, I, R), ncol = 3))
}
The core of the simulation is the for loop. My question is: since the code produces the S[i+1] and R[i+1] values from the S[i] and R[i] values, is it possible to vectorize it with an apply function?
Many thanks
It's hard to 'vectorize' iterative calculations, but this is a simulation, and simulations are likely to be run many times. So write this to do all the simulations at the same time: add an argument M (number of simulations to perform), allocate an M x (T + 1) matrix, and then fill in successive columns (times) of each simulation. The changes seem remarkably straightforward (so I've probably made a mistake; I'm particularly concerned about the use of vectors in the second and third arguments to rbinom, though this is consistent with the documentation).
SIRsim <- function(a, b, N, T, M) {
  ## Simulate an SIR epidemic
  ## a is infection rate, b is removal rate
  ## N initial susceptibles, 1 initial infected, simulation length T
  ## M is the number of simulations to run
  ## returns a list of S, I, R matrices, each M simulations
  ## across T + 1 time points
  S <- I <- R <- matrix(0, M, T + 1)
  S[,1] <- N
  I[,1] <- 1
  for (i in seq_len(T)) {
    S[,i+1] <- rbinom(M, S[,i], (1 - a)^I[,i])
    R[,i+1] <- R[,i] + rbinom(M, I[,i], b)
    I[,i+1] <- N + 1 - R[,i+1] - S[,i+1]
  }
  list(S=S, I=I, R=R)
}
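A quick usage sketch (the parameter values here are my own, purely for illustration):

# Added example: 500 simultaneous epidemics, then the average trajectory
set.seed(1)
out <- SIRsim(a = 0.0005, b = 0.1, N = 1000, T = 100, M = 500)
mean_I <- colMeans(out$I)  # mean number of infected at each time step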
I have some R code that can do convolution of two functions...
convolveSlow <- function(x, y) {
  nx <- length(x); ny <- length(y)
  xy <- numeric(nx + ny - 1)
  for(i in seq(length = nx)) {
    xi <- x[[i]]
    for(j in seq(length = ny)) {
      ij <- i+j-1
      xy[[ij]] <- xy[[ij]] + xi * y[[j]]
    }
  }
  xy
}
Is there a way to remove the two for loops and make the code run faster?
Thank you
San
Since R is very fast at computing vector operations, the most important thing to keep in mind when programming for performance is to vectorise as many of your operations as possible.
This means thinking hard about replacing loops with vector operations. Here is my solution for fast convolution (50 times faster with input vectors of length 1000 each):
convolveFast <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1
  xy <- rep(0, xy)
  for(i in (1:nx)){
    j <- 1:ny
    ij <- i + j - 1
    xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
  }
  xy
}
You will notice that the inner loop (for j in ...) has disappeared. Instead, I replaced it with a vector operation. j is now defined as a vector (j <- 1:ny). Notice also that I refer to the entire vector y, rather than subsetting it (i.e. y instead of y[j]).
j <- 1:ny
ij <- i + j - 1
xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
I wrote a small function to measure performance:
measure.time <- function(fun1, fun2, ...){
  ptm <- proc.time()
  x1 <- fun1(...)
  time1 <- proc.time() - ptm
  ptm <- proc.time()
  x2 <- fun2(...)
  time2 <- proc.time() - ptm
  ident <- all(x1 == x2)
  cat("Function 1\n")
  cat(time1)
  cat("\n\nFunction 2\n")
  cat(time2)
  if(ident) cat("\n\nFunctions return identical results")
}
For two vectors of length 1000 each, I get a 98% performance improvement:
x <- runif(1000)
y <- runif(1000)
measure.time(convolveSlow, convolveFast, x, y)
Function 1
7.07 0 7.59 NA NA
Function 2
0.14 0 0.16 NA NA
Functions return identical results
For vectors, you index with [], not [[]], so use xy[ij] etc.
Convolution doesn't vectorise easily, but one common trick is to switch to compiled code. The Writing R Extensions manual uses convolution as a running example and shows several alternatives; we also use it a lot in the Rcpp documentation.
As Dirk says, compiled code can be a lot faster. I had to do this for one of my projects and was surprised at the speedup: ~40x faster than Andrie's solution.
> a <- runif(10000)
> b <- runif(10000)
> system.time(convolveFast(a, b))
user system elapsed
7.814 0.001 7.818
> system.time(convolveC(a, b))
user system elapsed
0.188 0.000 0.188
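The answer doesn't show convolveC itself; here is a minimal sketch of what such a compiled convolution could look like, written with Rcpp::cppFunction (my assumption — the original may well have used the .C or .Call interface instead):

library(Rcpp)
cppFunction('
NumericVector convolveC(NumericVector x, NumericVector y) {
    int nx = x.size(), ny = y.size();
    NumericVector xy(nx + ny - 1);    // zero-initialized result
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++)
            xy[i + j] += x[i] * y[j]; // same recurrence as convolveSlow
    return xy;
}')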
I made several attempts to speed this up in R before I decided that using C code couldn't be that bad (note: it really wasn't). All of mine were slower than Andrie's, and were variants on adding up the cross-product appropriately. A rudimentary version can be done in just three lines.
convolveNotAsSlow <- function(x, y) {
  xyt <- x %*% t(y)
  ds <- row(xyt) + col(xyt) - 1
  tapply(xyt, ds, sum)
}
This version only helps a little.
> a <- runif(1000)
> b <- runif(1000)
> system.time(convolveSlow(a, b))
user system elapsed
6.167 0.000 6.170
> system.time(convolveNotAsSlow(a, b))
user system elapsed
5.800 0.018 5.820
My best version was this:
convolveFaster <- function(x,y) {
  foo <- if (length(x) < length(y)) { y %*% t(x) } else { x %*% t(y) }
  foo.d <- dim(foo)
  bar <- matrix(0, sum(foo.d) - 1, foo.d[2])
  bar.rc <- row(bar) - col(bar)
  bar[bar.rc >= 0 & bar.rc < foo.d[1]] <- foo
  rowSums(bar)
}
This was quite a bit better, but still not nearly as fast as Andrie's version.
> system.time(convolveFaster(a, b))
user system elapsed
0.280 0.038 0.319
The convolveFast function can be optimized a little by carefully using integer math only and replacing (1:ny)-1L with seq.int(0L, ny-1L):
convolveFaster <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1L
  xy <- rep(0L, xy)
  for(i in seq_len(nx)){
    j <- seq_len(ny)
    ij <- i + j - 1L
    xy[i+seq.int(0L, ny-1L)] <- xy[ij] + x[i] * y
  }
  xy
}
How about convolve(x, rev(y), type = "open") in stats?
> x <- runif(1000)
> y <- runif(1000)
> system.time(a <- convolve(x, rev(y), type = "o"))
user system elapsed
0.032 0.000 0.032
> system.time(b <- convolveSlow(x, y))
user system elapsed
11.417 0.060 11.443
> identical(a,b)
[1] FALSE
> all.equal(a,b)
[1] TRUE

identical() returns FALSE only because convolve() computes the result via the FFT, so the two vectors differ by floating-point rounding; all.equal() confirms they match within tolerance.
Some say the apply() and sapply() functions are faster than for() loops in R, so you could convert the convolution to a function and call it from within apply(). However, there is evidence to the contrary:
http://yusung.blogspot.com/2008/04/speed-issue-in-r-computing-apply-vs.html
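For what it's worth, a small illustration (my addition): a preallocated for loop and sapply() are both interpreted R and perform similarly, while a truly vectorized call beats them both.

# Added illustration: for loop vs. sapply vs. vectorized arithmetic
x <- runif(1e5)
out <- numeric(length(x))
system.time(for (i in seq_along(x)) out[i] <- x[i]^2)  # explicit loop
system.time(out2 <- sapply(x, function(v) v^2))        # apply family
system.time(out3 <- x^2)                               # vectorized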