I have two nested for loops in R running over a data set of about 150,000 observations. I tried the apply() family of functions, but they were slower than the for loops in my case. My code is shown below.
In the code, k = 500 and N = 150000; x is the location at each time t (for all observations), and xm is the subset of x with specific coordinates that I filtered beforehand. At each time j we observe xm, so we remove it from the data and fit the model on the rest of the data set. I originally had an if/else condition here, which I removed to make the loop faster.
It is still very slow. Thank you for your help!
xs = 0:200
result= matrix(0, k,N )
for (j in 1:N) {
  for (i in 1:k) {
    a <- sum(dnorm(xs[i], xm[-j], bx))
    b <- sum(dnorm(xs[i], x[-ind[j]], bx))
    result[i, j] <- a / b
  }
}
Using dummy values ind, x, and xm, here is a solution that runs in about 10 seconds on my machine (>1000 times faster than the original code).
# start with a small N for verification
N <- 15e2L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)
k <- 501L
xs <- 0:500
bx <- 2
system.time({
# proposed solution
a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#> user system elapsed
#> 0.08 0.02 0.10
system.time({
# OP's solution
result2 <- matrix(0, k, N)
for (j in 1:N){
for (i in 1:k){
a <- sum(dnorm(xs[i], xm[-j], bx))
b <- sum(dnorm(xs[i], x[-ind[j]], bx))
result2[i,j] <- a/b
}
}
})
#> user system elapsed
#> 109.42 0.80 110.90
# check that the results are the same
all.equal(result1, result2)
#> [1] TRUE
# use a large N
N <- 15e4L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)
system.time({
a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#> user system elapsed
#> 8.62 1.10 9.73
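The speedup rests on a leave-one-out identity: a sum over all elements except the j-th equals the full sum minus the j-th element. The toy example below (values made up purely for illustration) checks that `rowSums(a) - a` reproduces the explicit `sum(a[i, -j])`; it relies on R recycling the length-k vector of row sums down each column of the matrix.

```r
# Toy 2 x 3 matrix standing in for the k x N matrix of dnorm values
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Vectorised leave-one-out sums: entry [i, j] is the sum of row i without column j
loo_vec <- rowSums(a) - a

# Explicit double loop for comparison
loo_loop <- matrix(0, nrow(a), ncol(a))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(a))) {
    loo_loop[i, j] <- sum(a[i, -j])
  }
}

all.equal(loo_vec, loo_loop)
```

Each `outer()` result holds k * N doubles, so with k = 501 and N = 150000 each matrix needs roughly 600 MB. If memory is tight, processing the columns in chunks is one option.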
I'm a beginner with R and programming in general, and I'm having some problems with this loop.
Basically, I want to generate 10,000 estimates of beta_2 when n = 10 and store them in a vector, where the estimator in question is given by the formula cov(x, y) / var(x).
I've tried the following code, but it only yields the first estimate correctly and fills the other positions in the vector with NA. Any tips to solve this?
X <- rlnorm(n, X_meanlog, X_sdlog)
u <- rnorm(n, u_mean, u_sd)
Y <- beta_1 + beta_2 * X + u
rep <- 10000
vect <- vector(mode="numeric", length=rep)
for(i in 1:rep){vect[i] <-(cov(X,Y) / var(X))[i]}
You must simulate the vectors X and Y inside the loop.
n <- 10
X_meanlog <- 0
X_sdlog <- 1
u_mean <- 0
u_sd <- 1
beta_1 <- 2
beta_2 <- 3
set.seed(5276) # Make the results reproducible
rept <- 10000
vect <- vector(mode="numeric", length=rept)
for(i in 1:rept){
X <- rlnorm(n, X_meanlog, X_sdlog)
u <- rnorm(n, u_mean, u_sd)
Y <- beta_1 + beta_2 * X + u
vect[i] <- (cov(X, Y) / var(X))
}
mean(vect)
#[1] 3.002527
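As for why the original loop produced NA after the first position: `cov(X, Y) / var(X)` is a single number, so indexing it with `[i]` for `i > 1` reaches past the end of a length-1 vector, and R returns NA rather than raising an error. A minimal illustration, with a made-up value standing in for the estimate:

```r
est <- 3.1      # stand-in for cov(X, Y) / var(X), a length-1 numeric vector
est[1]          # the only element that exists
est[2]          # NA: out-of-range indexing of a vector yields NA, not an error
is.na(est[2])
```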
You can also run the following simpler simulation.
set.seed(5276) # Make the results reproducible
X <- replicate(rept, rlnorm(n, X_meanlog, X_sdlog))
u <- replicate(rept, rnorm(n, u_mean, u_sd))
Y <- beta_1 + beta_2 * X + u
vect2 <- sapply(seq_len(rept), function(i)
cov(X[, i], Y[, i]) / var(X[, i])
)
mean(vect2)
#[1] 3.001131
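A further variant avoids the per-column `sapply` call altogether. It is only a sketch, relying on the identity cov(x, y) / var(x) = sum((x - mean(x)) * y) / sum((x - mean(x))^2). The parameter values are the ones used above, and the names `Xc` and `vect3` are introduced here for illustration.

```r
n <- 10; rept <- 10000
X_meanlog <- 0; X_sdlog <- 1
u_mean <- 0; u_sd <- 1
beta_1 <- 2; beta_2 <- 3

set.seed(5276)
X <- replicate(rept, rlnorm(n, X_meanlog, X_sdlog))  # n x rept, one sample per column
u <- replicate(rept, rnorm(n, u_mean, u_sd))
Y <- beta_1 + beta_2 * X + u

# Centre each column of X, then form the slope estimate for every column at once
Xc <- sweep(X, 2, colMeans(X))
vect3 <- colSums(Xc * Y) / colSums(Xc^2)

# Same estimates as the sapply() version, up to floating-point error
vect2 <- sapply(seq_len(rept), function(i) cov(X[, i], Y[, i]) / var(X[, i]))
all.equal(vect3, vect2)
```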
I want to set up a binomial tree model for an interest rate. The interest rate is path dependent. I want to return the interest rate (along with the discount factor and payoff) at every step in all 2^N scenarios. The reason I want every single interest rate is that I use it to compute the discount factor. I know how to do this in a complicated way. Here I want to use a double loop (or something simpler) to get the results.
w is a matrix of 0/1 dummy variables representing all scenarios.
r is the interest rate. If there is a head (1), then r1 = r0 + u = r0 + 0.005; if there is a tail (0), then r1 = r0 - d.
D is the discount factor: D1 = 1/(1+r0), D2 = D1/(1+r1), ...
P is the payoff.
In this case, the number of periods N is 10, so I can compute everything step by step. However, as N gets larger, my method becomes impractical. I want a simple way to compute this. Thank you.
#Real Price
N <- 10
r0 <- 0.06
K <- 0.05
u <- 0.005
d <- 0.004
q <- 0.5
w <- expand.grid(rep(list(0:1),N))
r <- D <- P <- matrix(0, nrow=2^N, ncol=N)
for(i in 1:dim(w)[1])
{
r[i,1] <- r0 + u*w[i,1] - d*(1-w[i,1])
r[i,2] <- r[i,1] + u*w[i,2] - d*(1-w[i,2])
r[i,3] <- r[i,2]+u*w[i,3]-d*(1-w[i,3])
r[i,4] <- r[i,3]+u*w[i,4]-d*(1-w[i,4])
r[i,5] <- r[i,4]+u*w[i,5]-d*(1-w[i,5])
r[i,6] <- r[i,5]+u*w[i,6]-d*(1-w[i,6])
r[i,7] <- r[i,6]+u*w[i,7]-d*(1-w[i,7])
r[i,8] <- r[i,7]+u*w[i,8]-d*(1-w[i,8])
r[i,9] <- r[i,8]+u*w[i,9]-d*(1-w[i,9])
r[i,10] <- r[i,9]+u*w[i,10]-d*(1-w[i,10])
D[i,1] <- 1/(1+r0)
D[i,2] <- D[i,1]/(1+r[i,1])
D[i,3] <- D[i,2]/(1+r[i,2])
D[i,4] <- D[i,3]/(1+r[i,3])
D[i,5] <- D[i,4]/(1+r[i,4])
D[i,6] <- D[i,5]/(1+r[i,5])
D[i,7] <- D[i,6]/(1+r[i,6])
D[i,8] <- D[i,7]/(1+r[i,7])
D[i,9] <- D[i,8]/(1+r[i,8])
D[i,10] <- D[i,9]/(1+r[i,9])
P[i,1] <- D[i,1]*pmax(K-r0,0)*(0.5^N)
P[i,2] <- D[i,2]*pmax(K-r[i,1],0)*(0.5^N)
P[i,3] <- D[i,3]*pmax(K-r[i,2],0)*(0.5^N)
P[i,4] <- D[i,4]*pmax(K-r[i,3],0)*(0.5^N)
P[i,5] <- D[i,5]*pmax(K-r[i,4],0)*(0.5^N)
P[i,6] <- D[i,6]*pmax(K-r[i,5],0)*(0.5^N)
P[i,7] <- D[i,7]*pmax(K-r[i,6],0)*(0.5^N)
P[i,8] <- D[i,8]*pmax(K-r[i,7],0)*(0.5^N)
P[i,9] <- D[i,9]*pmax(K-r[i,8],0)*(0.5^N)
P[i,10] <- D[i,10]*pmax(K-r[i,9],0)*(0.5^N)
}
true.price <- sum(P)
true.price
#> [1] 0.00292045
You can just use a nested loop, looping over 2:(ncol(w)) within the i loop:
for (i in 1:nrow(w)) {
  # First period starts from the initial rate r0
  r[i, 1] <- r0 + u*w[i, 1] - d*(1-w[i, 1])
  D[i, 1] <- 1/(1+r0)
  P[i, 1] <- D[i, 1]*pmax(K-r0, 0)*(0.5^N)
  # Later periods build on the previous column
  for (j in 2:ncol(w)) {
    r[i, j] <- r[i, j-1] + u*w[i, j] - d*(1-w[i, j])
    D[i, j] <- D[i, j-1]/(1+r[i, j-1])
    P[i, j] <- D[i, j]*pmax(K-r[i, j-1], 0)*(0.5^N)
  }
}
true.price <- sum(P)
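Because each rate is a running sum of the up/down moves and each discount factor is a running product, the recursion can probably also be expressed with `cumsum` and `cumprod` along the rows, with no explicit loop over i or j. The following is a sketch under the question's parameter values; the names `steps` and `r_prev` are introduced here for illustration.

```r
N <- 10; r0 <- 0.06; K <- 0.05; u <- 0.005; d <- 0.004

# All 2^N up/down paths as a numeric 0/1 matrix
w <- as.matrix(expand.grid(rep(list(0:1), N)))

# Rate change at each step, then running totals along each path (row)
steps <- u * w - d * (1 - w)
r <- r0 + t(apply(steps, 1, cumsum))

# Rate in force at the start of each period: r0, r[, 1], ..., r[, N - 1]
r_prev <- cbind(r0, r[, -N, drop = FALSE])

# Discount factors are running products of 1 / (1 + previous rate)
D <- t(apply(1 / (1 + r_prev), 1, cumprod))
P <- D * pmax(K - r_prev, 0) * 0.5^N
true.price <- sum(P)
```

The memory cost still grows as 2^N rows, so this only removes the hand-written steps; it does not make very large N feasible.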
Suppose I have a 10 x 10 matrix. I want to randomly choose 2 numbers from each column and take the square of the difference of these numbers. I wrote the R code for that and I get 10 values, but I wish to repeat this, say, 100 times, in which case I need to get 100*10=1000 numbers. How could I do that?
x <- rnorm(100)
m <- 10
n <- 10
X <- matrix(x,m,n)
q2 <- numeric(m)  # preallocate the result vector
for (i in 1:m) {
  y <- sample(X[, i], 2, replace = FALSE)
  q2[i] <- (y[1] - y[2])^2
}
Or, as @Davide Passaretti and @nrussell mentioned in the comments, you can use replicate:
f1 <- function(x, m){
q2 <- vector(mode='numeric', length= m)
for(i in 1:m){
y <- sample(x[,i], 2, rep=FALSE)
q2[i] <- (y[1]-y[2])^2
}
q2
}
n <- 100
res <- replicate(100, f1(X, m))
prod(dim(res))
#[1] 1000
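The inner loop over columns can also be folded into `apply`, which keeps the whole computation to two lines. This is a sketch only; the helper name `sq_diff` and the seed are assumptions made for the example.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
X <- matrix(rnorm(100), 10, 10)

# Squared difference of two values drawn without replacement from one column
sq_diff <- function(col) diff(sample(col, 2))^2

# One replication gives 10 values (one per column);
# 100 replications give a 10 x 100 matrix, i.e. 1000 numbers
res <- replicate(100, apply(X, 2, sq_diff))
dim(res)
```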
Basically, I want to perform diagonal averaging in R. Below is some code adapted from the simsalabim package that does the diagonal averaging, but it is slow.
Any suggestions for vectorizing this instead of using sapply?
reconSSA <- function(S,v,group=1){
### S : matrix
### v : vector
N <- length(v)
L <- nrow(S)
K <- N-L+1
XX <- matrix(0,nrow=L,ncol=K)
IND <- row(XX)+col(XX)-1
XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
XX <- S[,group] %*% t(t(XX) %*% S[,group])
##Diagonal Averaging
.intFun <- function(i,x,ind) mean(x[ind==i])
RC <- sapply(1:N,.intFun,x=XX,ind=IND)
return(RC)
}
For data, you could use the following:
data(AirPassengers)
v <- AirPassengers
L <- 30
T <- length(v)
K <- T-L+1
x.b <- matrix(nrow=L,ncol=K)
x.b <- matrix(v[row(x.b)+col(x.b)-1],nrow=L,ncol=K)
S <- eigen(x.b %*% t(x.b))[["vectors"]]
out <- reconSSA(S, v, 1:10)
You can speed up the computation by almost 10 times with the help of a very specialized trick with rowsum:
reconSSA_1 <- function(S,v,group=1){
### S : matrix
### v : vector
N <- length(v)
L <- nrow(S)
K <- N-L+1
XX <- matrix(0,nrow=L,ncol=K)
IND <- row(XX)+col(XX)-1
XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
XX <- S[,group] %*% t(t(XX) %*% S[,group])
##Diagonal Averaging
SUMS <- rowsum.default(c(XX), c(IND))
counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
else c(1:K, rep(K, L-K-1), K:1)
c(SUMS/counts)
}
all.equal(reconSSA(S, v, 1:10), reconSSA_1(S, v, 1:10))
[1] TRUE
library(rbenchmark)
benchmark(SSA = reconSSA(S, v, 1:10),
SSA_1 = reconSSA_1(S, v, 1:10),
columns = c( "test", "elapsed", "relative"),
order = "relative")
test elapsed relative
2 SSA_1 0.23 1.0000
1 SSA 2.08 9.0435
[Update: As Joshua suggested, it could be sped up even further by using the crux of the rowsum code:
reconSSA_2 <- function(S,v,group=1){
### S : matrix
### v : vector
N <- length(v)
L <- nrow(S)
K <- N-L+1
XX <- matrix(0,nrow=L,ncol=K)
IND <- c(row(XX)+col(XX)-1L)
XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
XX <- c(S[,group] %*% t(t(XX) %*% S[,group]))
##Diagonal Averaging
SUMS <- .Call("Rrowsum_matrix", XX, 1L, IND, 1:N,
TRUE, PACKAGE = "base")
counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
else c(1:K, rep(K, L-K-1), K:1)
c(SUMS/counts)
}
test elapsed relative
3 SSA_2 0.156 1.000000
2 SSA_1 0.559 3.583333
1 SSA 5.389 34.544872
A speedup of 34.5x compared to the original code!
]
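The hand-written `counts` vector assumes that L and K differ by at least one: when L equals K, `rep(L, K-L-1)` is asked for a negative number of repetitions and fails. A possible alternative, sketched below with a hypothetical helper name `diag_average`, derives the counts from the index matrix itself with `tabulate`, using the exported `rowsum` rather than the internal call:

```r
diag_average <- function(XX) {
  IND <- c(row(XX) + col(XX) - 1L)          # anti-diagonal index of every cell
  n_diag <- nrow(XX) + ncol(XX) - 1L        # number of anti-diagonals
  sums <- rowsum(c(XX), IND)                # per-diagonal sums, ordered by index
  counts <- tabulate(IND, nbins = n_diag)   # number of cells on each diagonal
  c(sums) / counts
}

# Anti-diagonals of this 2 x 3 matrix are {1}, {2, 3}, {4, 5}, {6},
# so the averages are 1, 2.5, 4.5 and 6
XX <- matrix(1:6, nrow = 2)
diag_average(XX)
```

This version has not been benchmarked against the variants above; it trades a little speed for not having to special-case the matrix shape.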
I can't get your example to produce sensible results. I think there are several errors in your function.
XX is used in sapply, but is not defined in the function
sapply works over 1:N, where N=144 in your example, but x.b only has 115 columns
reconSSA simply returns x
Regardless, I think you want:
data(AirPassengers)
x <- AirPassengers
rowMeans(embed(x,30))
UPDATE: I've re-worked and profiled the function. Most of the time is spent in mean, so it may be hard to get this much faster using R code. That said, you can get a 20% speedup by using sum instead.
reconSSA <- function(S,v,group=1){
N <- length(v)
L <- nrow(S)
K <- N-L+1
XX <- matrix(0,nrow=L,ncol=K)
IND <- row(XX)+col(XX)-1
XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
XX <- S[,group] %*% t(t(XX) %*% S[,group])
##Diagonal Averaging
.intFun <- function(i,x,ind) {
I <- ind==i
sum(x[I])/sum(I)
}
RC <- sapply(1:N,.intFun,x=XX,ind=IND)
return(RC)
}