Allocate a big matrix - r

I'm using the bigmemory package and want to compute a matrix w. The length of v is 478000 and the length of k is 240500, so w, which has one entry for every pair of elements of v and k, is very large.
I ran the code as a loop, but it is still running and I don't know whether it will ever produce a result.
I also tried to compute it without the for loop, but I got an error. Can anyone help me correct my code and make it fast?
v <-read.big.matrix('v.dat',type='double')
k <-read.big.matrix('k.dat',type='double')
m=length(v);
n=length(k);
for(i in 1:m)
{
for(j in 1:n)
{
w[i,j] = 2 * cos(dt * v[i] * k[j]) - 2
}
}
How can I define w before the loop? Because w is so large, I can't simply do something like w <- matrix(nrow=m, ncol=n).

Preallocating a matrix can be done like this:
m = matrix(0, number_of_rows, number_of_columns)
This creates a matrix with the amount of rows and columns defined in the variables number_of_rows and number_of_columns, filled initially with all 0.
What is probably going to be a problem is memory: w has length(v) * length(k) elements, so with the sizes in the question it would hold about 1.15e11 doubles (roughly 920 GB), and you will very likely run into memory issues when filling it. You could solve this by also using a bigmemory matrix for w, or by running your analysis in chunks.
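A minimal sketch of the chunked approach, using base R only. The small stand-in vectors, the value of dt (not defined in the question) and block_size are all assumptions for illustration; the idea is that only one block of rows of w is in memory at a time, and each block is reduced (or written to disk) before the next one is computed.

```r
# Stand-in data; the real v and k are far longer.
set.seed(1)
v  <- runif(1000)
k  <- runif(500)
dt <- 0.01          # assumed value; dt is not defined in the question

block_size <- 100   # rows per block (hypothetical; tune to available RAM)
starts <- seq(1, length(v), by = block_size)

# Process w one block of rows at a time.
row_sums <- numeric(length(v))
for (s in starts) {
  rows  <- s:min(s + block_size - 1, length(v))
  block <- 2 * cos(dt * outer(v[rows], k)) - 2   # length(rows) x length(k) matrix
  row_sums[rows] <- rowSums(block)               # reduce here, or write block to disk
}
```

Here each block is reduced to its row sums purely as an example; whatever is actually needed from w would take the place of that line.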

You need to use the "big.matrix"-class constructors, and since you are obviously exceeding RAM resources, it would appear necessary to define it as a "filebacked.big.matrix":
w <- filebacked.big.matrix( m, n , # additional arguments to allocate files and dims
)
See the last example in:
help(big.matrix, package=bigmemory)

agstudy is on the right track, but you could use outer here, as
w <- outer(v,k,FUN=function(x,y) 2*cos(x*y)-2 )
v<-runif(10)
k<-runif(10)
m=length(v);
n=length(k);
w<-matrix(nr=m,nc=n)
for(i in 1:m)
{
for(j in 1:n)
{
w[i,j] = 2 * cos( v[i] * k[j]) - 2
}
}
ww <- outer(v,k,function(x,y) 2*cos(x*y)-2)
Test: ww-w is a matrix of zeroes.

I would do something like this, using R's vectorization:
for(i in 1:m)
{
w[i, ] = 2 * cos(dt * v[i] * k) - 2 # computes all n terms of row i at once
}
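As a quick sanity check on small data, the row-at-a-time fill can be compared with outer. This is only a sketch: the vectors are random stand-ins and dt is an assumed value, since the question does not define it.

```r
set.seed(42)
v  <- runif(6)
k  <- runif(4)
dt <- 0.5                     # assumed; not defined in the question

# Row-wise fill: one vectorized expression per row of w
w <- matrix(NA_real_, nrow = length(v), ncol = length(k))
for (i in seq_along(v)) {
  w[i, ] <- 2 * cos(dt * v[i] * k) - 2
}

# Same matrix built in a single call
w_outer <- outer(v, k, function(x, y) 2 * cos(dt * x * y) - 2)
all.equal(w, w_outer)
```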

Related

R - How can I make this loop faster?

Is there some way to make this loop faster in r?
V=array(NA, dim=c(nrow(pixDF), n))
for(i in 1:n)
{
sdC<-sqrt(det(Cov[,i,]))
iC<-inv(Cov[,i,])
V[,i]<-apply(pixDF,1,function(x)(sdC*exp(-0.5*((x-Mean[i,])%*%iC%*%as.matrix((x-Mean[i,]))))))
}
where, in this case, pixDF is a matrix with 490000 rows and 4 columns filled with doubles. n = 5. Cov is a (4,5,4) array filled with "doubles". Mean is a (5,4) array filled with doubles as well.
This loop was taking about 30min on my computer. (before editing).
Right now it's taking 1min.
As Ronak notes, it is hard to help without a reproducible example. But I think that apply could be avoided. Something like this COULD work:
V <- array(NA, dim = c(nrow(pixDF), n))
tpixDF <- t(pixDF)
for (i in 1:n) {
x <- Cov[, i, ]
sdC <- sqrt(det(x))
iC <- solve(x)
mi <- Mean[i, ]
k <- t(tpixDF - mi)
V[, i] <- sdC*exp(-0.5*rowSums(k %*% iC * k))
}
Also, as Roland mentions, inv is probably equivalent to solve.
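To see why rowSums(k %*% iC * k) can replace the apply call, here is a small check on made-up data. X, S and mu are hypothetical stand-ins for pixDF, one slice Cov[, i, ] and one row Mean[i, ]; the real objects are not shown in the question.

```r
set.seed(1)
X  <- matrix(rnorm(20), nrow = 5, ncol = 4)            # stand-in for pixDF
S  <- crossprod(matrix(rnorm(16), 4, 4)) + diag(4)     # symmetric positive-definite stand-in for Cov[, i, ]
mu <- rnorm(4)                                         # stand-in for Mean[i, ]
iC <- solve(S)

# Quadratic form computed one row at a time, as in the apply() version
q_apply <- apply(X, 1, function(x) (x - mu) %*% iC %*% (x - mu))

# Quadratic form for all rows at once
k     <- sweep(X, 2, mu)             # subtract mu from every row of X
q_vec <- rowSums((k %*% iC) * k)     # %*% binds tighter than *, parentheses only for clarity

all.equal(q_apply, q_vec)
```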

How to make an R function that loops over two lists

I have an event A that is triggered when the majority of coin tosses in a series of tosses comes up heads. I have an unfair coin and I'd like to see how the likelihood of A changes as the number of tosses changes and the probability in each toss changes.
This is my function assuming 3 tosses
n <- 3
#victory requires majority of tosses heads
#tosses only occur in odd intervals
k <- seq(n/2+.5,n)
victory <- function(n,k,p){
for (i in p) {
x <- 0
for (i in k) {
x <- x + choose(n, k) * p^k * (1-p)^(n-k)
}
z <- x
}
return(z)
}
p <- seq(0,1,.1)
victory(n,k,p)
My hope is the victory() function would:
find the probability of each of the outcomes where the majority of tosses are heads, given a particular value p
sum up those probabilities and add them to a vector z
go back and do the same thing given another probability p
I tested this with n <- 3, k <- c(2,3) and p <- c(.5,.75) and the output was 0.75000, 0.84375. I know that the output should've been 0.625, 0.0984375.
I wasn't able to get exactly the result you wanted, but maybe this can help you along a bit.
When looping in R, the vector you are looping over remains unchanged, and the loop variable takes each of its values in turn. For example, compare these loops:
test <- seq(0,1,length.out = 5)
for ( i in test){
print(test)
}
for ( i in test){
print(i)
}
for ( i in 1:length(test)){
print(test[i])
}
When you are iterating, you first set i to the first number in p, then the inner loop overwrites i with each number in k, and the body uses the whole unchanged vectors k and p rather than single elements.
You are also assigning to z inside the loop over p, so it is overwritten on every pass.
Try using the code below. I am still not getting the answer you say, but it might help you find where the error is (printing out along the way or using debug(victory) might also be helpful):
victory <- function(n,k,p){
z <- numeric(length(p)) # one result per value of p
for (i in 1:length(p)) {
x <- 0
for (j in 1:length(k)) {
x <- x + choose(n, k[j]) * p[i]^k[j] * (1-p[i])^(n-k[j])
}
z[i] <- x
}
return(z)
}
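For comparison, here is a sketch that leans on R's built-in binomial density instead of writing out choose(n, k) * p^k * (1-p)^(n-k) by hand. The name victory_vec is made up for this example. For n = 3, k = c(2, 3) and p = c(.5, .75), evaluating the binomial formula by hand gives 0.5 and 0.84375, so those are the values to expect from it.

```r
# P(majority heads) for every value in p: sum the binomial pmf over the winning counts k
victory_vec <- function(n, k, p) {
  sapply(p, function(prob) sum(dbinom(k, size = n, prob = prob)))
}

victory_vec(3, c(2, 3), c(0.5, 0.75))
```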

how to modify my R code to accelerate computational speed

Here is my R code. Could you please give me some advice on how to speed up the computation? :)
First, the function myfun() generates a complex number.
Second, I compute the elements of matrix M using myfun().
myfun<-function(a,b,nq,ul,uk)
{
m<-seq(1,(nq/2)+1,length=(nq/2)+1);
k<-m;
D<-matrix(NA,nrow = length(k),ncol = length(k));
for(i in 1:length(k)) # row
for(j in 1:length(m)) # column
{
D[i,j]<-(2/nq)*cos(((j-1)*(i-1)*pi)/(nq*0.5))
}
D[,1]<-D[,1]*0.5;
D[,ncol(D)]<-D[,ncol(D)]*0.5;
# compute the vector v
vseq<-seq(2,nq-2,by=2);
vr<-2/(1-vseq^2);
vr<-c(1,vr,1/(1-nq*nq));
v<-matrix(vr,ncol=1); # v is a N by 1 matrix
# compute the vector w, length(w)=nq/2+1
h<-function(x,ul,uk)
{
((b-a)/2)*(exp((b-a)/2*x+(a+b)/2)+1)^(1i*uk)*cos(((b-a)/2*x+(a+b)/2-a)*ul)
}
w<-matrix(rep(NA,length(v)),ncol=1);
for(i in 1:length(w))
{
w[i]<-h((cos((i-1)*pi/nq)),ul,uk)+h((-cos((i-1)*pi/nq)),ul,uk)
}
res<-t(t(D)%*%v)%*%w; # each element of matrix M
return(res)
}
Next, compute each element of matrix M. The N-th column and N-th row are zeros.
matrix.M<-matrix(0,ncol = N,nrow = N);
for(i in 1:(N-1)) # parentheses matter: 1:N-1 is parsed as (1:N)-1, i.e. 0:(N-1)
for(j in 1:(N-1))
{
matrix.M[i,j]<-myfun(a,b,nq,i-1,j-1)
}
We can set parameters as
a<--173.2;
b<-78;
alpha<-0.24;
Dt<-0.1;
M<-1000;
N<-150;
u<-seq(1,150,by=1)*pi/(b-a);
nq<-3000;
I appreciate your help!
Here are some suggestions for speeding the function up. I use three "tricks":
Vectorize as many functions as possible
Use the outer function instead of a double loop
Use the hidden gem crossprod for the final matrix products
myfun<-function(a,b,nq,ul,uk) {
m<-seq(1,(nq/2)+1,length=(nq/2)+1);
k<-m;
## Use outer to compute the elements of the matrix
D <- outer(1:length(k), 1:length(m), function(i, j) {(2/nq)*cos(((j-1)*(i-1)*pi)/(nq*0.5))} )
D[,1]<-D[,1]*0.5;
D[,ncol(D)]<-D[,ncol(D)]*0.5;
# compute the vector v
vseq<-seq(2,nq-2,by=2);
vr<-2/(1-vseq^2);
vr<-c(1,vr,1/(1-nq*nq));
v<-matrix(vr,ncol=1); # v is a N by 1 matrix
h<-function(x,ul,uk) {
((b-a)/2)*(exp((b-a)/2*x+(a+b)/2)+1)^(1i*uk)*cos(((b-a)/2*x+(a+b)/2-a)*ul)
}
## Compute the full w vector in one go
vect <- seq_along(v)-1
w <- h((cos(vect*pi/nq)),ul,uk) + h((-cos(vect*pi/nq)),ul,uk)
## Compute the cross products.
res <- crossprod(crossprod(D, v), w)
return(res)
}
I think this should save around 80% of the time compared to the original function. The time hog was the initial computation of D. Hope this helps.
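A small check of the two main replacements, on a reduced problem size. The value nq = 20 and the integer stand-ins for v and w are assumptions chosen only to keep the example quick; they are not the real inputs.

```r
nq  <- 20
idx <- seq_len(nq / 2 + 1)

# Double loop, as in the question
D_loop <- matrix(NA_real_, length(idx), length(idx))
for (i in idx) for (j in idx) {
  D_loop[i, j] <- (2 / nq) * cos(((j - 1) * (i - 1) * pi) / (nq * 0.5))
}

# outer() evaluates the same expression on all (i, j) pairs in one call
D_outer <- outer(idx, idx, function(i, j) (2 / nq) * cos(((j - 1) * (i - 1) * pi) / (nq * 0.5)))

# Stand-in column vectors for v and w
v <- matrix(seq_along(idx), ncol = 1)
w <- matrix(rev(seq_along(idx)), ncol = 1)

# crossprod(A, B) computes t(A) %*% B without forming t(A) explicitly
res_old <- t(t(D_outer) %*% v) %*% w
res_new <- crossprod(crossprod(D_outer, v), w)

all.equal(D_loop, D_outer)
all.equal(res_old, res_new)
```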

Nested rolling sum in vector

I am struggling to write efficient code that computes the result vector r from an input vector v using this function:
r(i) = \sum_{j=i-N}^{i} [v(i)-v(j)] * exp(v(i)-v(j))
where i runs from N+1 to M over the vector v, and the size of v is M >> N.
Of course this is feasible with two nested for loops, but that is too slow for my purposes, and probably outdated style...
A MWE:
for (i in c(N+1):length(v)){
csum <- 0
for (j in i:c(i-N)) {
csum <- csum + (v[i]-v[j])*exp(v[i]-v[j])
}
r[i] <- csum
}
In my real application M > 10^5 and the v vector is indeed several vectors.
I have been trying with nested applications of lapply and rollapply without success.
Any suggestion is welcome.
Thanks!
I don't know if it is any more efficient, but here is something you can try:
r2[(N+1):M] <- sapply((N+1):M, function(i) tail(cumsum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])), 1))
To check that both computations give the same results, I computed r your way and r2 mine, with r2 initialized to rep(NA, M), and compared them:
all(abs(r-r2)<1e-12, na.rm=TRUE)
# [1] TRUE
NOTE: as in @lmo's answer, tail(cumsum(...), 1) can be replaced more efficiently by just using sum(...):
r2[(N+1):M] <- sapply((N+1):M, function(i) sum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])))
Here is a method with a single for loop.
# create new blank vector
rr <- rep(NA,M)
for(i in (N+1):length(v)) {
rr[i] <- sum((v[i] - v[(i-N):i]) * exp(v[i] - v[(i-N):i]))
}
Check for equality:
all.equal(r, rr)
# [1] TRUE
You can avoid computing the difference twice by storing it. This should give a small speed-up.
for(i in (N+1):length(v)) {
x <- v[i] - v[(i-N):i]
rr[i] <- sum(x * exp(x))
}
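If memory allows, the remaining loop can also be removed with embed() from base R, which lays out every window of length N+1 as a row of a matrix. This is only a sketch on made-up data (M = 200, N = 5 and the random v are assumptions), and the intermediate matrix has (M-N) x (N+1) entries, so it may not be suitable for large N.

```r
set.seed(1)
M <- 200; N <- 5
v <- rnorm(M)

# Reference: the double loop from the question
r <- rep(NA_real_, M)
for (i in (N + 1):M) {
  csum <- 0
  for (j in i:(i - N)) csum <- csum + (v[i] - v[j]) * exp(v[i] - v[j])
  r[i] <- csum
}

# embed(v, N + 1): row t holds v[t+N], v[t+N-1], ..., v[t]
E <- embed(v, N + 1)
x <- E[, 1] - E                      # v[i] - v[j] for every window at once
r_vec <- rep(NA_real_, M)
r_vec[(N + 1):M] <- rowSums(x * exp(x))

all.equal(r, r_vec)
```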

Non-conformable arguments in R

I am rewriting in R, for practice, an algorithm I originally wrote in C++: the Finite Difference Method. I am pretty new to R, so I don't know all the rules regarding vector/matrix multiplication. For some reason I get a non-conformable arguments error when I do this:
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
I get this error:
Error in sqrt(T) %*% Z : non-conformable arguments
Here is my whole code:
gaussian_box_muller <- function(n){
theta <- runif(n, 0, 2 * pi)
rsq <- rexp(n, 0.5)
x <- sqrt(rsq) * cos(theta)
return(x)
}
d_j <- function(j, S, K, r, v,T) {
return ((log(S/K) + (r + ((-1)^(j-1))*0.5*v*v)*T)/(v*(T^0.5))) # (-1)^(j-1): unary minus binds weaker than ^
}
call_delta <- function(S,K,r,v,T){
return (S * dnorm(d_j(1, S, K, r, v, T))-K*exp(-r*T) * dnorm(d_j(2, S, K, r, v, T)))
}
Finite_Difference <- function(S0,K,r,sigma,T,M,delta_S){
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
Delta <- matrix(0,M,1)
totDelta <- 0
for(i in 1:M){
if(ST_u[i] - K > 0 && ST_l[i] - K > 0){
Delta[i] <- ((ST_u[i] - K) - (ST_l[i] - K))/(2*delta_S)
}else{
Delta[i] <- 0
}
totDelta = totDelta + exp(-r*T)*Delta[i]
}
totDelta <- totDelta * 1/M
Var <- 0
for(i in 1:M){
Var = Var + (Delta[i] - totDelta)^2
}
Var = Var*1/M
cat("The Finite Difference Delta is : ", totDelta)
call_Delta_a <- call_delta(S,K,r,sigma,T)
bias <- abs(call_Delta_a - totDelta)
cat("The bias is: ", bias)
cat("The Variance of the Finite Difference method is: ", Var)
MSE <- bias*bias + Var
cat("The marginal squared error is thus: ", MSE)
}
S0 <- 100.0
delta_S <- 0.001
K <- 100.0
r <- 0.05
sigma <- 0.2
T <- 1.0
M <- 10
result1 <- Finite_Difference(S0,K,r,sigma,T,M,delta_S)
I can't seem to figure out the problem, any suggestions would be greatly appreciated.
In R, the %*% operator is reserved for multiplying two conformable matrices. As one special case, you can also use it to multiply a vector by a matrix (or vice versa), if the vector can be treated as a row or column vector that conforms to the matrix; as a second special case, it can be used to multiply two vectors to calculate their inner product.
However, one thing it cannot do is perform scalar multiplication. Scalar multiplication of vectors or matrices always uses the plain * operator. Specifically, in the expression sqrt(T) %*% Z, the first term sqrt(T) is a scalar, and the second Z is a matrix. If what you intend to do here is multiply the matrix Z by the scalar sqrt(T), then this should just be written sqrt(T) * Z.
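A small illustration of the difference, with made-up numbers (Z and s here are stand-ins, not the values from the question):

```r
Z <- matrix(c(0.5, -1, 2), ncol = 1)   # 3 x 1 column matrix
s <- sqrt(4)                           # a plain number, i.e. a length-1 vector

scaled <- s * Z                        # element-wise: every entry of Z times 2

# s %*% Z fails: a length-1 vector cannot be made to conform with a 3 x 1 matrix
err <- tryCatch(s %*% Z, error = function(e) conditionMessage(e))

inner <- t(Z) %*% Z                    # 1 x 3 times 3 x 1 gives a 1 x 1 matrix
```

After running this, scaled is still a 3 x 1 matrix, err holds the error message, and inner is the sum of squares of Z as a 1 x 1 matrix.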
When I made this change, your program still didn't work because of another bug -- S is used but never defined -- but I don't understand your algorithm well enough to attempt a fix.
A few other comments on the program not directly related to your original question:
The first loop in Finite_Difference looks suspicious: gaussian_box_muller(i) generates a vector of length i as i varies in the loop from 1 up to M, and forcing these vectors into a column matrix of length M to generate Z is probably not doing what you want. It will "reuse" the values in a cycle to populate the matrix. Try these to see what I mean:
matrix(gaussian_box_muller(1),10,1) # all one value
matrix(gaussian_box_muller(3),10,1) # cycle of three values
You also use loops in many places where R's vector operations would be easier to read and (typically) faster to execute. For example, your definition of Var is equivalent to:
Var <- sum((Delta - totDelta)^2)/M
and the definitions of Delta and totDelta could also be written in this simplified fashion.
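As a rough sketch of what that could look like, here are Delta, totDelta and Var written without loops, with the else branch setting Delta[i] to 0. The prices in S_mid are made-up numbers for illustration only (the question's simulation step is left out), and maturity plays the role of T.

```r
set.seed(1)
M <- 10; K <- 100; r <- 0.05; delta_S <- 0.001
maturity <- 1                       # plays the role of T in the question
S_mid <- runif(M, 90, 110)          # made-up terminal prices, for illustration only
ST_u  <- S_mid + delta_S
ST_l  <- S_mid - delta_S

# Element-wise condition (& rather than &&) so it works on whole vectors
in_money <- (ST_u - K > 0) & (ST_l - K > 0)
Delta    <- ifelse(in_money, ((ST_u - K) - (ST_l - K)) / (2 * delta_S), 0)
totDelta <- sum(exp(-r * maturity) * Delta) / M
Var      <- sum((Delta - totDelta)^2) / M
```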
I'd suggest Googling for "vector and matrix operations in r" or something similar and reading some tutorials. Vector arithmetic in particular is idiomatic R, and you'll want to learn it early and use it often.
You might find it helpful to consider the rnorm function to generate random Gaussians.
Happy R-ing!

Resources