Matrix multiplication in R (incredibly slow)

I have the following piece of code:
Y.hat.tr <- array(0, c(nXtr, 2))
for (i in 1:nXtr) {
  # print(i)
  Y.hat.tr[i, 2] <- ktr[, i] %*% solve(K + a*In) %*% Ytr
  # Y.hat.tr[i, 2] <- ktr[, i] %*% chol2inv(chol(K + a*In)) %*% Ytr
}
Y.hat.tr[, 1] <- Ytr
My problem is that nXtr = 300 and ktr is a 300 x 300 matrix, so this routine takes approximately 30 seconds to run in R version 3.0.1. I have tried various approaches to reduce the run time, but to no avail.
Any ideas would be gratefully received. If any other information is required, please let me know.
I have now taken solve(K + a*In) %*% Ytr out of the loop, which has helped, but I was hoping to vectorise this piece of code entirely. Having thought about it for a while, and after looking through various posts, I still cannot see how this can be done.

Maybe I am missing something (and without sample or simulated data to test on it is harder to check), but isn't your loop equivalent to:
Y.hat.tr[,2] <- t(ktr) %*% solve(K + a*In) %*% Ytr
?
Removing the loop altogether and using internal vectorized code may speed things up.
Also, you are using solve with one argument; you can often speed things up by using solve with two arguments (fewer internal calculations), something like:
t(ktr) %*% solve( K + a*In, Ytr )
Your loop is of the type called embarrassingly parallel, which means that if you want to keep the loop and are working on a computer with more than one core (or have easy access to a cluster), you could use the parallel package (converting the loop with the foreach package may be the simplest route) to run the calculations in parallel, which can sometimes greatly speed up the process.
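For illustration only, a minimal sketch of what that could look like (assuming ktr, K, a, In, Ytr and nXtr are defined as in the question; the vectorized one-liner above is still likely to be the faster option for a problem this small):

library(foreach)
library(doParallel)

registerDoParallel(cores = 2)   # or however many cores you have available

# precompute the part that does not depend on i
w <- solve(K + a*In, Ytr)

Y.hat.tr[, 2] <- foreach(i = 1:nXtr, .combine = c) %dopar% {
  as.numeric(ktr[, i] %*% w)    # one fitted value per iteration
}

stopImplicitCluster()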

Related

More efficient way to compute the rowNorms in R?

I wrote a program using an unsupervised K-means algorithm to try to compress images. It now works, but compared to Python it's incredibly slow! Specifically, it's finding the rowNorms that's slow. The array X has 350,000+ elements.
This is the particular function:
find_closest_centroids <- function(X, centroids) {
  m <- nrow(X)
  c <- integer(m)
  for (i in 1:m) {
    distances <- rowNorms(sweep(centroids, 2, X[i, ]))
    c[i] <- which.min(distances)
  }
  return(c)
}
In Python I am able to do it like this:
def find_closest_centroids(X, centroids):
    m = len(X)
    c = np.zeros(m)
    for i in range(m):
        distances = np.linalg.norm(X[i] - centroids, axis=1)
        c[i] = np.argmin(distances)
    return c
Any recommendations?
Thanks.
As dvd280 has noted in his comment, R tends to do worse than many other languages in terms of performance. If you are content with the performance of your code in Python but need the function available in R, you might want to look into the reticulate package, which provides an interface to Python in the same way that the Rcpp package mentioned by dvd280 does for C++.
If you still want to implement this natively in R, be mindful of the data structures you use. For row-wise operations, data frames are a poor choice, as they are stored as lists of columns. I'm not sure about the data structures in your code, but rowNorms() seems to be a matrix method. You might get more mileage out of a list-of-rows structure.
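If X and centroids are already plain numeric matrices, another option is to drop the per-row loop entirely and compute all the distances in one shot. A minimal sketch (my own addition, assuming Euclidean distances and matrices with matching column counts; not the dplyr/data.table route discussed below):

find_closest_centroids_vec <- function(X, centroids) {
  # squared Euclidean distances via ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * x.c
  cross <- X %*% t(centroids)                                  # m x k matrix of dot products
  d2 <- outer(rowSums(X^2), rowSums(centroids^2), "+") - 2 * cross
  max.col(-d2, ties.method = "first")                          # row-wise index of the nearest centroid
}

This gives the same assignments as the loop version (up to ties) and leans entirely on vectorized matrix code.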
If you feel like getting into dplyr, you could find this vignette on row-wise operations helpful. Make sure you have the latest version of the package, as the vignette is based on dplyr 1.0.
The data.table package tends to yield the best performance for large data sets in R, but I'm not familiar with it, so I can't give you any further directions on that.

foreach very slow with large number of values

I'm trying to use foreach to do parallel computations. It works fine if there are a small number of values to iterate over, but at some point it becomes incredibly slow. Here's a simple example:
library(foreach)
library(doParallel)
registerDoParallel(8)

out1 <- foreach(idx = 1:1e6) %do% {
  1 + 1
}

out2 <- foreach(idx = 1:1e6) %dopar% {
  1 + 1
}

out3 <- mclapply(1:1e6,
                 function(x) 1 + 1,
                 mc.cores = 20)
out1 and out2 take an incredibly long time to run. Neither of them even spawns multiple threads for as long as I keep them running. out3 spawns the threads almost immediately and runs very quickly. Is foreach doing some sort of initial processing that doesn't scale well? If so, is there a simple fix? I really prefer the syntax of foreach.
I should also note that the actual code that I'm trying to parallelize is substantially more complicated than 1+1. I only show this as an example because even with this simple code foreach seems to be doing some pre-processing that is incredibly slow.
The foreach/doParallel vignette says (about an example even smaller than yours):
Note well that this is not a practical use of doParallel. This is our
“Hello, world” program for parallel computing. It tests that
everything is installed and set up properly, but don’t expect it to
run faster than a sequential for loop, because it won’t! sqrt executes
far too quickly to be worth executing in parallel, even with a large
number of iterations. With small tasks, the overhead of scheduling the
task and returning the result can be greater than the time to execute
the task itself, resulting in poor performance. In addition, this
example doesn’t make use of the vector capabilities of sqrt, which it
must to get decent performance. This is just a test and a pedagogical
example, not a benchmark.
So it might simply be in the nature of your setup that parallelization is not faster here.
Instead, try it without parallelization but using vectorization:
q <- sapply(1:1e6, function(x) 1 + 1)
It does exactly the same as your example loops and is done in about a second.
And now try this (it still does exactly the same thing, exactly the same number of times):
x <- rep(1, times = 1e6)
r <- x + 1
It adds 1 to each of the 1e6 ones instantly. (The power of vectorization ...)
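If you do want to keep foreach for your real, heavier workload, one common way to tame the per-task scheduling overhead is to hand each worker a chunk of iterations instead of a single one. This is not from the original answer, just a minimal sketch of the idea:

library(foreach)
library(doParallel)
registerDoParallel(8)

# split the million iterations into 8 blocks, one scheduled task per block
chunks <- split(1:1e6, cut(1:1e6, 8, labels = FALSE))

out <- foreach(ix = chunks, .combine = c) %dopar% {
  vapply(ix, function(x) 1 + 1, numeric(1))   # the per-element work stays sequential inside each block
}

With only 8 tasks to schedule, the foreach overhead becomes negligible; how much this helps of course depends on how heavy the real per-element work is.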
The combination of foreach with doParallel is, in my personal experience, much slower than using the bioinformatics package BiocParallel from Bioconductor. (I am a bioinformatician, and in bioinformatics we very often have calculation-heavy tasks, since we have single data files of several gigabytes to process - and many of them.)
I tried your function using BiocParallel and it used all assigned CPUs at 100% (checked by running htop during job execution); the entire thing took 17 seconds.
For sure - with your lightweight example, this applies:
the overhead of scheduling the task and returning the result
can be greater than the time to execute the task itself
Anyway, it seems to use the CPUs more thoroughly than doParallel. So use this if you have calculation-heavy tasks to get done.
Here is the code showing how I did it:
# For Bioconductor packages, the best way is to install this:
install.packages("BiocManager")

# Then activate the installer
require(BiocManager)

# Now, with the `install()` function in this package, you can install
# Bioconductor packages like `BiocParallel` conveniently
install("BiocParallel")

# then, activate it
require(BiocParallel)

# initiate cores:
bpparam <- SnowParam(workers = 4, type = "SOCK")   # 4, or take more CPUs

# prepare the function you want to parallelize
FUN <- function(x) { 1 + 1 }

# and now you can call the function using `bplapply()`,
# the loop-parallelizing function in BiocParallel.
# Each value of 1:1e6 is given to FUN; note that you have to pass
# the SOCK cluster (bpparam) for the parallelization.
s <- bplapply(1:1e6, FUN, BPPARAM = bpparam)
For more info, go to the vignette of the BiocParallel package.
Have a look at Bioconductor to see how many packages it provides, all well documented.
I hope this helps you with your future parallel computing work.

How to speed up the generation of a latin hypercube (LHS) design

I'm trying to generate an optimized LHS (Latin Hypercube Sampling) design in R, with sample size N = 400 and d = 7 variables, but it's taking forever. My PC is an HP Z820 workstation with 12 cores and 32 GB RAM, running Windows 7 64-bit and Microsoft R Open, which is a multicore version of R. The code has been running for half an hour, but I still don't see any results:
library(lhs)
lhs_design <- optimumLHS(n = 400, k = 7, verbose = TRUE)
It seems a bit weird. Is there anything I could do to speed it up? I heard that parallel computing may help with R, but I don't know how to use it, and I have no idea whether it speeds up only code that I write myself or whether it could also speed up an existing package function such as optimumLHS. I don't necessarily have to use the lhs package - my only requirement is to generate an LHS design that is optimized in terms of the S-optimality criterion, the maximin metric, or some other similar optimality criterion (thus, not just a vanilla LHS). If worst comes to worst, I could even accept a solution in an environment other than R, but it must be either MATLAB or an open-source environment.
Just a little code to check performance.
library(lhs)
library(ggplot2)

performance <- c()
for (i in 1:100) {
  ptm <- proc.time()
  invisible(optimumLHS(n = i, k = 7, verbose = FALSE))
  time <- print(proc.time() - ptm)[[3]]
  performance <- rbind(performance, data.frame(time = time, n = i))
}

ggplot(performance, aes(x = n, y = time)) +
  geom_point()
Not looking too good. It seems to me you might be in for a very long wait indeed. Based on the algorithm, I don't think there is a way to speed things up via parallel processing, since to optimize the separation between sample points you need to know the locations of all the sample points. I think your only options for speeding this up will be to take a smaller sample or get (access to) a faster computer. It strikes me that since this is something that only really has to be done once, perhaps there is a resource where you could simply obtain a properly sampled and optimized design that has already been computed?
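One rough way to extrapolate the timings above out to n = 400 (a sketch only; the polynomial-growth assumption is mine, not something the benchmark guarantees):

# fit a cubic to the measured run times and predict the n = 400 case
fit <- lm(time ~ poly(n, 3, raw = TRUE), data = performance)
predict(fit, newdata = data.frame(n = 400)) / 3600   # predicted run time in hours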
So it looks like ~650 hours for my machine, which is very comparable to yours, to compute with n=400.

Efficient programming to overcome memory limit in R

I have a function that calculates an index in R for a matrix of binary data. The goal of this function is to calculate a person-fit index for binary response data called HT. It divides the covariance between the response vectors of two respondents (e.g. persons i and j) by the maximum possible covariance between the two response patterns, which can be calculated using the means of the response vectors (e.g. Bi). The function is:
fit <- function(Data) {
  N <- dim(Data)[1]
  L <- dim(Data)[2]
  r <- rowSums(Data)
  p.cor.n <- r/L                      # proportion correct for each response pattern
  sig.ij <- var(t(Data), t(Data))     # covariance of response patterns
  diag(sig.ij) <- 0
  H.num <- apply(sig.ij, 1, sum)
  H.denom1 <- matrix(p.cor.n, N, 1) %*% matrix(1 - p.cor.n, 1, N)   # Bi(1-Bj)
  H.denom2 <- matrix(1 - p.cor.n, N, 1) %*% matrix(p.cor.n, 1, N)   # (1-Bi)Bj
  H.denomm <- ifelse(H.denom1 > H.denom2, H.denom2, H.denom1)
  diag(H.denomm) <- 0
  H.denom <- apply(H.denomm, 1, sum)
  HT <- H.num / H.denom
  return(HT)
}
This function works fine with small matrices (e.g. 1000 by 20), but when I increased the number of rows (e.g. to 10000) I ran into a memory limitation problem. The source of the problem is this line in the function:
H.denomm <- ifelse(H.denom1 > H.denom2, H.denom2, H.denom1)
which selects the denominator for each response pattern. Is there any other way to rewrite this line that demands less memory?
P.S.: you can try data <- matrix(rbinom(200000, 1, .7), 10000, 20).
Thanks.
Well, here is one way you could shave a little time off. Overall I still think there might be a better theoretical answer in terms of the approach you take... but here goes. I wrote up an Rcpp function that specifically implements ifelse in the sense you use it above; it only works for square matrices like in your example. BTW, I wasn't really trying to optimize R's ifelse, because I'm pretty sure it already calls internal C functions. I was just curious whether a C++ function designed to do exactly what you are trying to do, and nothing more, would be faster. I shaved 11 seconds off. (It selects the smaller of each pair of values, matching your ifelse.)
C++ Function:
library(Rcpp)
library(inline)

code <- "
  Rcpp::NumericMatrix x(xs);
  Rcpp::NumericMatrix y(ys);
  Rcpp::NumericMatrix ans(x.nrow(), y.ncol());
  int ii, jj;
  for (ii = 0; ii < x.nrow(); ii++) {
    for (jj = 0; jj < x.ncol(); jj++) {
      if (x(ii, jj) < y(ii, jj)) {
        ans(ii, jj) = x(ii, jj);   // keep the smaller value,
      } else {                     // just like ifelse(x > y, y, x)
        ans(ii, jj) = y(ii, jj);
      }
    }
  }
  return(ans);"

matIfelse <- cxxfunction(signature(xs = "numeric", ys = "numeric"),
                         plugin = "Rcpp",
                         body = code)
Now if you replace ifelse in your function above with matIfelse you can give it a try. For example:
H.denomm <- matIfelse(H.denom1,H.denom2)
# Time for old version to run with the matrix you suggested above matrix(rbinom(200000,1,.7),10000,20)
# user system elapsed
# 37.78 3.36 41.30
# Time to run with dedicated Rcpp function
# user system elapsed
# 28.25 0.96 30.22
Not bad: roughly 36% faster. Again, though, I don't claim that this is generally faster than ifelse, just in this very specific instance. Cheers
P.S. I forgot to mention that to use Rcpp you need to have Rtools installed, and during the install make sure the environment path variables are added for Rtools and gcc. On my machine those look like: c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin
Edit:
I just noticed that you were running into memory problems... I'm not sure whether you are running a 32- or 64-bit machine, but you probably just need to allow R to use more RAM. I'll assume you are running 32-bit to be safe, so you should be able to let R take at least 2 GB of RAM. Give this a try: memory.limit(size=1900). The size is in megabytes, so I went for 1.9 GB just to be safe. I'd imagine this is plenty of memory for what you need.
Do you actually intend to do N x N independent ifelse(H.denom1 > H.denom2, ... operations?
H.denomm <- ifelse(H.denom1>H.denom2,H.denom2,H.denom1)
If you really do, look for a library or alternatively, a better decomposition.
If you told us in general terms what this code is trying to do, it would help us answer it.
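For what it's worth, base R already has a vectorized elementwise minimum, pmin(), which computes exactly what that ifelse expresses and skips ifelse's extra intermediate copies. A minimal sketch (it still allocates the two N x N matrices, so it trims overhead rather than peak memory):

H.denomm <- pmin(H.denom1, H.denom2)   # same result as ifelse(H.denom1 > H.denom2, H.denom2, H.denom1)
diag(H.denomm) <- 0
H.denom <- rowSums(H.denomm)           # rowSums() is also cheaper than apply(..., 1, sum)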

Sliding FFT in R

Is there a function or package in R for calculating the Sliding FFT of a sample? By this I mean that given the output of fft(x[n:m]), calculate fft(x[1+(n:m)]) efficiently.
Ideally I'd find both an online version (where I don't have access to the full time series at the beginning, or it's too big to fit in memory, and I'm not going to try to save the whole running FFT in memory either) and a batch version (where I give it the whole sample x and tell it the running window width w, resulting in a complex matrix of dimension c(w,length(x)/w)).
An example of such an algorithm is presented here (but I've never tried implementing it in any language yet):
http://cnx.org/content/m12029/latest/
If no such thingy exists already in R, that doesn't look too hard to implement I guess.
As usually happens when I post something here, I kept working on it and came up with a solution:
fft.up <- function(x1, xn, prev) {
  b <- length(prev)
  vec <- exp(2i*pi*seq.int(0, b - 1)/b)
  (prev - x1 + xn) * vec
}
# Test it out
x <- runif(6)
all.equal(fft.up(x[1], x[6], fft(x[1:5])), fft(x[2:6]))
# [1] TRUE
Still interested to know if some library offers this, because then it might offer other handy things too. =) But for now my problem's solved.
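For the batch version mentioned in the question, here is a minimal sketch built on fft.up (my own addition, not part of the original answer; it returns one column per window position, i.e. length(x) - w + 1 columns):

sliding.fft <- function(x, w) {
  n.win <- length(x) - w + 1
  out <- matrix(0i, nrow = w, ncol = n.win)
  out[, 1] <- fft(x[1:w])                               # full FFT only for the first window
  for (i in seq_len(n.win - 1)) {
    out[, i + 1] <- fft.up(x[i], x[i + w], out[, i])    # O(w) update per step
  }
  out
}

# spot check against a direct FFT; should agree up to numerical tolerance
x <- runif(8)
all.equal(sliding.fft(x, 5)[, 2], fft(x[2:6]))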
