I would like to get an overview of the options for model comparison in brms when the models are large (brmsfit objects of ~6 GB due to 2,000,000 iterations).
My immediate problem is that add_criterion() won't run on my laptop (16 GB memory) after the models have finished. I got the error message "vector memory exhausted (limit reached?)", after which I increased the memory cap on R in .Renviron to 100 GB (as described here: R on MacOS Error: vector memory exhausted (limit reached?)). The total memory usage goes up to about 90 GB; I get error messages in R when I try to estimate both 'waic' and 'loo', and if I estimate just 'loo', R invariably crashes.
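For reference, the cap was raised with a line along these lines in ~/.Renviron (R_MAX_VSIZE is the variable the linked answer uses; the value is simply what I set, and R has to be restarted for it to take effect):

# ~/.Renviron -- raise the vector heap limit on macOS
R_MAX_VSIZE=100Gb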
What are my options here and what would be the recommendations?
Use the cluster - the local convention is to use a single node; is this advisable? (I guess not, as we have 6, 10, and 16 GB cores. Any (link to) advice on parallelising R on a cluster is welcome.)
Is it possible to have a less dense posterior in brms, i.e. thin or sample less during estimation, as in BayesTraits? (See the sketch after this list.)
Can I parallelise R/RStudio on my own laptop?
...?
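For the thinning option, a minimal sketch of what I have in mind, assuming the thin argument of brm() and the pointwise argument of add_criterion()/loo() do what I understand from their documentation (the formula and data below are placeholders):

library(brms)

# placeholder formula/data; the point is `thin`, which keeps only every
# 10th post-warmup draw and so shrinks the stored brmsfit object
fit <- brm(y ~ x + (1 | group), data = d,
           chains = 4, iter = 2e6, warmup = 1e6, thin = 10)

# pointwise = TRUE evaluates the log-likelihood one observation at a time
# instead of materialising the full draws-by-observations matrix in memory
fit <- add_criterion(fit, criterion = "loo", pointwise = TRUE)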
Many thanks for your advice!
Related
Common sense indicates that any computation should be faster the more cores or threads we use; if the scaling is bad, the computation time will at least not improve with an increasing number of threads. So how come increasing the number of threads considerably increases the computation time when fitting a gam with the R package mgcv, as shown by this example?
library(boot)  # loads the "amis" data
library(mgcv)  # provides gam()
t1 <- Sys.time()
mod <- gam(speed ~ s(period, warning, pair, k = 12), data = amis,
           family = tw(link = log), method = "REML",
           control = list(nthreads = 1))
t2 <- Sys.time()
print("Model fitted in:")
print(t2 - t1)
If you increase the number of threads in this example to 2, 4, etc, the fitting procedure will take longer and longer, instead of being faster as we would expect. In my particular case:
1 thread: 32.85333 secs
2 threads: 50.63166 secs
3 threads: 1.2635 mins
Why is this? If I am doing something wrong, what can I do to obtain the desired behavior (i.e., increasing performance with increasing number of threads)?
Some notes:
1) The model, family and fitting method shown here make no particular sense; this is only an example. However, I ran into this problem with real data and a reasonable model (for simplicity I use this small code to exemplify the problem). Data, functional form of the model, family and fitting method all seem to be irrelevant: after many tests I always get the same behaviour, i.e. increasing the number of threads decreases performance (increases computation time).
2) Operating system: Linux Ubuntu 18.04;
3) Architecture: Dell PowerEdge with two physical Intel Xeon X5660 CPUs, each with 6 cores at 2800 MHz and each core able to handle 2 threads (i.e. 24 threads in total). 80 GB RAM.
4) The OpenMP libraries (needed for the multi-threading capability of gam) were installed with
sudo apt-get install libomp-dev
5) I am aware of the help page on parallel computation in mgcv (https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/mgcv-parallel.html). The only thing written there that points to a decrease in performance with an increasing number of threads is: "Because the computational burden in mgcv is all in the linear algebra, then parallel computation may provide reduced (...) benefit with a tuned BLAS". (One way to check this is sketched below these notes.)
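One thing I would rule out (my own suggestion, not something from the help page) is that a multithreaded BLAS or OpenMP is already saturating the cores, so that gam's extra threads only add oversubscription overhead. A minimal sketch, assuming the RhpcBLASctl package is installed; whether this actually explains the slowdown is an open question:

library(boot)         # "amis" data
library(mgcv)         # gam()
library(RhpcBLASctl)  # control BLAS/OpenMP thread counts

# pin the BLAS and OpenMP to one thread so that only gam's nthreads varies
blas_set_num_threads(1)
omp_set_num_threads(1)

time_fit <- function(nt) {
  system.time(
    gam(speed ~ s(period, warning, pair, k = 12), data = amis,
        family = tw(link = log), method = "REML",
        control = list(nthreads = nt))
  )[["elapsed"]]
}

sapply(c(1, 2, 4), time_fit)  # elapsed seconds per thread count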
Apologies if this question is too broad.
I'm running a large data set (around 20 GB) through xgb.cv in R on a 64 GB, 4-core Linux machine. I'm currently hitting two issues:
Trying 10-fold CV crashes R (no error from xgboost; the session just terminates).
Trying 5-fold, the code runs but reserves 100 GB of virtual memory and slows to a crawl.
I'm confused as to why the code can do 5-fold but not 10-fold; I would have thought each fold would be treated separately and the run would just take twice as long. What is xgboost doing across all folds?
With the swapping, is there any way to manage memory better and avoid the slowdown? The 5-fold CV is taking more than 10 times as long as a single run on a similar number of trees.
Are there any packages better adapted to large data sets, or do I just need more RAM?
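One workaround worth trying (a sketch only; whether it actually lowers peak memory relative to xgb.cv's internal handling is an assumption) is to run the folds manually and sequentially, so that only one train/validation split exists in memory at a time. X, y, the objective and the tree parameters below are placeholders:

library(xgboost)

# X: feature matrix, y: labels -- placeholders for the real data
set.seed(1)
folds  <- sample(rep(1:5, length.out = nrow(X)))
params <- list(objective = "binary:logistic", eval_metric = "logloss",
               max_depth = 6, eta = 0.1)

fold_scores <- numeric(5)
for (k in 1:5) {
  dtrain <- xgb.DMatrix(X[folds != k, ], label = y[folds != k])
  dvalid <- xgb.DMatrix(X[folds == k, ], label = y[folds == k])
  fit <- xgb.train(params, dtrain, nrounds = 200,
                   watchlist = list(valid = dvalid), verbose = 0)
  fold_scores[k] <- min(fit$evaluation_log$valid_logloss)
  rm(dtrain, dvalid, fit)  # drop this fold's copies before building the next
}
fold_scores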
I'm trying to run parallel cv.glmnet Poisson models on a Windows machine with 64 GB of RAM. My data is a 20-million-row x 200-column sparse matrix, around 10 GB in size. I'm using makeCluster and doParallel, and setting parallel = TRUE in cv.glmnet. I currently have two issues with this setup:
Distributing the data to the different worker processes takes hours, which reduces the speedup significantly. I know this can be solved by forking on Linux machines, but is there any way of reducing this time on Windows?
I'm running this for multiple models with different data and responses, so the object size changes each time. How can I work out in advance how many cores I can run before hitting an 'out of memory' error? I'm particularly confused about how the data gets distributed: if I run on 4 cores, the first rsession will use 30 GB of memory while the others will be closer to 10 GB. What does that 30 GB go towards, and is there any way of reducing it?
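For the second point, the back-of-the-envelope budgeting I would start from (a sketch only; the assumption that each PSOCK worker ends up holding roughly one copy of x plus working space matches what is described above but is not guaranteed by glmnet):

library(Matrix)
library(doParallel)
library(glmnet)

# x: the 20M x 200 sparse model matrix, y: the response -- placeholders
per_worker_gb <- 2 * as.numeric(object.size(x)) / 1024^3  # copy + working space (rough guess)
total_gb      <- 64   # machine RAM
reserve_gb    <- 10   # OS plus the master R session
max_workers   <- max(1, floor((total_gb - reserve_gb) / per_worker_gb))

cl <- makeCluster(max_workers)
registerDoParallel(cl)
fit <- cv.glmnet(x, y, family = "poisson", parallel = TRUE)
stopCluster(cl)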
I apologize in advance since this post will not have any reproducible example.
I am using R x64 3.4.2 to run some cross-validated analyses on quite big matrices (number of columns ~80,000, number of rows between 40 and 180). The analyses involve several feature selection steps (performed with in-house functions or with functions from the CORElearn package, which is written in C++), as well as some clustering of the features and the fitting of an SVM model (by means of the RWeka package, which is written in Java).
I am working on a Dell Precision T7910 machine with two Intel Xeon E5-2695 v3 2.30 GHz processors, 192 GB RAM and a Windows 7 x64 operating system.
To speed up the running time of my analysis I thought of using the doParallel package in combination with foreach. I would set up the cluster as follows:
cl <- makeCluster(number_of_cores, type='PSOCK')
registerDoParallel(cl)
with number_of_cores set to various numbers between 2 and 10 (detectCores() tells me that I have 56 cores in total).
My problem is that even when setting number_of_cores to only 2, I get a "protection stack overflow" error message. The thing is that I monitor the RAM usage while the script is running, and not even 20 GB of my 192 GB RAM are being used.
If I run the script in a sequential way it takes its sweet time (~ 3 hours with 42 rows and ~ 80000 columns), but it does run until the end.
I have tried (almost) every trick in the book for good memory management in R:
I am loading and removing big variables as needed in order to reduce memory usage
I am breaking down the steps with functions rather than scripting them directly, to take advantage of scoping
I am calling gc() every time I delete a big object in order to prompt R to return memory to the operating system
But I am still unable to run the script in parallel.
Does anyone have any suggestions about this? Should I just give up and wait >3 hours every time I run the analyses? And more generally: how is it possible to have a stack overflow problem when there is plenty of free RAM?
UPDATE
I have now tried to "pseudo-parallelize" the work on the same machine: since I am running a 10-fold cross-validation scheme, I am opening 5 different instances of Rgui and running 2 folds in each instance. Proceeding in this way, everything runs smoothly, and the process indeed takes about 10 times less than running it in a single instance of R. What makes me wonder is that if 10 instances of Rgui can run at the same time and get the job done, the machine must have the computational resources needed. Hence I cannot really get my head around the fact that %dopar% with 10 clusters does not work.
The "protection stack overflow" means that you have run out of the "protection stack", that is too many pointers have been PROTECTed but not (yet) UNPROTECTed. This could be because of a bug or inefficiency in the code you are running (in native code of a package or in native code of R, but not a bug in R source code).
This problem has nothing to do with the amount of available memory on the heap, so calling gc() will have no impact, and it is not important how much physical memory the machine has. Please do not call gc() explicitly at all, even if there was a problem with the heap usage, it just makes the program run slower but does not help: if there is not enough heap space but it could be obtained by garbage collection, the garbage collector will run automatically. As the problem is the protection stack, neither restructuring the R code nor removing dead variables explicitly will help. In principle, structuring the code into (relatively small) functions is a good thing for maintainability/readability and it also indirectly reduces scope of variables, so removing variables explicitly should become unnecessary.
It might help to increase the pointer protection stack size, which can be done at R startup from the command line using --max-ppsize.
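For example, from a shell (500000 is the documented maximum for this option; the script name is a placeholder, and launching R through a GUI such as RStudio requires passing the option by other means):

R --max-ppsize=500000                    # interactive session with a larger protection stack
R --max-ppsize=500000 -f my_analysis.R   # or run a script non-interactively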
I am trying to use an ff object to run an SVM classification study.
I converted my data frame to an ff object using ffdf <- as.ffdf(signalDF). The dataset has 1024 columns and ~600K rows.
When I run the function, svm(Y~., data=ffdf,scale=FALSE,kernel="linear"), I receive the error:
Error: cannot allocate vector of size 15.8 Gb
Running ulimit -n:
64000
Also, running df shows plenty of disk space.
Is there any reason why I am receiving a memory error when using an ff object?
Any help is appreciated.
Thank you
Disk space is different from the memory available for computation. The error indicates that you don't have enough memory to perform the computation: your data set is large and your computer has limited RAM. If you reduce the training size, it will run.
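A minimal sketch of what reducing the training size could look like, assuming svm() here is e1071::svm() and that a random subsample is acceptable for the study (the subset size is arbitrary):

library(e1071)

# train on a random subset of rows so the model fits in RAM
set.seed(1)
idx <- sample(nrow(signalDF), 5e4)  # 50K of the ~600K rows; tune to your memory
fit <- svm(Y ~ ., data = signalDF[idx, ], scale = FALSE, kernel = "linear")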