Is there any way to unlimit the CPU usage so my PC puts more effort in finishing a task for rapidly? At the moment the k-means algorithm is estimated to finish in about 10 days, which is something I would like to reduce.
R is single-threaded by default, and runs only on a single thread on the CPU, which is a pity if you have a machine with 16 or 32 cores. By unlimiting the CPU usage, I have to assume you're asking if there's any way to have an R process (let's say part of the k-means algorithm) take advantage of your full CPU power by running the process in-parallel.
Many R packages and processes are not going to be helped by parallel processing though. So the technical solution to your particular problem goes down to the package implementation you're using. Popular packages like caret do support parallelization when that's possible, even though you may need to add an additional allowParallel=T parameter. They work in conjunction with a library such as doMC to allow multi-core processes. In the following sample code, I have my machine use 8 cores through the registerDoMC(8) function, and then set allowParallel=T.
library(doMC)
registerDoMC(8)
system.time({
ctrl_2 <- trainControl(method="cv", number=3, allowParallel=T)
fb_forest_2 <- train(classe ~ ., data=fb_train, method="rf", trControl = ctrl_2)
})
Again, parallel processing doesn't always help - Not all process can be parallelized! The documentation for foreach are a great read so if you can afford the time take a look at it. The specific code solution for your problem also depend on the library implementation you're using.
Related
Although I modeled on a 40-core computer, I only used a few of them when modeling with Keras. I would like to know how to call all CPU cores for computation. I tried to implement parallel computing in the following way but failed. Scholars, please enlighten me on other methods.
library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
history <- model %>% fit(xtrain, ytrain,
epochs = 200, batch_size=100, verbose = 1)
Tensorflow/Keras takes care of parallelism in fit(), and it generally won't work if you manually try to fork the parent R process or manage a PSOCK cluster. The {parallel} package that comes with R is not compatible with Tensorflow/Keras.
If it looks like Tensorflow/Keras is not using all your CPU cores with the default settings, you can adjust the thread pool size here: https://www.tensorflow.org/api_docs/python/tf/config/threading (but, in my experience, it's more likely that you're IO-limited, or the CPU is waiting on the GPU, and probably not that the thread pool size is too small).
If you're interested in distributed computing with Tensorflow, here is a good place to get started: https://www.tensorflow.org/api_docs/python/tf/distribute
I noticed that R doesn't use all of my CPU, and I want to increase that tremendously (upwards to 100%). I don't want it to just parallelize a few functions; I want R to use more of my CPU resources. I am trying to run a pure IP set packing program using the lp() function. Currently, I run windows and I have 4 cores on my computer.
I have tried to experiment with snow, doParallel, and foreach (though I do not know what I am doing with them really).
In my code I have this...
library(foreach)
library(doParallel)
library(snowfall)
cl <- makeCluster(4)
registerDoParallel(cl)
sfInit(parallel = TRUE, cpus = 4)
#code that is taking a while to run but does not involve simulations/iterations
lp (......, all.int = TRUE)
sfStop()
R gets stuck and runs lp() for a very long time. My CPU is around 25%, but how can I increase that?
If you are trying to run 4 different LPs in parallel, here's how to do it in snowfall.
sfInit(parallel=TRUE, cpus=4)
sfSource(code.R) #if you have your function in a separate file
sfExport(list=c("variable1","variable2",
"functionname1")) #export your variables and function to cluster
results<-sfClusterApplyLB(parameters, functionname) #this starts the function on the clusters
E.g. The function in the sfClusterApply could contain your LP.
Otherwise see comments in regard to your question
Posting this as an answer because there's not enough space in a comment.
This is not an answer directly towards your question but more to the performance.
R uses slow statistical libraries by default which also can only use single core by default. Improved libraries are OPENBLAS/ATLAS. These however, can be a pain to install.
Personally I eventually got it working using this guide.
I ended up using Revolution R open(RRO) + MKL which has both improved BLAS libraries and multi-cpu support. It is an alternative R distribution which is supposed to have up to 20x the speed of regular R (I cannot confirm this, but it is alot faster).
Furthermore, you could check the CRAN HPC packages to see if there is any improved packages which support the lp function.
There is also packages to explore multi cpu usage.
This answer by Gavin, as well as #user3293236's answer above show several possibilities for packages allowing multi CPU usage.
In the help for detectCores() it says:
This is not suitable for use directly for the mc.cores argument of
mclapply nor specifying the number of cores in makeCluster. First
because it may return NA, and second because it does not give the
number of allowed cores.
However, I've seen quite a bit of sample code like the following:
library(parallel)
k <- 1000
m <- lapply(1:7, function(X) matrix(rnorm(k^2), nrow=k))
cl <- makeCluster(detectCores() - 1, type = "FORK")
test <- parLapply(cl, m, solve)
stopCluster(cl)
where detectCores() is used to specify the number of cores in makeCluster.
My use cases involve running parallel processing both on my own multicore laptop (OSX) and running it on various multicore servers (Linux). So, I wasn't sure whether there is a better way to specify the number of cores or whether perhaps that advice about not using detectCores was more for package developers where code is meant to run over a wide range of hardware and OS environments.
So in summary:
Should you use the detectCores function in R to specify the number of cores for parallel processing?
What is the distinction mean between detected and allowed cores and when is it relevant?
I think it's perfectly reasonable to use detectCores as a starting point for the number of workers/processes when calling mclapply or makeCluster. However, there are many reasons that you may want or need to start fewer workers, and even some cases where you can reasonably start more.
On some hyperthreaded machines it may not be a good idea to set mc.cores=detectCores(), for example. Or if your script is running on an HPC cluster, you shouldn't use any more resources than the job scheduler has allocated to your job. You also have to be careful in nested parallel situations, as when your code may be executed in parallel by a calling function, or you're executing a multithreaded function in parallel. In general, it's a good idea to run some preliminary benchmarks before starting a long job to determine the best number of workers. I usually monitor the benchmark with top to see if the number of processes and threads makes sense, and to verify that the memory usage is reasonable.
The advice that you quoted is particularly appropriate for package developers. It's certainly a bad idea for a package developer to always start detectCores() workers when calling mclapply or makeCluster, so it's best to leave the decision up to the end user. At least the package should allow the user to specify the number of workers to start, but arguably detectCores() isn't even a good default value. That's why the default value for mc.cores changed from detectCores() to getOptions("mc.cores", 2L) when mclapply was included in the parallel package.
I think the real point of the warning that you quoted is that R functions should not assume that they own the whole machine, or that they are the only function in your script that is using multiple cores. If you call mclapply with mc.cores=detectCores() in a package that you submit to CRAN, I expect your package will be rejected until you change it. But if you're the end user, running a parallel script on your own machine, then it's up to you to decide how many cores the script is allowed to use.
Author of the parallelly package here: The parallelly::availableCores() function acknowledges various HPC environment variables (e.g. NSLOTS, PBS_NUM_PPN, and SLURM_CPUS_PER_TASK) and system and R settings that are used to specify the number of cores available to the process, and if not specified, it'll fall back to parallel::detectCores(). As I, or others, become aware of more settings, I'll be happy to add automatic support also for those; there is an always open GitHub issue for this over at https://github.com/HenrikBengtsson/parallelly/issues/17 (there are some open requests for help).
Also, if the sysadm sets environment variable R_PARALLELLY_AVAILABLECORES_FALLBACK=1 sitewide, then parallelly::availableCores() will return 1, unless explicitly overridden by other means (by the job scheduler, by the user settings, ...). This further protects against software tools taking over all cores by default.
In other words, if you use parallelly::availableCores() rather than parallel::detectCores() you can be fairly sure that your code plays nice in multi-tenant environments (if it turns out it's not enough, please let us know in the above GitHub issue) and that any end user can still control the number of cores without you having to change your code.
EDIT 2021-07-26: availableCores() was moved from future to parallelly in October 2020. For now and for backward compatible reasons, availableCores() function is re-exported by the 'future' package.
Better in my case (I use mac) is future::availableCores() because detectCores() shows 160 which is obviously wrong.
I just tested an elastic net with and without a parallel backend. The call is:
enetGrid <- data.frame(.lambda=0,.fraction=c(.005))
ctrl <- trainControl( method="repeatedcv", repeats=5 )
enetTune <- train( x, y, method="enet", tuneGrid=enetGrid, trControl=ctrl, preProc=NULL )
I ran it without a parallel backend registered (and got the warning message from %dopar% when the train call was finished), and then again with one registered for 7 cores (of 8). The first run took 529 seconds, the second, 313. But the first took 3.3GB memory max (reported by the Sun cluster system), and the second took 22.9GB. I've got 30GB of ram, and the task only gets more complicated from here.
Questions:
1) Is this a general property of parallel computation? I thought they shared memory....
2) Is there a way around this while still using enet inside train? If doParallel is the problem, are there other architectures that I could use with %dopar%--no, right?
Because I am interested in whether this is the expected result, this is closely related but not the exact same as this question, but I'd be fine closing this and merging my question in to that one (or marking that as duplicate and pointing to this one, since this has more detail) if that's what the concensus is:
Extremely high memory consumption of new doParallel package
In multithreaded programs, threads share lots of memory. It's primarily the stack that isn't shared between threads. But, to quote Dirk Eddelbuettel, "R is, and will remain, single-threaded", so R parallel packages use processes rather than threads, and so there is much less opportunity to share memory.
However, memory is shared between the processes that are forked by mclapply (as long as the processes don't modify it, which triggers a copy of the memory region in the operating system). That is one reason that the memory footprint can be smaller when using the "multicore" API versus the "snow" API with parallel/doParallel.
In other words, using:
registerDoParallel(7)
may be much more memory efficient than using:
cl <- makeCluster(7)
registerDoParallel(cl)
since the former will cause %dopar% to use mclapply on Linux and Mac OS X, while the latter uses clusterApplyLB.
However, the "snow" API allows you to use multiple machines, and that means that your memory size increases with the number of CPUs. This is a great advantage since it can allow programs to scale. Some programs even get super-linear speedup when running in parallel on a cluster since they have access to more memory.
So to answer your second question, I'd say to use the "multicore" API with doParallel if you only have a single machine and are using Linux or Mac OS X, but use the "snow" API with multiple machines if you're using a cluster. I don't think there is any way to use shared memory packages such as Rdsm with the caret package.
There is a minimum number of characters elsewise I would simply have typed: 1) Yes. 2) No, er, maybe. There are packages that use a "shared memory" model for parallel computation, but R's more thoroughly tested packages don't use it.
http://www.stat.berkeley.edu/scf/paciorek-parallelWorkshop.pdf
http://heather.cs.ucdavis.edu/~matloff/158/PLN/ParProcBook.pdf
http://heather.cs.ucdavis.edu/Rdsm/BARUGSlides.pdf
How would I rewrite my code so that I can implement the use of multicore on an Rstudio server to run regsubsets from the leaps package using the "exhaustive" method? The data has 1200 variables and 9000 obs so the code has been shortened here:
model<-regsubsets(price~x + y + z + a + b + ...., data=sample,
nvmax=500, method=c("exhaustive"))
Our server is a quad core 7.5 gb ram, is that enough for an equation like this?
In regard to your second question I would say: just give it a try and see. In general, a 1200 times 9000 dataset is not particularly large, but wether or not it works also depends on what regsubsets does under the hood.
In general, the approach here would be to cut up the problem in pieces and run each of the piece on a core, in your case 4 cores. The paralellization works most effective if the processes that we run on a core takes some time (e.g. 10 min). If it takes very little time, the overhead of parallelization will only serve to increase the time the total analysis takes. Creating a cluster on your server is quite easy using e.g. SNOW (available on CRAN). The approach I often use is to create a cluster using the doSNOW package, and then use the functions from the plyr package. A blog post I wrote recently provides some background and example code.
In your specific case, if regsubsets does not support parallelization out of the box, you'll have to do some coding yourself. I think a variable selection method such as regsubsets requires the entire dataset to be used, therefore I think solving the parallelization by running several regsubsets in parallel is not feasible. So I think you'll have to adapt the function to include the parallelization. Somewhere inside the function the different variable selections are evaluated, there you could send those evaluations to different cores. Do take care that if such an evaluation takes only little time, the parallelization overhead will only slow your analysis down. In that case you need to send a group of evaluations to each core in order to increase the time spent on each node, decreasing the amount of overhead from parallelization.