I am currently running into a weird issue: foreach...%dopar% runs roughly 10x faster on my local laptop (Dell, Windows 10) with 15 cores than it does in a Docker container with 8 cores. The code itself only sets the ncores parameter to 3, so I am puzzled by why there is such a drastic difference in runtime. Has anyone here run into a similar issue with the doParallel package in Docker? If yes, how did you resolve it?
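For reference, the setup in question is roughly the following (a minimal sketch; the loop body is a placeholder for the real workload):
library(doParallel)
ncores <- 3                        # same core count used in the real code
cl <- makeCluster(ncores)
registerDoParallel(cl)
res <- foreach(i = 1:100, .combine = c) %dopar% sqrt(i)   # placeholder work
stopCluster(cl)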
Fresh install of Windows 11 and Docker. i7 and 16 GB. I am developing a WordPress site. The WordPress site is super slow when running with Docker. Opening a page takes around 10 seconds.
I already tried creating a .wslconfig file with the following content
[wsl2]
memory=6000MB #Limits VM memory in WSL 2 to 6000MB
processors=4 #Makes the WSL 2 VM use four virtual processors
Same result, although vmmem now uses less memory. Super slow.
What else can I do?
Would turning WSL2 off be helpful? Shouldn't WSL2 give better performance?
If we run install.packages("dplyr") on a GCP 'RStudio Server Pro Standard' VM, it takes around 3 minutes to install (on an instance with 4 cores / 15 GB RAM).
This seems unusual, as installation would typically take ~20 seconds on a laptop with equivalent specs.
Why so slow, and is there a quick and easy way of speeding this up?
Notes
I use the RStudio Server Pro Standard image from the GCP Marketplace to start the instance
Keen to know if there are any 'startup scripts' or similar I can set to run after the instance starts, e.g. to install a collection of commonly used packages
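For illustration, such a startup script could simply call Rscript to pre-install a list of packages. A sketch, where the package list and file name are assumptions:
# hypothetical startup-script payload, e.g. saved as install_pkgs.R and run via Rscript
pkgs <- c("dplyr", "ggplot2", "data.table")   # assumed package list
install.packages(setdiff(pkgs, rownames(installed.packages())),
                 repos = "https://cloud.r-project.org")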
@user5783745 you can also adjust your Makevars file to allow multithreaded compilation, which will help speed up compilation.
I followed this RStudio community post, and dropped MAKEFLAGS = -j4 into ~/.R/Makevars.
This basically halved the amount of time it took to install dplyr from scratch on the RStudio Server Pro Standard for GCP instance I spun up (same as yours: 4 vCPUs, 15 GB RAM).
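If you'd prefer to script that change rather than edit the file by hand, a small sketch (note it would overwrite an existing Makevars):
dir.create("~/.R", showWarnings = FALSE)
# write the parallel-make flag; overwrites any existing ~/.R/Makevars
writeLines("MAKEFLAGS = -j4", "~/.R/Makevars")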
I have downloaded the CorDapp example provided on the Corda website. I followed all the steps (to run it from the console) in
https://docs.corda.net/tutorial-cordapp.html
without any problem until "Running the example CorDapp". Here I run into errors one way or another.
First, when running
workflows-kotlin/build/nodes/runnodes
one or more of the nodes would not start. I was using a virtual machine with 2 cores and 4 GB of RAM. Eventually, I noticed it seemed to be an issue with the RAM, so I changed the VM config to 4 CPUs and 10 GB of RAM.
Now I can run
workflows-kotlin/build/nodes/runnodes
and get all 4 nodes working, but as soon as I run the following instruction
./gradlew runPartyXServer
where X = [A,B,C] for each of the possible nodes, after 20-30 seconds at most, the machine suddenly slows down and aborts.
The VM has Fedora 30, 4 cores and 10 GB of RAM. It is empty except for what I downloaded for the tutorial. I cannot believe those are not enough resources to run the tutorial. Am I wrong? Do I need more? Could it be something else?
Any help is welcome.
== Solved ==
The issue was the resources. I jumped to 8 cores and 32 GB and it ran. I will try at some point with 16 GB. In any case, the problem, from my point of view, is that given such large hardware requirements, the tutorial should include a section describing the minimum setup needed to run it.
From the given information, I believe you ran into a memory issue.
According to our documentation, Corda has a suggested minimum requirement of 1 GB of heap and 2-3 GB of host RAM per node.
https://docs.corda.net/docs/corda-enterprise/4.4/node/sizing-and-performance.html#sizing
I would suggest either reducing the number of nodes hosted on a single machine or expanding the RAM of the VM.
I've got a Windows HPC Server running with some nodes in the backend. I would like to run parallel R using multiple nodes from the backend. I think parallel R might be using snow on Windows, but I'm not too sure about it. My question is: do I also need to install R on the backend nodes?
Say I want to use two nodes, 32 cores per node:
library(snow)
cl <- makeCluster(c(rep("COMP01", 32), rep("COMP02", 32)), type = "SOCK")
Right now, it just hangs.
What else do I need to do? Do the backend nodes need some kind of sshd running to be able to communicate with each other?
Setting up snow on a Windows cluster is rather difficult. Each of the machines needs to have R and snow installed, but that's the easy part. To start a SOCK cluster, you would need an sshd daemon running on each of the worker machines, but you can still run into troubles, so I wouldn't recommend it unless you're good at debugging and Windows system administration.
I think your best option on a Windows cluster is to use MPI. I don't have any experience with MPI on Windows myself, but I've heard of people having success with the MPICH and DeinoMPI MPI distributions for Windows. Once MPI is installed on your cluster, you also need to install the Rmpi package from source on each of your worker machines. You would then create the cluster object using the makeMPIcluster function. It's a lot of work, but I think it's more likely to eventually work than trying to use a SOCK cluster due to the problems with ssh/sshd on Windows.
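To give the idea, creating and sanity-checking an MPI-backed cluster would look something like this (a sketch, assuming Rmpi and snow are installed on each machine and an MPI runtime is configured; the worker count is illustrative):
library(snow)
cl <- makeMPIcluster(64)                               # workers spread across the MPI hosts
clusterCall(cl, function() Sys.info()[["nodename"]])   # check where the workers landed
stopCluster(cl)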
If you're desperate to run a parallel job once or twice on a Windows cluster, you could try using manual mode. It allows you to create a SOCK cluster without ssh:
workers <- c(rep("COMP01", 32), rep("COMP02", 32))
cl <- makeSOCKcluster(workers, manual = TRUE)
The makeSOCKcluster function will prompt you to start each one of the workers, displaying the command to use for each. You have to manually open a command window on the specified machine and execute the specified command. It can be extremely tedious, particularly with many workers, but at least it's not complicated or tricky. It can also be very useful for debugging in combination with the outfile='' option.
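Putting those two pieces together, a manual-mode cluster with worker logging might be started like this (a sketch; the host names are taken from the question):
library(snow)
workers <- c(rep("COMP01", 32), rep("COMP02", 32))
# manual = TRUE prints the command to run by hand on each worker;
# outfile = "" echoes worker output back to this console for debugging
cl <- makeSOCKcluster(workers, manual = TRUE, outfile = "")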
I am running an external program via R that is pretty memory hungry and can take >8 hours to run. I'd like to open up another instance of R to do other tasks but am concerned about crashing the external program and having to restart the process. Should I expect any problems under these circumstances? The external program is Windows only and I'm running it on a Boot Camp partition on a MacBook Pro.
On a proper operating system, both instances will be independent and will not interfere with each other (unless they compete for the same resources, but that does not seem to be the case from your description).
This is no different from several users on a server each running one or two instances...