I have a Java server from which I am calling R functions using c.eval.
I use Rserve to do this and preload all libraries using the following call:
R CMD Rserve --RS-conf Rserve.conf
The file Rserve.conf lists all the libraries to preload. The total memory requirement for this is around 127MB.
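For illustration, a minimal sketch of this kind of conf (package names and paths are placeholders, and it assumes Rserve's eval/source config directives are available in your Rserve version):
remote enable
port 6311
eval library(data.table)
eval library(jsonlite)
source /opt/r/preload.R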
The challenge I have is that every time I call a function from within my Java server, a new process is spawned, and each process seems to require the full 127MB of memory. So with 32GB RAM, roughly 240 concurrent calls are enough to max out memory and bring the server crashing down.
I found this link: Rserve share library code, but it describes exactly what I am already doing. Any help in understanding how to make Rserve work without loading all libraries for every call would be much appreciated.
Related
I am using mclapply in my R script for parallel computing. It saves overall memory usage and it is fast, so I want to keep it in my script. However, one thing I noticed is that the number of child processes generated while running the script is greater than the number of cores I specified via mc.cores. Specifically, I am running my script on a server with 128 cores, and I set mc.cores to 18. While the script was running, I checked the processes related to it using htop. First, I can find 18 processes like this:
[screenshot: htop showing the 18 expected processes running 3_GA_optimization.R]
3_GA_optimization.R is my script, and this all looks good. But I also found more than 100 additional processes running at the same time with similar memory and CPU usage. The screenshot below shows some of them:
[screenshot: some of the additional processes, with similar memory and CPU usage]
The problem is that although I only requested 18 cores, the script actually uses all 128 cores on the server, and this makes the server very slow. So my first question is: why is this happening? And what is the difference between the processes shown in green compared to the 18 processes shown in black?
For my second question: I tried using ulimit -Su 100 to set the soft limit on the maximum number of processes before running Rscript 3_GA_optimization.R. I chose 100 based on the number of processes already in use before running the script and the number of cores I wanted to use while running it. However, I got an error saying:
Error in mcfork():
unable to fork, possible reason: Resource temporarily unavailable
So it seems that mclapply has to generate a lot more processes than mc.cores in order for the script to run, which is confusing to me. Why does mclapply behave this way, and is there any other way to fix the total number of cores mclapply can use?
OP followed up in a comment on 2021-05-17 and confirmed that the problem was that their parallelization via mclapply() called functions from the ranger package, which in turn parallelized using all available CPU cores. This nested parallelism caused R to use many more CPU cores than are available on the machine.
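A minimal sketch of the fix under that assumption: pin each inner ranger() call to a single thread via its num.threads argument, so the total CPU usage stays at mc.cores (the formula, data and counts below are placeholders):
library(parallel)
library(ranger)

fits <- mclapply(seq_len(18), function(i) {
  # one thread per forked worker, so 18 workers use at most 18 cores in total
  ranger(Species ~ ., data = iris, num.trees = 500, num.threads = 1)
}, mc.cores = 18)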
R requires CPU more than anything else, so it is recommended to pick one of the newer-generation compute-optimized instance types, preferably with an SSD.
I've recently run into a problem with high memory usage (quickly rising to 100%) during load testing. To reproduce: there is an R package whose processing time is up to 0.2 seconds under no-stress conditions. If I query one of the endpoints using curl for 1000 JSONs from 3 machines in parallel, all of the memory is suddenly used up, which results in 'cannot fork' or:
cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory' In call: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)
The setup is 2x AWS 8GB CPU-optimized servers plus a load balancer, all in a private network. HTTPS is enabled, and my main usage is online processing of requests, so I'm mostly querying /json endpoints.
Do you happen to have any suggestions on how to approach this issue? The plan is to have more packages installed (more online processes requesting results from various functions), and I don't want to end up needing 32GB RAM per box.
All of the packages are deployed with these options:
LazyData: false
LazyLoad: false
They are also added to the preload section of serverconf.yml.j2.
RData files are loaded within an .onLoad function by calling utils::data.
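For context, this is roughly what such an .onLoad hook looks like (a sketch only; the dataset name is a placeholder):
.onLoad <- function(libname, pkgname) {
  # load the packaged .RData into the package namespace when the package is loaded
  utils::data("mydataset", package = pkgname, envir = parent.env(environment()))
}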
Also, keeping in mind that I'm using OpenCPU without GitHub and only one-way communication (from the backend to the ocpu box), which options do you suggest turning on or optimizing? This isn't clearly stated in the docs yet.
It mostly depends on which packages you are using and what you are doing. Can you run the same functionality that you are invoking through OpenCPU locally (on the command line) without running out of memory?
Apache2 prefork creates worker processes to handle concurrent requests. Each of these workers contains an R process with all preloaded packages. So if one request takes 500MB, the total memory consumption on the server is n * 500, where n is the number of workers that are loaded.
Depending on how many concurrent requests you expect, you could try lowering StartServers or MaxRequestWorkers in your apache2 config.
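For illustration, a sketch of the kind of mpm_prefork block this refers to (the numbers are placeholders, not tuned recommendations):
<IfModule mpm_prefork_module>
    StartServers             2
    MinSpareServers          2
    MaxSpareServers          5
    MaxRequestWorkers       20
    MaxConnectionsPerChild   0
</IfModule>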
Also try raising (or lowering) the option rlimit.as in the file /etc/opencpu/server.conf, which limits the amount of memory (address space) a single process is allowed to consume.
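For example, something along these lines (a sketch only, assuming the JSON format of /etc/opencpu/server.conf; the 4GB address-space cap is purely illustrative):
{
  "rlimit.as": 4e9
}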
I need to run thousands* of models on 15 machines (each with 4 cores), all Windows. I started to learn the parallel, snow and snowfall packages and read a bunch of intros, but they mainly focus on setting up the master. There is only a little information on how to set up the worker (slave) nodes on Windows, and it is often contradictory: some say that a SOCK cluster is practically the easiest way to go, others claim that SOCK cluster setup is complicated on Windows (sshd setup) and that the best way to go is MPI.
So, what is the easiest way to set up slave nodes on Windows? MPI, PVM, SOCK or NWS? My possibly naive ideas were (listed by priority):
1. To use all 4 cores on the slave nodes (required).
2. Ideally, I need only R with some packages, and a slave R script or R function that would listen on some port and wait for tasks from the master.
3. Ideally, nodes can be added/removed dynamically from the cluster.
4. Ideally, the slaves would connect to the master, so I wouldn't have to list all the slaves' IPs in the configuration of the master.
Only 1 is 100% required; 2-4 are "would be good". Is that too naive a request?
I am sorry, but I have not been able to figure this out from the available docs and tutorials. I would be grateful if you could point me to the right source.
* Note that each of those thousands of models will take at least 7 minutes, so there won't be a big communication overhead.
It's a shame how complex all these APIs (like parallel/snow/snowfall) are to work with: lots of docs, but not what you need... I have found an approach that is very simple and goes straight to the ideas I sketched: Redis and the doRedis R package (as recommended here). There is finally a very simple tutorial! I just modified it a bit and got this:
The workers need only R, the doRedis package and this script:
require(doRedis)
redisWorker('jobs', '10.0.0.7') # IP of the server
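(To occupy all 4 cores on a slave, doRedis also seems to provide startLocalWorkers() to spawn several workers on one machine at once; a sketch, reusing the queue name and IP from above:)
require(doRedis)
startLocalWorkers(n = 4, queue = 'jobs', host = '10.0.0.7')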
The master needs a running Redis server (I installed the experimental Windows binaries) and this R code:
require(doRedis)
registerDoRedis('jobs')
foreach(j=1:10,.combine=sum,.multicombine=TRUE) %dopar%
... # whatever you need to run
removeQueue('jobs')
Adding/removing workers is fully dynamic, there is no need to specify IPs at the master, there is automatic "load balancing", it is simple, and there is no need for tons of docs! This solution fulfills all the requirements and even more - as stated in ?registerDoRedis:
The doRedis parallel back end tolerates faults among the worker processes and automatically resubmits failed tasks.
I don't know how complex this would be using parallel/snow/snowfall with SOCK/MPI/PVM/NWS, or whether it would be possible at all, but I guess very complex...
The only disadvantages of using Redis that I found:
It is a database server. I wonder if this API exists somewhere without the need to install the database server, which I don't need at all. I guess it must exist!
There is a bug in the current doRedis package ("object '.doRedisGlobals' not found") with no solution yet, and I am not able to install the old working doRedis 1.0.5 package into R 3.0.1.
We have a process (written in C++/managed) which receives network data via TCP/IP.
After running the process for a while and tracking network load, it seems the network enters a frozen state and the process stops receiving data, while other processes on the system that use networking (the same NIC) operate normally.
The process gets out of this frozen state by itself after several minutes.
Any idea what is happening?
Any counter I can track to see if my process is hitting some limit?
It is going to be very difficult to answer specifically
-- without knowing what exactly your process/application is about (whether it is a network chat application, a file server/client, or something else), and
-- without other details about how your process is implemented and what libraries it uses, if relevant to the problem.
Also, you haven't mentioned what OS and environment you are running this process under, so there is very little anyone can do to help. It could be anything: a busy-wait loop in your code, locking problems if it's multi-threaded code, ...
Nonetheless, here are some options to check.
If it's Linux, try the commands below to debug and monitor the behaviour of the process and see what the problem could be:
top
Check top to see how much of the resources (CPU, memory) your process is using and whether there are any abnormally high values, e.g. in its CPU usage.
pstack
This shows the stack frames of the process executing at the time of the problem.
netstat
Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
gcore -s -c
This forces a core dump of your process when the mentioned problem happens; you can then analyze that core file using gdb.
gdb
Then use the where command at the gdb prompt to get a full backtrace of the process (which functions it was executing last, and the previous function calls).
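Roughly, the workflow could look like this (the PID, binary path and core file name are placeholders):
top -p <pid>                       # watch CPU/memory use of the process
pstack <pid>                       # snapshot of its current stack frames
netstat -tnp | grep <pid>          # state of its TCP sockets (Linux)
gcore -o mycore <pid>              # dump a core file without killing the process
gdb /path/to/binary mycore.<pid>   # then run 'where' at the gdb prompt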
I want to check the memory usage of a JAR that does some calculations. For this I want to use JVM Monitor. When starting JVM Monitor, I need to pick the JVM that is running my JAR. The problem is that my JAR executes so fast (<1 sec) that it never shows up in the list.
Is there any way I can start the JVM without executing the JAR immediately?
JConsole discovers running applications at the moment JConsole itself starts; only the applications running at that point will have their port and host displayed in the list. For such very short-running applications to show up in the list, adding a wait at the end of program execution is the only thing you can do.
Also, whatever memory stats JConsole displays will include JConsole's own memory footprint. So the better choice for monitoring is jvisualvm, which can show memory, thread and GC statistics as well.
Alternatively, if you want to check the code cache or compilation statistics, you can use -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintCodeCache.