Using rstan, I am running code that uses 4 cores in parallel. I have access to a computer with 32 cores, and I need to run 3 instances of the same code on different datasets, plus another 3 instances of a slightly different code on the same datasets, for a total of 6 models. I'm having a hard time figuring out the best way to accomplish this. Ideally, the computer would run 4 cores on each model, for a total of 24 cores in use at a time.
I've used the parallel package many times before, but I don't think it can handle this kind of "parallel in parallel". I am also aware of the Jobs feature in RStudio, but one of the nice things about rstan is that it interactively shows you how the chains progress, so ideally I would like to be able to see these updates. Can this be accomplished by having 6 different RStudio sessions open at once? I tried running two at a time, but I'm not sure whether they actually run in parallel with each other, so any clarification would be great.
I would suggest using batch jobs instead. In principle, since you don't have that many models, you could simply write six different R scripts and store them as, e.g., model1.R, model2.R, ..., model6.R. You could then submit the jobs from the command line like this:
R CMD BATCH --vanilla model1.R model1.Rout &
This will run the first script in batch mode and write its standard output to a log file, model1.Rout. That way, you can inspect the state of a job just by opening its log file. Of course, you will need to run the above command for each model.
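If you would rather not type that six times, here is a minimal sketch of a launcher script that submits all six jobs from within R (the model1.R through model6.R file names are the hypothetical ones from above; system() with wait = FALSE returns immediately, so all six jobs run concurrently):

# launch_all.R -- submit all six batch jobs without waiting for them
scripts <- sprintf("model%d.R", 1:6)
for (s in scripts) {
  logfile <- sub("\\.R$", ".Rout", s)
  system(sprintf("R CMD BATCH --vanilla %s %s", s, logfile),
         wait = FALSE)  # returns at once; the batch job keeps running
}

Inside each model script, setting options(mc.cores = 4) (or passing cores = 4 to rstan's sampling functions) keeps every job at the 4 cores you intended.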
Related
I am running arcpy (using the Spyder IDE to run the script) with a for loop that performs a number of geoprocessing activities in each iteration, as I work through a table of inputs. I am using a Surface Pro laptop (16 GB memory) to run the script.
After I've looped through a few hundred times, I get the ExecuteError: ERROR 010005: Unable to allocate memory. Failed to Execute (ExtractByMask). It seems to follow an ExtractByMask step each time.
When I first run the script in Spyder, I get through about 1000 iterations with no problems. But when I try again straight afterwards, it terminates with the error after at most a few hundred more loops. I could restart Spyder each time and work in smaller data batches (of, say, 1000), but this is quite tedious when I am trying to get through about 20,000 iterations.
When I check my laptop's system memory, it only appears to be at about 30% usage while the arcpy program is running, so I don't think that is the cause. I had read a few posts suggesting it could be an environment setting associated with ArcGIS. Has anyone encountered this and found a solution?
The only workaround so far is to quit and reopen the Spyder IDE, running the loop on smaller batches of data.
I have a script in R that is frequently called during the day (by other scripts). I call R in a terminal using
Rscript code.R
I notice it takes a lot of time to load packages and set up R.
Is it possible to run R as a background service which I hit using a port or something?
Yes, look into Rserve, which has been available for over a dozen years for exactly this reason. There are a couple of fairly high-profile applications of it, too.
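A minimal sketch of the idea, assuming the Rserve and RSclient packages are installed; whatever you load before starting the server is already in memory for every request, so the callers skip the package-loading cost:

# server side: run once and leave it running
library(Rserve)
library(dplyr)               # hypothetical slow-loading package your script needs
Rserve(args = "--vanilla")   # serves on the default port, 6311

# client side: what the callers would run instead of "Rscript code.R"
library(RSclient)
conn <- RS.connect()                      # connect to localhost:6311
res  <- RS.eval(conn, source("code.R"))   # evaluate on the warm server
RS.close(conn)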
You can also check out this add-in for RStudio; it is not a port-based solution, but maybe it can help you: https://github.com/bnosac/taskscheduleR
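In case it helps, a minimal sketch of what scheduling with that package looks like (the task name and script path are placeholders; taskscheduler_create() registers the script with the Windows Task Scheduler):

library(taskscheduleR)
# run code.R every hour via the Windows Task Scheduler
taskscheduler_create(taskname = "my_r_task",
                     rscript  = "C:/path/to/code.R",
                     schedule = "HOURLY")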
I'm running RStudio Server and wondering if there is a way to run a command that may take a while to complete and at the same time visually explore some of the dataframes in my environment.
When I click on a dataframe, it issues the View() command, but if R is busy, it will not let me view the dataframe until the last command finishes. Is there a way to run the View() command in parallel?
No.
The other thing you might be able to do, if you have the Pro version, is to generate a parallel session.
I want to run an R script (in Win 7) from SQL Server 2014 each time a new record is added (to perform some analysis on the data). I saw that this can be done with the xp_cmdshell command which is like running it manually from the command line.
My problems (and questions) are:
1. I've gathered from various websites that probably the best option is to use Rscript. This would have to be invoked at the command line as:
"C:\Program Files\R\R-3.2.3\bin\x64\Rscript" "my_file_folder\my_file.r"
Can I copy Rscript.exe to the folder where my script is, such that I can run my script independently, even if R is not installed? What other files do I need to copy together with Rscript.exe such that it would work independently?
2. My script loads some packages containing functions that it uses. Is there a way to somehow include these in the script so that they don't have to be loaded every time (loading takes about 5 seconds so far, and I need this script to be faster)? Or is there a way to only load these packages the first time the script runs?
In case the overall approach I've described here is not the best one, I am open to doing it differently. Maybe there is a way to somehow package the R script together with all the required dependencies (libraries and other parts of the R software which the script would need to run independently).
What I ultimately need is for the script to run silently and reasonably fast, without any windows or anything else popping up, each time a new record is added to my database, do the analysis, and exit.
Thanks in advance for any answers.
UPDATE:
I figured out an elegant solution to running the R script. I'm setting up a job in SQL Server and inside that job I'm using "xp_cmdshell" to run my script as a parameter to Rscript.exe, as detailed at point 1 above. I can start this job from any stored procedure and the beauty of it is that the stored procedure does not wait for the script to finish. It just triggers the job (that runs the script in a separate thread) and then it continues with its business.
But questions from points 1 and 2 still remain.
I would like to conduct extensive tests on my new internet connection using speedtest.net.
Is there some way that I can completely automate the process?
The speed test would be conducted automatically at a fixed time interval, and a snapshot of the screen would then be taken and stored on my system.
I found a good repository that does basically exactly what you're asking, but better (it runs multiple tests and can be run from the command line).
I would recommend copying the code (Python) from https://github.com/Janhouse/tespeed.
You can run this repeatedly using a cron job, and easily email the results to yourself using the crontab file.
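As an illustration, a crontab entry along these lines would run the test hourly and mail whatever the script prints (the interpreter and script paths are placeholders; cron mails each job's output to the MAILTO address by default):

MAILTO=you@example.com
0 * * * * /usr/bin/python /path/to/tespeed/tespeed.py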
For easy step-by-step instructions, see http://www.pythonforbeginners.com/code-snippets-source-code/command-line-speedtest-net-via-tespeed/