R script while Windows system is locked

I need to know what will happen if I am running an R script that typically takes 30 to 45 minutes to execute and during that time my Windows 7 system gets locked. Will it affect the R script's execution in any way? Or will I get a complete and accurate run if I unlock my system after, say, 2 or 3 hours, or just after the script's expected execution time?
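One way to check this for yourself: have the script log timestamps at the start and end of the run, so that after unlocking the machine you can confirm it ran to completion while the screen was locked. A minimal sketch (the actual analysis is not shown in the question):

cat(format(Sys.time()), "script started\n", file = "run.log", append = TRUE)
# ... the 30-45 minute analysis goes here ...
cat(format(Sys.time()), "script finished\n", file = "run.log", append = TRUE)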

Related

Automated R script through Task Scheduler not running complete script

I have an R script that I've set up to update automatically every week using Task Scheduler. The R script is supposed to pull about 10 extracts from Google Analytics; however, when I run the script through Task Scheduler it only extracts 4 and then seemingly stops. If I run it manually in RStudio it runs all the way through with no errors. Does anyone know why it might stop in Task Scheduler?
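One common first step when a script behaves differently under Task Scheduler is to make any failure visible, since the scheduler hides the console. A diagnostic sketch, where extract_names and pull_extract() are hypothetical stand-ins for the ten Google Analytics extracts:

log_file <- "ga_extract.log"
for (name in extract_names) {
  tryCatch({
    pull_extract(name)
    cat(format(Sys.time()), "finished", name, "\n",
        file = log_file, append = TRUE)
  }, error = function(e) {
    cat(format(Sys.time()), "FAILED", name, ":", conditionMessage(e), "\n",
        file = log_file, append = TRUE)
  })
}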

Parallel instances each running parallelized code?

Using rstan, I am running code that uses 4 cores in parallel. I have access to a computer with 32 cores, and I need to run 3 instances of the same code on different datasets, plus another 3 instances of a slightly different code on the same datasets, for a total of 6 models. I'm having a hard time figuring out the best way to accomplish this. Ideally, the computer would run 4 cores for each model, for a total of 24 cores in use at a time.
I've used the parallel package many times before, but I don't think it can handle this kind of "parallel in parallel". I am also aware of the Jobs feature in RStudio, but one of the nice things about rstan is that it interactively shows how the chains progress, so ideally I would like to be able to see these updates. Can this be accomplished by having 6 different RStudio sessions open at once? I tried running two at a time, but I'm not sure whether they also run in parallel with each other, so any clarification would be great.
I would suggest using batch jobs instead. In principle, since you don't have that many models, you could simply write 6 different R scripts and store them as, e.g., model1.R, model2.R, ..., model6.R. With that, you could then submit the jobs from the command line like this:
R CMD BATCH --vanilla model1.R model1.Rout &
This will run the first script in batch mode and send its stdout to a log file, model1.Rout. That way, you can inspect the state of a job just by opening its file. You will, of course, need to run the above command once for each model.
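If you'd rather not type that six times, here is a minimal sketch (assuming the scripts really are named model1.R ... model6.R) that submits all of them from a single R session; wait = FALSE returns immediately, so the jobs run side by side, each writing its own modelN.Rout log:

for (i in 1:6) {
  script  <- sprintf("model%d.R", i)
  logfile <- sprintf("model%d.Rout", i)
  # Launch "R CMD BATCH --vanilla modelN.R modelN.Rout" without waiting
  system2("R", args = c("CMD", "BATCH", "--vanilla", script, logfile),
          wait = FALSE)
}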

Stop submitted lines of code from running

I'm running a long R script which takes 2 or 3 days to finish. I accidentally ran another script which, if R works the way it usually does, will go into some queue, and R will run it as soon as the first script is over. I need to stop that, as it would compromise the results of the first script. Is there a visible queue, or any other way to stop R from running that code?
I'm working in an interactive session in RStudio, on Windows 10.
Thanks a lot for any help!
Assuming you're running in the console (or an interactive session in RStudio; that's unclear from your question), and that what you did was source a script or paste code, and then pasted another chunk of code while the first was still running:
What is happening is that you pushed data into the R process's input stream. That input is buffered, so each line will run once the previous call has finished and freed the process.
There's no easy way to manipulate the input buffer; it belongs to R's internal input/output system, and for now it's mostly the operating system that holds that information in its cache.
Asking R itself is not possible, as it already has this buffer to read; any new command would simply be queued behind it.
Last-chance option: if you can spot the other chunk of code starting to run in your console, you can try pressing the Esc key to stop it.
You could try messing with the process's buffers using Process Explorer (procexp), but there's a fair chance you'll just make your R session segfault anyway.
To avoid this in the future, put your code in scripts and run each one separately from the command line with Rscript (which is present in the R bin directory under Windows too).
This creates one session per script and lets you kill each one independently, as sketched below. That said, if both scripts write to the same place (a database, say; a plain file would raise an error if accessed by two processes), that alone won't prevent data corruption.
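A minimal sketch of the one-session-per-script idea, using the processx package (an assumption on my part; the answer above doesn't name it) so you keep a handle on each job and can kill one without touching the other:

library(processx)

# Each call starts its own R session running one script; the script names
# and log files are hypothetical placeholders.
p1 <- process$new("Rscript", "script1.R", stdout = "script1.log")
p2 <- process$new("Rscript", "script2.R", stdout = "script2.log")

p1$is_alive()  # check whether the first job is still running
p2$kill()      # stop the second job only; the first keeps going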
I am guessing the OP has the following problem:
# my big code, running for a long time
Sys.sleep(10); print("hello 1")
# other big code I dropped in console while R was still busy with above code
print("hello 2")
If this is the case, I don't think it is possible to stop the second chunk from running.

How to run R script from command line repeatedly but only load packages the first time

I want to run an R script (on Windows 7) from SQL Server 2014 each time a new record is added (to perform some analysis on the data). I saw that this can be done with the xp_cmdshell command, which is like running it manually from the command line.
My problems (and questions) are:
I've gathered from various websites that probably the best option is to use Rscript. This would be run at the command line as:
"C:\Program Files\R\R-3.2.3\bin\x64\Rscript.exe" "my_file_folder\my_file.r"
Can I copy Rscript.exe to the folder where my script is, so that I can run the script independently, even if R is not installed? What other files would I need to copy along with Rscript.exe for it to work on its own?
My script loads some packages containing functions it uses. Is there a way to bundle these into the script so they don't have to be loaded every time (loading them takes about 5 seconds so far, and I need this script to be faster)? Or is there a way to load these packages only the first time the script runs?
In case the overall approach I've described here is not the best one, I am open to doing it differently. Maybe there is a way to package the R script together with all its required dependencies (the libraries and whatever other parts of R the script needs to run on its own).
What I ultimately need is for the script to run silently, and reasonably fast, without any windows or anything else popping up: each time a new record is added to my database, do the analysis and exit.
Thanks in advance for any answers.
UPDATE:
I figured out an elegant solution for running the R script. I'm setting up a job in SQL Server, and inside that job I'm using xp_cmdshell to run my script as a parameter to Rscript.exe, as detailed in point 1 above. I can start this job from any stored procedure, and the beauty of it is that the stored procedure does not wait for the script to finish. It just triggers the job (which runs the script in a separate thread) and then continues with its business.
But the questions from points 1 and 2 still remain.
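On point 2, one commonly suggested workaround (an assumption on my part; it is not mentioned in the question) is to keep a single warm R process alive with the Rserve package, so the packages are loaded once, and have each new record's analysis sent to that process instead of paying the package-loading cost on every Rscript call. A rough sketch, where analyse_record() is a hypothetical function already sourced into the server session:

# One-time setup: start a background R server (packages load here, once).
library(Rserve)
Rserve(args = "--vanilla")

# Per-record call (what xp_cmdshell would trigger via a tiny client script):
library(RSclient)
con <- RS.connect()               # connect to the warm session on localhost
RS.eval(con, analyse_record(42))  # evaluate in the server's session
RS.close(con)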

Stop R package build & reload from backing up and resuming R session

I am writing an R package in RStudio on Windows 10. Every time I reload the package, a note comes up: "Backing up R session" and then "Resuming R session". This takes a while (about 8 or 9 seconds out of a total package build time of 14 seconds), and it would be nice if it could go faster. Most of the time, when I reload the package, I am fine with not backing up the R session and just starting with a clean one.
Is there any way to stop RStudio from backing up the session or resuming the old one? The process still seems to take some time even if I run rm(list=ls()) before clicking "Build & Reload".
FWIW, that's not a good solution for me. My problem is that if I forget to remove a large object from memory (an output of my code) before rebuilding the package, that object gets backed up, taking several minutes, before my R session finally comes back to life.
https://github.com/rstudio/rstudio/issues/7287
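For the large-object case described above, a small sketch that may help (an assumption, not a confirmed fix): clear everything from the global environment, including hidden objects that rm(list = ls()) misses, and collect garbage before clicking "Build & Reload":

rm(list = ls(all.names = TRUE), envir = globalenv())
gc()  # trigger garbage collection so less data is left to back up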
