R - Run source() in background

I want to execute an R script in the background from the R console.
From the console, I usually run an R script as source('~/.active-rstudio-document').
I then have to wait until the script completes before getting on with the rest of my work.
Instead, I want R to run the script in the background while I continue working in the console.
I should also be notified somehow when R completes the source command.
Is this possible in R?
This could be quite useful, as we often see jobs that take a long time.
PS - I want the sourced script to run in the same memory space rather than a new one, so solutions like fork, system, etc. won't work for me. I am looking at whether I can run the R script as a separate thread rather than a separate process.

You can use system() and Rscript to run your script as an asynchronous background process:
system("Rscript -e 'source(\"your-script.R\")'", wait=FALSE)
At the end of your script, you can save your objects with save.image() so they can be loaded later, and signal completion with cat():
...
save.image("script-output.RData")
cat("Script completed\n\n")
Hope this helps!
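For completeness, here is a minimal end-to-end sketch of that pattern, reusing the file names from the snippets above (the polling check is just one illustrative way to detect completion):
# Launch the script in a separate background R process (non-blocking).
# Note this is a separate process, not a thread in the current session.
system("Rscript -e 'source(\"your-script.R\")'", wait = FALSE)
# ...continue working in the console...
# Later, check whether the background run has finished by looking for
# the .RData file the script writes as its final step.
if (file.exists("script-output.RData")) {
  load("script-output.RData")   # pull the script's objects into this session
  message("Background script finished; objects loaded.")
} else {
  message("Background script still running (or it failed before saving).")
}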

Related

Stop submitted lines of code from running

I'm running a long R script, which takes 2 or 3 days to finish. I accidentally ran another script, which, if R works as it usually does, will go into some queue, and R will run it as soon as the first script is over. I need to stop that, as it would compromise the results from the first script. Is there a visible queue or any other way to stop R from running some code?
I'm working in an interactive session in RStudio, on Windows 10.
Thanks a lot for any help!
Assuming you're running in the console (or an interactive session in RStudio; that's unclear from your question), and that what you did was source a script or paste code, and while it was running you pasted another chunk of code:
What is going on is that you pushed data into the R process's input stream. The input is buffered, so each line runs once the previous line's call has ended and freed the process.
There's no easy way to manipulate the input buffer; that's R's internal input/output system, and mostly it's the operating system that holds that information in its cache for now. Asking R itself is not possible, as it already has this buffer to read; any new command would go after it.
Last-chance option: if you can spot your other chunk of code starting in your console, you can try pressing the Esc key to stop the code that is running.
You could try messing with the process buffers with a tool like Process Explorer (procexp), but there's a fair chance you'd just make your R session segfault anyway.
To avoid this in the future, use scripts and run them from the command line separately with Rscript (present in R's bin directory under Windows too, despite the link pointing to a Linux manpage). This creates one session per script and lets you kill them independently, as sketched below. That said, if they both write to the same place (a database, or a file, which would error if accessed by two processes), that won't prevent data corruption.
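To illustrate that separation, each script below runs in its own R process, so each can be interrupted or killed without affecting the other (script names are placeholders):
# From R, launch each script as an independent background process;
# the same Rscript commands also work directly in a terminal.
system("Rscript first-script.R", wait = FALSE)
system("Rscript second-script.R", wait = FALSE)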
I am guessing the OP has the problem below:
# my big code, running for a long time
Sys.sleep(10); print("hello 1")
# other big code I dropped into the console while R was still busy with the code above
print("hello 2")
If this is the case, I don't think it is possible to stop the second chunk from running.

Using an active R session when running a script via Rscript

I know that I can use Rscript to run an R script from the command line. Currently, I'm passing different parameters to my script, loading a few packages, and running a few functions. Then I change the parameters (via a bash script) and run the same script with the different parameters. This is all fine; however, I was wondering if there is a way to instantiate one R session and reuse it, instead of going through the process of loading all my packages, etc., every time Rscript executes my script.
library(ipc) might be of interest.
It allows you to set up continuous communication between parent and child R processes by creating so-called queues in the parent process.
Unfortunately you haven't specified your use case with any code, so please see the package vignette for examples.
One solution may be to use the session package. With it, you can call save.session() and restore.session() as rough analogues of the saving and restoring that happen at a standard R exit and launch. The result is not 100% equivalent, since the q() function is .Internal.
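A minimal sketch of that approach, assuming the session package's save.session() and restore.session() snapshot and restore the workspace together with the loaded packages (the file name and the package being loaded are placeholders):
library(session)
# First run only: do the expensive setup, then snapshot the session.
library(somebigpackage)   # hypothetical slow-to-load package
save.session("setup.Rsession")
# Subsequent Rscript runs: restore the snapshot instead of redoing setup.
restore.session("setup.Rsession")
# ...run your functions with this run's parameters...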

Problems executing a batch file from R

I am creating a function in R, and in one step of the function I run a batch file. This batch file in turn runs a different program, which creates files that I then want to read within my function.
I am using shell.exec to run the batch file, and it runs fine. The problem is that the next line of my code, which reads in the output from the program run by the batch file, fails because that output hasn't been created yet.
So I get an error the first time I call my function, but if I simply call it again it runs fine. Basically, I get an error message saying that .../bat_output.txt does not exist, because the batch file hasn't finished running yet; when I call the function again, it works. Example of code below:
shell.exec("run.bat")
readout<-read.table("bat_output.txt")
Any suggestions?
shell.exec returns immediately while the script runs in the background. The reason bat_output.txt is not found the first time is likely that the script has not finished yet. shell.exec does not give you the ability to wait, nor any information to determine whether the process is still running, so it might not be the best tool for this.
Alternatives:
system("cmd /c run.bat")
system2("cmd", c("/c", "run.bat"))
Note that if you reference a different path, you may want or need to normalizePath() and/or dQuote() it before passing it into those commands. (R's system* commands are bad at argument forming.) A fixed version of the flow is sketched below.
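Here is a minimal sketch of the fixed flow using the file names from the question. system2() without wait = FALSE blocks until cmd /c returns, so as long as run.bat itself waits for the program it launches, the output file exists before it is read:
# Run the batch file synchronously: this call blocks until run.bat
# (and the program it runs) have finished.
system2("cmd", c("/c", "run.bat"))
# The output file written by that program should now exist.
readout <- read.table("bat_output.txt")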

Can an Rscript be aware of other Rscripts in Ubuntu

I have a cron job that launches an Rscript to go fetch some data and stash it away in a database. Sometimes the source isn't ready, so it waits. Sometimes the source skips a data point, so the script ends up waiting until another instance is started from cron. Once two instances are running, they cause problems with each other. Is there something like the following pseudocode that I could put at the top of my scripts so they stop when they see another instance of themselves already running:
stopifnot(!Sys.running('nameofscript.r'))
One thing I thought to do would be for the script to create a temp file with a fixed name at the start and then delete that temp file at the end. That way the script can check for the existence of the file to know whether it's already running; a pure-R sketch of this idea is below.
I also see there's something called flock, which is probably a better solution than that, but I'd prefer a way R can do it without bash commands (I've never really learned any bash). If there isn't a pure-R way to do this, can the R script call flock on itself, or would it only work if the cron task calls a bash script that then calls the Rscript? If it isn't already obvious, I don't really know how to use flock.
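A minimal pure-R sketch of that lock-file idea follows. The lock path is a placeholder, and note that unlike flock, a check-then-create sequence is not atomic, so two instances starting at almost the same instant could both slip past the check:
# Fixed, shared lock path (placeholder; adjust for your system).
lockfile <- "/tmp/nameofscript.lock"
# If the lock file exists, assume another instance is still running.
if (file.exists(lockfile)) {
  stop("Another instance of this script appears to be running.")
}
# Claim the lock, then guarantee it is removed when the work ends,
# even if the script errors out partway through.
file.create(lockfile)
tryCatch({
  # ...fetch the data and stash it away in the database...
}, finally = unlink(lockfile))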

Command to open new R Session from R source file

I run some calculations over several hours with R. After a while my memory is full of junk, and the gc() and rm() commands don't solve the problem. What I did was shut down my R session and open a new one, which solves the memory problem. Now I want to automate this process. Is there a command to open a second R session, or RGui, from an existing session? I would then want to set the wd in that second session, run some code there, and close it after some time. How can I do this? Alternatively, is there another way to get rid of the junk in my memory?
You may want to give Rscript a try; see Rscript --help from the command line. Break your big script down into smaller parts and run them in succession using the same workspace with Rscript --restore --save yourscript.r. A new R session is opened for each script, which may help you keep memory use under control; a sketch of this chaining is below.
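As a minimal sketch of that chaining (part file names are placeholders): each part runs in a fresh R session but picks up the workspace saved by the previous part, since --restore loads .RData at startup and --save writes it back on exit:
# Each iteration starts a fresh R session; --restore loads the .RData
# saved by the previous part, and --save writes it back for the next.
for (part in c("part1.R", "part2.R", "part3.R")) {
  system(paste("Rscript --restore --save", part))
}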
